links for 2011-03-28

  • HTSQL is a URI-based high-level query language for relational databases. HTSQL wraps your database with a web service layer, translating HTTP requests into SQL and returning results as HTML, JSON, etc.

    HTSQL is designed for someone who is not a SQL expert, but needs a usable, comprehensive query tool for data access and reporting.
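
    To make the "HTTP request in, SQL out, JSON-ready rows back" idea concrete, here is a minimal sketch. It is not HTSQL's parser or API: the URI shape (`/table` or `/table{col1,col2}`), the function names `uri_to_sql` and `run`, and the use of SQLite are all assumptions made for illustration.

```python
import re
import sqlite3

# Hypothetical, heavily simplified URI grammar: /table or /table{col1,col2}.
# This is NOT HTSQL's real syntax or parser; it only shows the translation idea.
_URI = re.compile(r"/(?P<table>[A-Za-z_]\w*)(?:\{(?P<cols>[^{}]*)\})?")
_IDENT = re.compile(r"[A-Za-z_]\w*")


def uri_to_sql(uri):
    """Translate a toy query URI into a SELECT statement."""
    match = _URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"unsupported query: {uri!r}")
    cols = match.group("cols")
    if cols is None:
        selection = "*"
    else:
        names = [c.strip() for c in cols.split(",")]
        # Accept plain identifiers only, so nothing else reaches the SQL text.
        if not all(_IDENT.fullmatch(n) for n in names):
            raise ValueError(f"bad column list: {cols!r}")
        selection = ", ".join(names)
    return f"SELECT {selection} FROM {match.group('table')}"


def run(conn, uri):
    """Run the translated query and return rows as JSON-ready dicts."""
    cursor = conn.execute(uri_to_sql(uri))
    headers = [d[0] for d in cursor.description]
    return [dict(zip(headers, row)) for row in cursor]
```

    A real service layer would also need filtering, joins, access control, and output formatting; the sketch only covers the URI-to-SELECT step.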

  • jStat is a statistical library written in JavaScript that allows you to perform advanced statistical operations without the need for a dedicated statistical language (e.g. MATLAB or R).
  • Media Converter. Simple but advanced conversion for Mac OS X.

    Convert almost every input file.

    Since Media Converter uses ffmpeg, a lot of file formats are supported. Convert avi, wmv, mkv, rm, mov and more to other formats.

    Some files aren't supported by ffmpeg, but can be decoded with QuickTime®. Media Converter uses movtowav and movtoy4m to decode them.

    Convert to a lot of formats.

    Media Converter comes with presets to convert to popular video and audio formats. These presets can be fine-tuned in the Preferences.

    You can also create your own presets in the Preferences. Send them to us to share them with the world. A bit too advanced for you? No problem, this site contains presets.

    Smart converting thanks to its father.

    Media Converter shares a lot of internals with Burn (disc burning application). This way both can be improved based on user experiences.

  • The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.

    The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.

    Extraction is very fast (milliseconds), needs only the input document (no global or site-level information required), and is usually quite accurate.

    Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.

    The algorithms used by the library are based on (and extend) some concepts of the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010, the Third ACM International Conference on Web Search and Data Mining, New York City, NY, USA.
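
    As a rough illustration of what classifying blocks by shallow text features can look like, the sketch below keeps blocks that are long and contain few linked words. This is not boilerpipe's algorithm or API (the library itself is Java): the `TextBlock` type, the two features chosen, and both thresholds are assumptions picked for the example.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str          # visible text of the block
    linked_words: int  # how many of its words sit inside link anchors


def link_density(block):
    """Share of the block's words that are link text (0.0 for empty blocks)."""
    words = len(block.text.split())
    return block.linked_words / words if words else 0.0


def is_content(block, min_words=10, max_link_density=0.33):
    """Toy rule with hypothetical thresholds: long, link-poor blocks are content."""
    words = len(block.text.split())
    return words >= min_words and link_density(block) <= max_link_density


def extract_main_text(blocks):
    """Join the blocks the toy rule accepts, dropping navigation-like clutter."""
    return "\n".join(b.text for b in blocks if is_content(b))
```

    Note that the rule looks only at each block in the input document, which matches the description above: no global or site-level information is needed.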

  • Ruby port of the Protovis library. According to the Protovis site:

    Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction.

  • A collection of the best open data sets and open-source tools for data science

links for 2011-03-26

  • Trinity is a graph database and computation platform over a distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transactions, and consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large-scale graphs. Trinity can be deployed on one machine or hundreds of machines.

    A graph is an abstract data structure with high expressive power. Many real-life applications can be modeled by graphs, including biological networks, the semantic web and social networks. Thus, a graph engine is important to many applications. Currently, there are several players in this field, including Neo4j, HyperGraphDB, InfiniteGraph, etc. Neo4j is a disk-based transactional graph database. HyperGraphDB is built on the Berkeley DB key/value store. InfiniteGraph is a distributed system for large graph data analysis.

  • Welcome to the Data Science Toolkit
  • Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

    Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the like.)

links for 2011-03-21