links for 2011-03-26

Trinity – Microsoft Research

Trinity is a graph database and computation platform over distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transaction, consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large scale graphs. Trinity can be deployed on one machine or hundreds of machines.

Graph is an abstract data structure that has high expressive power. Many real-life applications can be modeled by graphs, including biological networks, semantic web and social networks. Thus, a graph engine is important to many applications. Currently, there are several players in this field, including Neo4j, HyperGraphDB, InfiniteGraph, etc. Neo4j is a disk-based transactional graph database. HyperGraphDB is based on key/value pair store Berkeley DB. InfiniteGraph is a distributed system for large graph data analysis.

(tags: software graph database research microsoft trinity)
Data Science Toolkit

Welcome to the Data Science Toolkit

(tags: software opensource textmining geo datasciencetoolkit)
snappy – A fast compressor/decompressor – Google Project Hosting

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the likes.)

(tags: software opensource compression snappy)