links for 2009-04-07


  • Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). This sample application uses Amazon Elastic MapReduce to run a Multiple Step JobFlow that calculates pairwise similarity in a large database of items. In this example, we’ll apply the sample code to music and film recommendations, but the example could be run on other datasets such as document term counts, product sales, or website logs. This article assumes some familiarity with MapReduce and Hadoop Streaming. Python and Hadoop Streaming were used to make the algorithm code as clear as possible; better performance can be obtained by porting the example code to Java.
  • Protovis is a visualization toolkit for JavaScript using the canvas element. It takes a graphical approach to data visualization, composing custom views of data with simple graphical primitives like bars and dots. These primitives are called marks, and each mark encodes data visually through dynamic properties such as color and position. For example, a simple bar chart visually encodes an array of numbers with height.
  • Adrian Holovaty, bad-boy YouTube guitar star (search for him, if you dare!) and co-author of the Django web framework, takes you under the hood of EveryBlock.com, a Knight Foundation News Challenge startup which rounds up local news and information, and is powered 100% by Python and Django.
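
To make the multi-step job flow in the Elastic MapReduce item above a little more concrete, here is a minimal local sketch in Python. It is not the sample application's code: the function names, the choice of cosine similarity, and the in-memory `group_by_key` helper that stands in for Hadoop's shuffle are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def group_by_key(pairs):
    """Stand-in for Hadoop's shuffle: collect values under each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def map_user_ratings(records):
    """Step 1 map: key each (user, item, rating) record by user."""
    for user, item, rating in records:
        yield user, (item, rating)


def reduce_item_pairs(user, item_ratings):
    """Step 1 reduce: emit every pair of items rated by the same user."""
    for (item_a, r_a), (item_b, r_b) in combinations(sorted(item_ratings), 2):
        yield (item_a, item_b), (r_a, r_b)


def reduce_similarity(pair, rating_pairs):
    """Step 2 reduce: cosine similarity over the co-ratings of one item pair."""
    dot = sum(a * b for a, b in rating_pairs)
    norm_a = sqrt(sum(a * a for a, _ in rating_pairs))
    norm_b = sqrt(sum(b * b for _, b in rating_pairs))
    similarity = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
    return pair, similarity


def pairwise_similarity(records):
    """Chain the two steps locally, the way a multi-step job flow would."""
    by_user = group_by_key(map_user_ratings(records))
    pair_stream = (
        kv
        for user, ratings in by_user.items()
        for kv in reduce_item_pairs(user, ratings)
    )
    by_pair = group_by_key(pair_stream)
    return dict(reduce_similarity(pair, vals) for pair, vals in by_pair.items())


if __name__ == "__main__":
    # Hypothetical toy ratings: (user, item, rating)
    ratings = [
        ("u1", "A", 1.0), ("u1", "B", 1.0),
        ("u2", "A", 2.0), ("u2", "B", 2.0), ("u2", "C", 1.0),
    ]
    for pair, score in sorted(pairwise_similarity(ratings).items()):
        print(pair, round(score, 3))
```

On a real cluster each map and reduce function would be a separate Hadoop Streaming script reading from stdin and writing to stdout. Note that the first step emits a number of pairs that grows quadratically with the items each user has rated, which is one reason a job like this benefits from running on many machines.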