-
mrjob is a Python package that helps you write and run Hadoop Streaming jobs.
mrjob fully supports Amazon's Elastic MapReduce (EMR) service, which allows you to buy time on a Hadoop cluster on an hourly basis. It also works with your own Hadoop cluster.
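As a rough illustration of the kind of job this describes, here is a word count written in the map/reduce style. It is a plain-Python sketch that does not import mrjob: in mrjob the mapper and reducer would be methods on a job class and Hadoop would do the grouping, while here run_job is a hypothetical stand-in for that grouping step.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reducer(word, counts):
    """Reduce step: sum all the counts collected for one word."""
    yield word, sum(counts)


def run_job(lines):
    """Hypothetical local runner standing in for Hadoop's shuffle and sort."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)

    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


word_counts = run_job(["a b a", "B c"])
```

The mapper and reducer are generators, so each one can emit zero, one, or many key/value pairs per input.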
-
PyF is an open-source Python framework and platform for large-scale data processing, mining, transformation, reporting and more.
It relies on lazy data-flow programming: only one item at a time passes through the complete network, including the splitting and merging of flows, so no huge data set is held in memory and PyF processes scale. You can even send a branch to another computer and continue it there!
There are also several existing output plugins, such as CSV, PDF, XML, XLSX and fixed-length flat files. Drag and drop one or more of them and your output reporting is done.
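The one-item-at-a-time idea can be sketched with ordinary Python generators. This is not PyF's API: every name below (source, add_tax, flag_large, merge, build_network) is hypothetical and only illustrates a flow that is split into two branches and merged again without loading the whole data set.

```python
from itertools import tee


def source(n, pulled):
    """Source node: yield one record at a time and log each pull."""
    for i in range(n):
        pulled.append(i)
        yield {"id": i, "amount": i * 10}


def add_tax(items, rate):
    """Branch A: add a tax field to each record as it passes through."""
    for item in items:
        yield {**item, "tax": item["amount"] * rate}


def flag_large(items, threshold):
    """Branch B: mark records whose amount reaches the threshold."""
    for item in items:
        yield {**item, "large": item["amount"] >= threshold}


def merge(left, right):
    """Merge node: recombine the two branches record by record."""
    for a, b in zip(left, right):
        yield {**a, **b}


def build_network(n, pulled):
    """Wire source -> split -> (add_tax, flag_large) -> merge."""
    branch_a, branch_b = tee(source(n, pulled))
    return merge(add_tax(branch_a, 0.5), flag_large(branch_b, 20))


pulled = []
network = build_network(3, pulled)
first = next(network)  # only the first record has been read from the source
```

Because merge consumes both branches in lockstep, the buffer that tee keeps for the slower branch never holds more than one record in this sketch.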
-
A project to compile Yahoo! Pipes into Python