links for 2009-08-10


  • This package determines important terms within a given piece of content. It uses linguistic tools such as Parts-Of-Speech (POS) and some simple statistical analysis to determine the terms and their strength.
  • jCarousel is a jQuery plugin for controlling a list of items in horizontal or vertical order. The items, which can be static HTML content or loaded with (or without) AJAX, can be scrolled back and forth (with or without animation).
  • OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.

    The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the U.S. Census Bureau, and novel high-performance layout analysis methods.

    OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.

  • The OpenFST library implements algorithms on weighted finite state transducers. The PyOpenFST project contains bindings for the library.

    The focus right now is on exposing the most important functionality and algorithms in a simple way. Later versions of the bindings will expose additional functionality, like different semi-rings, n-best paths, etc.

    We'll try to keep the openfst package simple and leave it largely as it is, then add separate packages for other kinds of transducers and additional functionality.

    The pyfst-* scripts are a haphazard collection, not necessarily useful for any particular purpose, and they'll keep changing over time. You may still find them useful for figuring out how the library works. Unit tests are in test-openfst.py.
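
For readers who haven't met weighted finite state transducers before, here is a rough pure-Python sketch of the idea. It does not use OpenFST or the PyOpenFST bindings, and every name in it (`ARCS`, `FINALS`, `best_transduction`) is made up for illustration. It assumes a transducer without epsilon arcs and tropical-semiring weights, meaning weights add along a path and the cheapest path wins.

```python
# Toy weighted transducer. Each arc is
# (source_state, input_label, output_label, weight, destination_state).
# All states, labels, and weights here are invented for the example.
ARCS = [
    (0, "a", "x", 0.5, 1),
    (0, "a", "y", 1.5, 1),
    (1, "b", "z", 0.75, 2),
    (1, "b", "w", 0.25, 2),
]
START = 0
FINALS = {2: 0.0}  # final state -> final weight


def best_transduction(arcs, start, finals, inputs):
    """Return (total_weight, output_labels) for the cheapest accepting path.

    Returns None when the transducer does not accept `inputs`.
    Assumes no epsilon arcs, so each input symbol consumes exactly one arc.
    """
    # best[state] = (cost, outputs) for the cheapest known way to reach state
    best = {start: (0.0, [])}
    for symbol in inputs:
        reached = {}
        for src, ilabel, olabel, weight, dst in arcs:
            if ilabel != symbol or src not in best:
                continue
            cost, outputs = best[src]
            candidate = (cost + weight, outputs + [olabel])
            # Keep only the cheapest path into each destination state.
            if dst not in reached or candidate[0] < reached[dst][0]:
                reached[dst] = candidate
        best = reached

    # Add the final weight for every path that ended in a final state.
    accepted = [
        (cost + finals[state], outputs)
        for state, (cost, outputs) in best.items()
        if state in finals
    ]
    if not accepted:
        return None
    return min(accepted, key=lambda item: item[0])


if __name__ == "__main__":
    print(best_transduction(ARCS, START, FINALS, "ab"))
```

Swapping the `+` and `min` for other operations is roughly what "different semi-rings" refers to, and keeping several candidates per state instead of one is the direction n-best paths would take.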