Category Archives: Misfits

Posts that didn’t fit into any other category

More books

I’ve been quite busy in the last weeks, first on the job and then on a one-week skiing holiday. Hence I wasn’t able to report on a couple of new books I bought recently. So let’s do this, in chronological order, starting with the “oldest” book.

The Definitive Guide to Django – Web Development Done Right

I nearly finished reading “The Definitive Guide to Django – Web Development Done Right” by Adrian Holovaty and Jacob Kaplan-Moss. As you might already have guessed, I’m a great fan of Adrian’s efforts on “data-based” / “data-backed” journalism. What you might not be aware of (unless you are following my DailyDeli posts) is that I’m very interested in all things Python. Being originally an AI researcher who did most of his work in either Smalltalk or CommonLisp/CLOS, I guess Python is something like a natural fit (I have learned from quite a few others with the same background that they had the same motivation). At least it is if you started looking for a “mainstream” alternative to these two languages more than a couple of years ago and did not get caught by the “RoR” hype.

At work I personally started using Python in conjunction with Zope for some serious web development (like the web frontend for the music download store used by AOL Germany among others, and a number of mobile news websites) around six years ago. Being the only serious Python-based web framework, Zope was a natural choice back then. It also brought the added bonus that zope.com was working with media companies.

I started looking for alternatives to Zope 2 around one and a half years ago (and no, Zope 3 didn’t look like an alternative back then and does not look like one today). At EuroPython 2006 I had the chance to get an overview of the state of the art in Python web frameworks. For me personally Django stood out from the crowd because it is based on the needs of a newspaper website and because of its pragmatic but nevertheless principled and clean design.

Since I hadn’t found time to write some code in order to get a “real feeling”, I was eager to get the book and read through it while commuting. All in all I liked reading it. There were only a few places where I found some minor errors, and generally the questions that formed in my head were answered only a few paragraphs later. So my positive view on Django has been confirmed and I’m ready to get my hands dirty.

Hence I’m trying to set some spare time aside to go through the book (at least selected chapters) a second time, this time with my MBP sitting right next to it, to do some coding in Django, learn about GeoDjango and how to use S3, and investigate other non-RDBMS data backends.

Website | Buy at Amazon (US)

Other books

I also bought some more books on JavaScript and jQuery in particular (I intend to focus on jQuery as my main JavaScript library), a book on OpenStreetMap (in German) and a more research-oriented book on text mining.

  • Pro JavaScript Techniques (John Resig)
  • jQuery in Action (Bear Bibeault, Yehuda Katz)
  • OpenStreetMap – Die freie Weltkarte nutzen und mitgestalten (Frederik Ramm, Jochen Topf)
  • The Text Mining Handbook (Ronen Feldman, James Sanger)

BTW: Relations is intended to be ad-free, forever. This is the reason why I am still holding back on embedding deep links to the books in the various Amazon stores (international and German). But I’m on the verge of considering these links a service rather than an ad. What do you think?


Back from the break – What to expect next

After nearly 3 weeks of holidays I’m back from the break. In the meantime I was mostly renovating my house and only occasionally reading through my subscriptions and surfing the net. There was no time for writing any articles, especially because I had to install and set up my new media centre/home server Mac mini :-)
Hopefully some of you found my DailyDeli postings interesting. This may also be a good time to explain my “publishing philosophy”:

  • In the main categories (Noteworthy, Recommended Reading, Review, Quick n’ Dirty, Misfits) of this blog you will only find original posts of quite some length. I often try to be shorter but fail miserably on that account.
    • Normally posts go into the Noteworthy category (because they tend to be too long for a Quick n’ Dirty post)
    • Recommended Reading and Review are the two categories I use if I write about some book (or report) I’m reading or some hardware or software I’m reviewing
    • Misfits is the place for the rest
  • The DailyDeli category is the place where you can find (semi-)automatic posts of my new del.icio.us bookmarks (often with short comments on why I find these particular pages interesting and how I found them). A headline changing from “links for …” to “DailyDeli for …” signals that I edited the post.
  • I don’t bookmark blog entries on del.icio.us. Instead I use Google Reader to share them. The last 5 shared items are shown at the top of the blog home page.
  • You can also read all my subscriptions via the MyGrazr page

The reason this is a low-frequency blog has nothing to do with a lack of themes to write about. In fact the opposite is the case. A whole range of themes is in my head (and to some extent already on virtual paper). So expect to read:

  • More on ACAP, newspapers and syndication issues
  • Why I think that Atom and especially the Atom Publishing Protocol will be important
  • Some reviews of feed readers and especially why Google Reader is important and can easily become Google’s Digg or the successor to Google News

Hopefully I will at least have the time to write short articles on all this before the Christmas break. In the long run, besides the above themes (and others), there will be three long-running themes in particular:

  • What could an alternative technical architecture for a news agency look like if one starts with the question: “Is there a difference between a news agency and a blog network?”
  • Some ideas and experiments on news visualization and especially GeoNews.
  • What will be the impact of the OpenAPI, ClosedSource model used in Web2.0 on the open source movement?

So if you are interested in one or the other theme and/or would like to share your opinion, or want to contact me in private for some other reason, you can now reach me at relations (at) ka2.de

Orion, an algorithm for a revolutionary search engine?

In the last few days the news was abuzz with headlines like “Search for secret millions + Google” and “Google kauft Suchalgorithmus von israelischem Studenten” (Google buys search algorithm from Israeli student).

Most often they more or less recited the original press release (dated Sept. 2005) and stated the fact that the inventor, Ori Allon, now works for Google, along with the rumours that Microsoft and Yahoo were also interested.

The media focused especially on the following two passages of the press release:

“The results to the query are displayed immediately in the form of expanded text extracts, giving you the relevant information without having to go to the website – although you still have that option if you wish,”

and

“By displaying results to other associated key words directly related to your search topic, you gain additional pertinent information that you might not have originally conceived, thus offering an expert search without having an expert’s knowledge.”

So let’s have a look at these two claims.

Inline display of text extracts

This first claim clearly gets the media going, screaming “IPR violation, IPR violation” all over the place, especially when it is enhanced by quotes from Ori Allon like:

I don’t envision that Orion will completely eliminate the need for going to actual web pages. (Sydney Morning Herald interview)

Just in case nobody has noticed: Google is already displaying text extracts as part of the search result. And IMHO there is a good reason they are not displaying longer passages of the search result page, namely IPR issues.

Looking at some self-proclaimed Orion look-alikes like Qtsaver, one can easily see that something like this can easily be done via a frontend mashup using the Google API.
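To illustrate how little is needed for the “expanded text extract” part, here is a minimal sketch in Python. It is not Orion’s or Qtsaver’s method and it does not call the Google API; it simply assumes the page text has already been fetched, and the function name `extract_snippets` and the `width` parameter are made up for this example.

```python
import re


def extract_snippets(text, query, width=60):
    """Return text windows of about `width` characters around each query term hit.

    Hypothetical helper for illustration only; `text` is assumed to be the
    already downloaded plain text of a result page.
    """
    snippets = []
    for term in query.lower().split():
        # Find every case-insensitive occurrence of the term in the page text.
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            snippets.append(text[start:end].strip())
    return snippets
```

A frontend mashup would call something like this for every hit returned by the search API and show the resulting windows inline instead of the usual two-line summary.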

So if that claim was the reason that Google bought the algorithm (and hence the patent), then only for one reason: to save themselves from legal hassles, definitely not for the technical merit of the invention.

Displaying results to other associated key words directly related to your search topic

The second claim could be the one where it gets interesting. Funnily enough, this is the one that didn’t get nearly as much media attention. What is claimed typically falls into the research problems labeled query expansion, thesaurus generation, concept learning etc.
If Mr. Allon has found a well-working algorithm for one of the above problems that is scalable and performant, and this means Google-like scale and performance, this algorithm definitely should draw the interest of Google and the other search giants.
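For readers not familiar with the term, here is a toy sketch of one classic flavour of query expansion, based on simple term co-occurrence. It is only meant to show what “associated key words” could mean in practice. It is certainly not Mr. Allon’s algorithm, it ignores everything that makes the problem hard at Google scale, and the function name `expand_query` and the `top_n` parameter are assumptions made for this example.

```python
from collections import Counter


def expand_query(query, documents, top_n=3):
    """Suggest terms that co-occur with the query terms in the given documents.

    Naive illustration: whitespace tokenisation, no stop-word removal,
    no weighting. `documents` is a list of plain-text strings.
    """
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in documents:
        tokens = set(doc.lower().split())
        # Only documents that mention at least one query term contribute.
        if query_terms & tokens:
            counts.update(tokens - query_terms)
    # The most frequent co-occurring terms become the expansion candidates.
    return [term for term, _ in counts.most_common(top_n)]
```

A real system would at least weight the candidates (for example by how rare they are in the whole collection) and would need an index instead of a scan over all documents, which is exactly where the scalability question comes in.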

Query expansion etc. are normally fields covered by the research discipline of artificial intelligence, and Mr. Allon is, even after Google hired him, still a Ph.D. student of Eric Martin, who is working in that field.

Mr. Martin’s homepage cites the following research interests:

My main interests are in the logical foundations of Artificial intelligence. The theoretical part of my research is mainly devoted to developing a unified framework, Parametric logic, that investigates the relationships between:

  • a notion of logical complexity, that accounts for various kinds of logical inferences, encompassing deductive, inductive and nonmonotonic inferences;
  • a notion of complexity from the perspective of Formal learning theory, encompassing learnability in the limit, with or without (ordinal) mind change bounds;
  • a notion of syntactic complexity, for formulas in infinitary modal languages;
  • a notion of topological complexity.

I am also involved in projects on knowledge acquisition based on ripple down rules, as well as projects on query answering systems, logic programming, and discovery from the web.

Since I also worked in the field of AI and logic (description logics, not parametric logic), I would love to learn more about this algorithm.