(Centralized) URL shorteners considered harmful

In late April I started writing this post, and it has been lying around unfinished all these months. The impetus for the post was a number of discussions I had about whether URL shorteners are a good thing or not. (My position was along the lines of this BoingBoing post by Cory Doctorow.)

Given the (recent) developments with Digg, cli.gs and tr.im, I finally decided to finish the post(s). Here is the short version (Update: I started to elaborate on the points below in a follow-up post):

  1. Centralized URL Shorteners are considered harmful
  2. Serious content providers simply cannot rely on them, for quality-of-service reasons
  3. The need for short URLs within Twitter is arbitrarily imposed by Twitter, and can and should easily be changed by Twitter
  4. There are cases where content providers have a real need for short URLs. But short in those cases means around 30 chars in the worst case (newspaper columns), not 15.
  5. Most content providers can already provide this kind of short URL using a second-level domain of 10–15 chars and the ID of the content within their CMS (or a Base62-encoded random ID)
  6. rev=”canonical” and/or rel=”alternate shorturl” are ways to announce that short URL to others who might be interested in your content. Use them and ask for support in your CMS (or build it)
  7. Real-time statistics are another (if not the) reason why centralized URL shorteners are used by many people. This kind of statistics should already be available in your CMS. If not, you’re not really a serious content provider.
  8. If you really need real-time statistics for other people’s content, either use a redirecting URL on your own domain (as has been done for 10+ years) or
  9. Run your own little URL shortener that is fully under your control, only writable by your staff and integrated / integratable into your CMS.
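To make point 5 concrete, here is a minimal sketch of turning a numeric CMS content ID into a Base62 slug on your own domain. The domain name and function names are illustrative, not from any real CMS:

```python
# Illustrative sketch: Base62-encode a CMS content ID into a short slug
# served from the provider's own (second-level) domain.

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(content_id: int) -> str:
    """Convert a non-negative numeric CMS ID into a short Base62 slug."""
    if content_id == 0:
        return ALPHABET[0]
    digits = []
    while content_id > 0:
        content_id, rem = divmod(content_id, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def short_url(domain: str, content_id: int) -> str:
    """Build a short URL fully under the content provider's control."""
    return "http://%s/%s" % (domain, base62_encode(content_id))

print(short_url("examplepaper.com", 1234567))  # → http://examplepaper.com/5ban
```

Even with a hypothetical 16-char domain, the whole URL stays under 30 characters for millions of articles, which is exactly the "worst case" budget from point 4.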

Thinking about how to solve the real-time statistics dilemma, the following bold / crazy idea was born:

What about the content providers opening up their access statistics themselves, instead of having bit.ly etc. do it for them?

This would make sure that all people can see the complete picture and not only some potentially twisted fragment provided by a URL shortener service. Most content providers already have a most-emailed, best-rated etc. section on their pages. It would also provide much-needed transparency in a link economy, as well as a decentralized solution for an apparent need of the people using the web: they want to know how they contribute to the success of the content they link to.

Opening up the statistics the way bit.ly and tr.im are doing it is IMHO the right way:

  • Everybody sees the summarized results for the page (the decision on the level of detail provided is up to the content provider)
  • Authenticated users can see the traffic that came from their registered domains
  • Access via the web site and an API
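The bullets above could be sketched as a tiny "opened up" statistics endpoint. Everything here is hypothetical: the log entries, the field names and the level of detail are made up to illustrate what a provider-controlled summary might look like:

```python
# Hypothetical sketch of a provider-side public statistics summary.
# The provider decides the level of detail; here we expose per-referrer
# counts for a single page as JSON, as an API might return them.

import json
from collections import Counter

# Made-up raw access-log entries: (page path, referrer domain)
hits = [
    ("/2009/08/shorteners", "twitter.com"),
    ("/2009/08/shorteners", "twitter.com"),
    ("/2009/08/shorteners", "news.ycombinator.com"),
]

def public_summary(path: str) -> str:
    """Aggregate per-referrer counts for one page into a public JSON summary."""
    by_referrer = Counter(ref for p, ref in hits if p == path)
    return json.dumps({
        "path": path,
        "total": sum(by_referrer.values()),
        "referrers": dict(by_referrer),
    }, sort_keys=True)

print(public_summary("/2009/08/shorteners"))
```

An authenticated caller (second bullet) would get the same structure filtered to their own registered domains; the anonymous version above shows only the aggregate the provider chooses to publish.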

Some first implementation ideas:

  • Authentication via OpenID / OAuth
  • Domain ownership verification like Google is doing for Google Apps etc.
  • Encapsulation of the whole thing into plugins for blog software and CMSs
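The domain-ownership verification could work the way Google does it for Google Apps: the statistics provider issues a random token, the claimant publishes it at a well-known URL on the domain, and the provider fetches it back. The URL scheme and function names below are assumptions for illustration only:

```python
# Sketch of challenge-based domain ownership verification:
# issue a token, have the owner publish it, then fetch and compare.

import secrets
import urllib.request

def issue_token() -> str:
    """Random challenge token the domain owner must publish."""
    return secrets.token_hex(16)

def verify_domain(domain: str, token: str, fetch=urllib.request.urlopen) -> bool:
    """Fetch http://<domain>/verify-<token>.txt and check it contains the token."""
    url = "http://%s/verify-%s.txt" % (domain, token)
    try:
        with fetch(url, timeout=10) as resp:
            return resp.read().decode().strip() == token
    except OSError:
        return False
```

The `fetch` parameter is injectable so a plugin could swap in its own HTTP client; by default it uses the standard library's `urllib.request.urlopen`.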

What do you think?

links for 2009-08-11

  • Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

    At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties:

  • Skulpt is an entirely in-browser implementation of Python.