(Centralized) URL Shorteners considered harmful (Part II)

This is a follow-up to this post, detailing bullet points 1–4.

Reasons for not having centralized URL shorteners

Back on May 1st, using short URLs on Twitter was really taking off (at least in my perception). TinyURL was the default shortener used by Twitter. bit.ly had only gotten my attention because of their use of OpenCalais (which raises some questions of its own, definitely worth a separate post). I wasn't aware of tr.im and the other URL shorteners that were available at that time.

It was a time

  • before Digg decided to first mess around with the short URLs generated by the DiggBar (Daring Fireball), and then redirect to a page on its own domain in case a user is not logged in (DownloadSquad, Mashable, …);
  • before cli.gs got hacked and 2.2 million links were redirected to an out-of-context page, losing about a month and a half of links during the restore because of missing daily backups;
  • before tr.im decided to pull the plug on its shortening service yesterday, and then resurrect it today.

In the discussions with my colleagues and friends I cited all these scenarios as reasons why centralized URL shorteners should be considered harmful. In addition, there are at least the following reasons:

  • Centralized URL shorteners are SPOFs (Single Points of Failure).
  • They add unnecessary complexity to the web (unnecessary redirects and hops).
  • Out of commercial necessity, a provider might decide to redirect to an interstitial ad instead of the original short URL target (this was actually proposed by some of the potential buyers interested in tr.im).
  • In the worst case, a provider might decide to redirect to content that is out of context or outright illegal (think, ultimately, child porn).
Shorturls for content providers

Due to this multitude of reasons, every serious content provider IMHO has to do two things:

  • First, evaluate whether short URLs are really needed in its case.
  • If it comes to the conclusion that they are, it MUST use a short URL service that is completely under its control and ensure that the short URLs live at least as long as the original content they link to.

Let's have a closer look:

Shorturls and artificial restrictions: Twitter

Most of the short URLs out there exist for a single reason: Twitter.

The first thing to note is that the need for short URLs triggered by using Twitter is an artificial need imposed by Twitter itself. The only thing Twitter has to do is extend the data model for links and add one (or maybe two) separate fields allowing for a URL of arbitrary length. Since most Twitter traffic is generated by clients using the API, and these clients already include functionality for generating short links, map links, twitpic links etc., all these clients would have to do is change their API call to Twitter.
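To make this concrete, here is a purely hypothetical sketch of what such an extended status update could look like. The link field and the payload shape are my assumptions, not Twitter's actual API:

    # Purely hypothetical sketch: a status update with a dedicated URL field
    # of arbitrary length. The "link" field is an assumption, not part of the
    # real Twitter API; clients would render it alongside the 140-char text.
    status_update = {
        "status": "Why centralized URL shorteners are harmful",  # <= 140 chars
        "link": "http://example.org/2009/08/centralized-url-shorteners"
                "-considered-harmful/",  # arbitrary length, not counted
    }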

Interestingly, Dave Winer comes to the same conclusion. I've very rarely seen a tweet that contains more than one short URL (OK, I've done so myself :-().

Shorturls because of real restrictions: SMS

One of the original reasons for the 140-character limit of Twitter was the integration of Twitter with SMS. But IMHO this is no longer the case (and never really has been), since arbitrary characters outside the 7-bit SMS charset are allowed within a tweet. If such characters are used, the number of characters within a single SMS drops to 70 (text containing non-SMS-charset characters is encoded in UCS-2, a 16-bit charset).

Moreover, nobody (at least in Germany) is using the SMS gateway, and if only SMS-charset characters are used, the limit within a single SMS is 160 characters (= 140 bytes × 8 / 7 bits per character), which gives Twitter enough room to run its own shortening service just for the case that the SMS gateway is actually used.
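A small sketch of that length arithmetic, assuming the GSM 03.38 basic character set (simplified: extension characters such as [, ] or € actually count double, which the sketch ignores):

    # Simplified sketch of single-SMS capacity: 140 bytes of payload hold
    # 160 characters at 7 bits each (140 * 8 / 7), but only 70 characters
    # once any character outside the GSM 7-bit charset forces UCS-2 (16-bit).
    GSM_BASIC = set(
        "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
        "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
    )

    def single_sms_limit(text: str) -> int:
        """Return how many characters fit into one SMS for this text."""
        if all(ch in GSM_BASIC for ch in text):
            return 160  # 140 bytes * 8 bits / 7 bits per character
        return 70       # UCS-2: 140 bytes / 2 bytes per character

    print(single_sms_limit("all plain GSM characters"))    # 160
    print(single_sms_limit("a snowman \u2603 breaks it"))  # 70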

Shorturls because of real restrictions: EMail, Newspaper columns, IPTC7901/ANPA1312

Besides Twitter, there are other scenarios where the length of a URL really matters for content providers. AFAIK they are all related to line length. One of them is the original reason for tinyurl.com back in 2002:

    Are you sick of posting URLs in emails only to have it break when sent causing the recipient to have to cut and paste it back together?

This was back in the times when a considerable number of email users were using plain-text mail with a line length of 80 characters. Other reasons are the limited line lengths of newspaper columns (typically around 30–40 characters). Another place where line length is important are arcane newswire formats like IPTC7901/ANPA1312 with a line length of 69 characters.

Nowhere have I seen a real need for the extremely short URLs that would actually require the one-, two- or three-character second-level domains used by centralized URL shorteners.

IMHO a 10–15 character second-level domain is sufficient for that purpose. Using Base62 encoding for the ID, an ID length of just 5 characters already gives you around 900 million IDs (62^5 = 916,132,832).
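A minimal sketch of such an encoding (the alphabet order is an arbitrary choice):

    import string

    # Base62 alphabet: digits, lowercase, uppercase (the order is a choice).
    ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
    BASE = len(ALPHABET)  # 62

    def encode_base62(n: int) -> str:
        """Encode a non-negative integer ID as a Base62 string."""
        if n == 0:
            return ALPHABET[0]
        digits = []
        while n:
            n, rem = divmod(n, BASE)
            digits.append(ALPHABET[rem])
        return "".join(reversed(digits))

    def decode_base62(s: str) -> int:
        """Decode a Base62 string back into the integer ID."""
        n = 0
        for ch in s:
            n = n * BASE + ALPHABET.index(ch)
        return n

    # Five Base62 characters cover 62**5 = 916,132,832 distinct IDs.
    print(encode_base62(916_132_831))  # 'ZZZZZ' (largest 5-character ID)
    print(decode_base62("ZZZZZ"))      # 916132831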

Any serious content provider should already have a second-level domain with 10–15 characters. If not, they should be able to come up with a decent domain name that is not already taken.

(Centralized) URL shorteners considered harmful

In late April I started writing this post, and it was lying around unfinished all these months. The impetus for the post was a number of discussions I had about whether URL shorteners are a good thing or not. (My position was along the lines of this BoingBoing post by Cory Doctorow.)

Given the (recent) developments with Digg, cli.gs and tr.im, I finally decided to finish the post(s). Here is the short version (Update: I started to elaborate on the points below in a follow-up post):

  1. Centralized URL shorteners are considered harmful.
  2. Serious content providers simply cannot rely on them, for quality-of-service reasons.
  3. The need for short URLs within Twitter is arbitrarily imposed by Twitter, and can and should easily be changed by Twitter.
  4. There are cases where content providers have a real need for short URLs. But short in those cases means around 30 characters in the worst case (newspaper columns), not 15.
  5. Most content providers can already provide this kind of short URL using a second-level domain of 10–15 characters and the ID of the content within their CMS (or a Base62-encoded random ID).
  6. rev=”canonical” and/or rel=”alternate shorturl” are ways to announce that short URL to others who might be interested in your content. Use it and ask for support of this in your CMS (or build it).
  7. Real-time statistics are another (if not the) reason why centralized URL shorteners are used by many people. This kind of statistics should already be available in your CMS. If not, you're not really a serious content provider.
  8. If you really need real-time statistics for other people's content, either use a redirecting URL on your own domain (as has been done for 10+ years), or
  9. run your own little URL shortener that is fully under your control, only writable by your staff, and integrated into (or at least integratable with) your CMS (see the sketch after this list).
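As a rough illustration of point 9, here is a minimal sketch of such a self-hosted shortener, assuming an in-memory table and a /s/<code> path scheme; a real one would persist the mapping in the CMS database and restrict the creation of new entries to staff:

    # Minimal sketch of a self-hosted URL shortener: resolve /s/<code> and
    # answer with a permanent redirect. Table, path scheme and port are
    # assumptions for illustration only.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    LINKS = {  # short code -> canonical URL; writable only by staff/CMS
        "abc12": "http://example.org/some/very/long/article-url",
    }

    class Redirector(BaseHTTPRequestHandler):
        def do_GET(self):
            target = LINKS.get(self.path.removeprefix("/s/"))
            if target is None:
                self.send_error(404, "unknown short code")
                return
            self.send_response(301)  # permanent: lives as long as the content
            self.send_header("Location", target)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8000), Redirector).serve_forever()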

Thinking about how to solve this real-time statistics dilemma, the following bold/crazy idea was born:

What about content providers opening up their access statistics themselves, instead of having bit.ly etc. do it for them?

This would make sure that everyone can see the complete picture, and not just some potentially skewed fragment provided by a URL shortener service. Most content providers already have a most-emailed, best-rated etc. section on their pages. It would also provide much-needed transparency in a link economy, as well as a decentralized solution for an apparent need of the people using the web: they want to know how they contribute to the success of the content they share.

Opening up the statistics the way bit.ly and tr.im are doing it is IMHO the right approach:

  • Everybody sees the summarized results for the page (the level of detail provided is up to the content provider)
  • Authenticated users can see the traffic that came from their registered domains
  • Access via the web site and an API

Some first implementation ideas (a rough sketch follows the list):

  • Authentication via OpenID/OAuth
  • Domain ownership verification, like Google does it for Google Apps etc.
  • Encapsulation of the whole thing into plugins for blog software and CMSs
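To make the idea concrete, here is a rough sketch of such an open-statistics endpoint. All names, paths and the verification header are hypothetical, and the OpenID/OAuth and domain-verification parts are stubbed out:

    # Hypothetical open-statistics endpoint: everyone gets the public summary;
    # verified domain owners additionally see the traffic from their domain.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    STATS = {  # page path -> access statistics, fed by the CMS (example data)
        "/2009/08/shorteners": {
            "total_hits": 1234,
            "referrers": {"twitter.com": 800, "news.example.com": 300},
        },
    }

    def verified_domain(headers):
        """Stub: map an authenticated OpenID/OAuth identity to a verified
        domain. The X-Verified-Domain header is a placeholder, not a real
        standard."""
        return headers.get("X-Verified-Domain")

    class StatsAPI(BaseHTTPRequestHandler):
        def do_GET(self):
            entry = STATS.get(self.path.removeprefix("/stats"))
            if entry is None:
                self.send_error(404)
                return
            body = {"total_hits": entry["total_hits"]}  # public summary
            domain = verified_domain(self.headers)
            if domain in entry["referrers"]:
                body["your_traffic"] = entry["referrers"][domain]
            payload = json.dumps(body).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)

    if __name__ == "__main__":
        HTTPServer(("", 8001), StatsAPI).serve_forever()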

What do you think?