(Centralized) URL Shorteners considered harmful (Part II)


This is a follow-up post to this post, detailing bullet points 1–4.

Reasons for not having centralized URL shorteners

Back on May 1st, the use of short URLs on Twitter was really taking off (at least in my perception). TinyURL was the default shortener used by Twitter. bit.ly had only gotten my attention because of their use of OpenCalais (which raises some questions of its own, definitely worth a separate post). I wasn't aware of tr.im and the other URL shorteners that were available at that time.

It was a time

  • before Digg decided to first mess around with the short URLs generated by the DiggBar (Daring Fireball)
  • and then do a redirect to a page on its own domain in case a user is not logged in (DownloadSquad, Mashable, …).
  • before cli.gs got hacked and 2.2 million links were redirected to an out-of-context page, losing about a month and a half of links during the restore because of missing daily backups.
  • before tr.im decided to pull the plug on its shortening service yesterday and then resurrect it today.

In discussions with my colleagues and friends I cited all these scenarios as reasons why centralized URL shorteners should be considered harmful. In addition, there are at least the following reasons:

  • Centralized URL shorteners are SPOFs (Single Points Of Failure)
  • They add unnecessary complexity to the web (unnecessary redirects and hops)
  • Out of commercial necessity a provider might decide to redirect to an interstitial ad instead of the original short URL target (this was actually proposed by some of the parties interested in buying tr.im)
  • In the worst case a provider might decide to redirect to content that is out of context or just outright illegal (think ultimately child porn)
Short URLs for content providers

Due to this multitude of reasons, every serious content provider IMHO has to do two things:

  • First, it has to evaluate whether short URLs are really needed in its case
  • If it comes to the conclusion that this is the case, the content provider MUST use a short URL service that is completely under its control and ensures that the short URLs live at least as long as the original content they link to (a minimal sketch of such a service follows right after this list).
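To illustrate what "completely under its control" can look like in practice, here is a minimal sketch of such a self-hosted redirect service, built on nothing but Python's standard library. The domain, port and the in-memory mapping are purely hypothetical placeholders; a real service would persist the mapping and run on the content provider's own domain.

```python
# Minimal sketch of a self-hosted short URL redirect service using only the
# Python standard library. The mapping, host and port are hypothetical
# placeholders; a real service would keep the mapping in a database and
# serve it from the content provider's own domain.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from short code to the original (long-lived) URL.
SHORT_URLS = {
    "a1B2c": "https://example.org/articles/2009/08/url-shorteners-part-ii",
}

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = self.path.lstrip("/")
        target = SHORT_URLS.get(code)
        if target:
            # A permanent redirect: the short URL should live at least as
            # long as the content it points to.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
        else:
            self.send_error(404, "Unknown short URL")

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), RedirectHandler).serve_forever()
```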
Let's have a closer look:

Short URLs and artificial restrictions: Twitter

Most of the short URLs are out there for a single reason: Twitter.

The first thing to note is that the need for short URLs triggered by using Twitter is an artificial need imposed by Twitter. The only thing Twitter has to do is extend the data model for links and allow one or maybe two separate fields for a URL of arbitrary length. Since most of the Twitter traffic is generated by clients using the API, and these clients already include functionality for generating short links, map links, TwitPic links etc., all these clients have to do is change the API call to Twitter.

Interestingly, Dave Winer comes to the same conclusion. I've very rarely seen a tweet that contains more than one short URL (ok, I've done so myself :-().

Short URLs because of real restrictions: SMS

One of the original reasons for the 140-character limit of Twitter was the integration of Twitter with SMS. But IMHO this is no longer a valid reason (and never has been), since arbitrary characters outside the 7-bit SMS charset are allowed within a tweet. If such characters are used, the number of characters within a single SMS is reduced to 70, because the text is then encoded in UCS-2, a 16-bit charset.

Moreover, hardly anybody (at least in Germany) is using the SMS gateway, and if only characters from the 7-bit SMS charset are used, the limit within a single SMS is 160 characters (140 bytes × 8 bits / 7 bits per character), which gives Twitter enough room to run its own shortening service only for the case where the SMS gateway is actually used.
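To make the SMS arithmetic concrete, here is a small sketch that decides how many characters fit into a single SMS for a given text. The charset check is a deliberately simplified approximation of the real GSM 03.38 table (extension characters and national language tables are ignored), so treat it as an illustration of the 160-vs-70 rule rather than a complete implementation.

```python
# Sketch of the SMS length rules: a single SMS payload is 140 bytes, which
# holds 160 characters in the 7-bit GSM charset or 70 characters once the
# text has to be encoded as UCS-2 (16 bits per character).
# GSM_BASIC_CHARSET is a simplified approximation of GSM 03.38.
import string

GSM_BASIC_CHARSET = set(
    string.ascii_letters + string.digits +
    " @£$¥!\"#%&'()*+,-./:;<=>?_\n\r"
)

def sms_character_limit(text: str) -> int:
    """Return how many characters fit into one SMS for this text."""
    if all(ch in GSM_BASIC_CHARSET for ch in text):
        return 140 * 8 // 7   # 7-bit packing: 160 characters
    return 140 // 2           # UCS-2: 70 characters

if __name__ == "__main__":
    print(sms_character_limit("Check out http://example.com/abc"))  # 160
    print(sms_character_limit("A tweet with a snowman: ☃"))         # 70
```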

Short URLs because of real restrictions: email, newspaper columns, IPTC7901/ANPA1312

Besides Twitter, there are other scenarios where the length of the URL really matters for content providers. AFAIK they are all related to line length. One of them is the original reason for tinyurl.com back in 2002:

    Are you sick of posting URLs in emails only to have it break when sent causing the recipient to have to cut and paste it back together?

This was back in the times when a considerable number of email users were using plain text mail with a line length of 80 characters. Other reasons are the limited line length in newspaper columns (typically around 30–40 characters). Another place where line length is important are arcane newswire formats like IPTC7901 / ANPA1312 with a line length of 69 characters.

Nowhere have I seen a real need for the extremely short URLs that would actually require one-, two- or three-character second-level domains as they are used by centralized URL shorteners.

IMHO a 10–15 character second-level domain is sufficient for that purpose. Using Base62 encoding of the ID, an ID length of just 5 characters already gives you roughly 900 million possible short URLs (62^5 ≈ 916 million); see the sketch below.
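For illustration, here is a minimal sketch of Base62-encoding a numeric ID as described above. The alphabet ordering and the function names are my own choices, not part of any particular shortener.

```python
# Minimal sketch of Base62 encoding/decoding for numeric IDs.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def base62_decode(s: str) -> int:
    """Decode a Base62 string back into the integer ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

if __name__ == "__main__":
    # Five Base62 digits cover 62**5 = 916,132,832 distinct IDs.
    print(62 ** 5)                   # 916132832
    print(base62_encode(916132831))  # 'zzzzz', the highest 5-digit code
    print(base62_decode("zzzzz"))    # 916132831
```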

Any serious content provider should already have a second-level domain with 10–15 characters. If not, they should be able to come up with a decent domain name that is not already taken.