Head-To-Head: ACAP Versus Robots.txt For Controlling Search Engines

Danny Sullivan put up a great (and very long) post comparing ACAP and robots.txt in the context of the current discussion around paid content and the Hamburg Declaration. I urge you to read it in full if you want to know more about the current situation, and why ACAP will not help publishers pursue their Hamburg Declaration goals.

Disclosure: After some critical posts regarding ACAP, and because I work at a news agency, I was invited to join the ACAP technical working group (TWG). I attended one face-to-face meeting and a couple of phone conferences; the main interest was in integrating news agency use cases into ACAP. I stopped active work in the TWG about a year ago, mainly for the following reasons:

  • dpa has no B2C business, and FTP and satellite, not HTTP, are still the major news delivery modes :-(
  • The general situation regarding ACAP is as Danny describes it
  • Hence there are more efficient uses of my precious time than the ACAP TWG

To give you an idea of what Danny is talking about, I’ll include a quote from his post showing that even the protagonists are not really using ACAP:

Sounds easy enough to use ACAP, right? Well, no. ACAP, in its quest to provide as much granularity to publishers as possible, offers what I found to be a dizzying array of choices. REP explains its parts on two pages. ACAP’s implementation guide alone (I’ll get to links on this later on) is 37 pages long.

But all that granularity is what publishers need to reassert control, right? Time for that reality check. Remember those 1,250 publishers? Google News has something like over 20,000 news publishers that it lists, so relatively few are using ACAP. ACAP also positions itself as (I’ve bolded some key parts):

an open industry standard to enable the providers of all types of content (including, but not limited to, publishers) to communicate permissions information (relating to access to and use of that content) in a form that can be readily recognized and interpreted by a search engine (or any other intermediary or aggregation service), so that the operator of the service is enabled systematically to comply with the individual publisher’s policies.

Well, anyone with a web site is a publisher, and there are millions of web sites out there. Hundreds of millions, probably. Virtually no publishers use ACAP.

Even ACAP Backers Don’t Use ACAP Options

Of course, there’s no incentive to use ACAP. After all, none of the major search engines support it, so why would most of these people do so. OK, then let’s look at some people with a real incentive to show the control that ACAP offers. Even if they don’t yet have that control, they can still use ACAP now to outline what they want to do.

Let’s start with the ACAP file for the Irish Independent. Don’t worry if you don’t understand it, just skim, and I’ll explain:

##ACAP version=1.0

# Allow all

User-agent: *

Disallow: /search/

Disallow: /*.ece$

Disallow: /*startindex=

Disallow: /*from=*

Disallow: /*service=Print

Disallow: /*action=Email

Disallow: /*comment_form

Disallow: /*r=RSS

Sitemap: http://www.independent.ie/sitemap.xml.gz

# Changes in Trunk

ACAP-crawler: *

ACAP-disallow-crawl: /search/

ACAP-disallow-crawl: /*.ece$

ACAP-disallow-crawl: /*startindex=

ACAP-disallow-crawl: /*from=*

ACAP-disallow-crawl: /*service=Print

ACAP-disallow-crawl: /*action=Email

ACAP-disallow-crawl: /*comment_form

ACAP-disallow-crawl: /*r=RSS

OK, see that top part? Those are actually commands using the robots.txt syntax. They exist because if a search engine doesn’t understand ACAP, the robots.txt commands serve as backup. Basically those lines tell all search engines not to index various things on the site, such as print-only pages.

Now the second part? This is where ACAP gets to shine. It’s where the Irish Independent — which is part of the media group run by ACAP president Gavin O’Reilly — gets to express what they wish search engines would do, if they’d only recognize all the new powers that ACAP provides. And what do they do? EXACTLY the same blocking that they do using robots.txt.

So much for demonstrating the potential power of ACAP.
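If you want to check Danny’s observation yourself, here is a minimal Python sketch. It takes the Irish Independent file exactly as quoted above (with the blank lines between directives removed), groups the values by field name, and compares the plain robots.txt Disallow patterns with the ACAP-disallow-crawl patterns. The names ROBOTS_TXT and collect are my own, and the parsing is a deliberately naive "field: value" split, not a conforming REP or ACAP parser.

```python
# The Irish Independent file as quoted in the post, blank lines removed.
ROBOTS_TXT = """\
##ACAP version=1.0
# Allow all
User-agent: *
Disallow: /search/
Disallow: /*.ece$
Disallow: /*startindex=
Disallow: /*from=*
Disallow: /*service=Print
Disallow: /*action=Email
Disallow: /*comment_form
Disallow: /*r=RSS
Sitemap: http://www.independent.ie/sitemap.xml.gz
# Changes in Trunk
ACAP-crawler: *
ACAP-disallow-crawl: /search/
ACAP-disallow-crawl: /*.ece$
ACAP-disallow-crawl: /*startindex=
ACAP-disallow-crawl: /*from=*
ACAP-disallow-crawl: /*service=Print
ACAP-disallow-crawl: /*action=Email
ACAP-disallow-crawl: /*comment_form
ACAP-disallow-crawl: /*r=RSS
"""


def collect(text):
    """Group directive values by lower-cased field name (hypothetical helper).

    Everything after a '#' is treated as a comment and dropped; lines
    without a ':' are skipped.
    """
    rules = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        rules.setdefault(field.strip().lower(), []).append(value.strip())
    return rules


rules = collect(ROBOTS_TXT)
rep_blocked = rules.get("disallow", [])
acap_blocked = rules.get("acap-disallow-crawl", [])

print("robots.txt patterns:", len(rep_blocked))
print("ACAP patterns:      ", len(acap_blocked))
print("identical lists:    ", rep_blocked == acap_blocked)
```

Both lists come out with the same patterns in the same order, which is all the sketch is meant to show: the ACAP section of this particular file adds nothing that the robots.txt section does not already say.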

The End Of The CrunchPad | TechCrunch

I had been watching the CrunchPad for quite some time (see e.g. here). It was the device that came closest to my dream surfing/e-reading device (besides an Apple Tablet with a Pixel Qi screen). Now Michael Arrington announces its end, due to “interesting circumstances”, to say the least. Given the nature of TechCrunch I’ll take his version with big grains of salt, but at least it is an entertaining read. Also of interest: the rise of the estimated price from $200 to $300. But read for yourself:

Our plan was to debut the CrunchPad on stage at the Real-Time Crunchup event on November 20, a little over a week ago. We even hoped to have devices hacked together with Google Chrome OS and Windows 7 to show people that you could hack this thing to run just about anything you want. We’d put 1,000 of the devices on pre-sale and take orders immediately. Larger scale production would begin early in 2010.

And then the entire project self destructed over nothing more than greed, jealousy and miscommunication.

On November 17, our deadline date for greenlighting the debut three days later, the CEO of our partner on the project, Chandra Rathakrishnan, sent me an email with the subject “no good news.” Yuck, I thought. Another delay, probably with the screen that had been giving us so much trouble – capacitive touch at 12 inches isn’t trivial. And sure enough, the email started off with “no good news to update. updated hardware is still on its way , so that’s a timing issue. friday will be a challenge now.”

But the email went on. Bizarrely, we were being notified that we were no longer involved with the project. Our project. Chandra said that based on pressure from his shareholders he had decided to move forward and sell the device directly through Fusion Garage, without our involvement.

Err, what? This is the equivalent of Foxconn, who build the iPhone, notifying Apple a couple of days before launch that they’d be moving ahead and selling the iPhone directly without any involvement from Apple.

Chandra also forwarded an internal email from one of his shareholders. My favorite part of the email: “We still acknowledge that Arrington and TechCrunch bring some value to your business endeavor…If he agrees to our terms, we would have Arrington assume the role of visionary/evangelist/marketing head and Fusion Garage would acquire the rights to use the Crunchpad brand and name. Personally, I don’t think the name is all that important but you seem to be somewhat attached to the name.”

And with that, the entire project self destructed.

via The End Of The CrunchPad.