Head-To-Head: ACAP Versus Robots.txt For Controlling Search Engines


Danny Sullivan put up a great (and very long) post comparing ACAP and Robots.txt in the Context of the current discussion around paid content and the Hamburg declaration. I urge you to read it in full if you want to know more about the current situation, and why ACAP will not help the publishers to pursue their hamburg declaration goals.

Disclosure: After some critical posts regarding ACAP and due to the fact that i’m working at a news agency i was invited to join the ACAP technical working group. I attended one face-to-face meeting and a couple of phone conferences, mainly there was interest to integrate news agency use cases into ACAP.  I stopped active work in the TWG basically a year ago, mainly due to the following reasons:

  • dpa has no B2C business and FTP / satellite not HTTP are still the major news delivery mode :-(
  • The general situation regarding ACAP is as Danny describes it
  • Hence there are more efficient uses of my precious time than the ACAP TWG

To give you an idea what Danny is talking about i’ll include a quote fromhis post showing that even the protagonists are not really using ACAP :

Sounds easy enough to use ACAP, right? Well, no. ACAP, in its quest to provide as much granularity to publishers as possible, offers what I found to be a dizzying array of choices. REP explains its parts on two pages. ACAP’s implementation guide alone (I’ll get to links on this later on) is 37 pages long.

But all that granularity is what publishers need to reassert control, right? Time for that reality check. Remember those 1,250 publishers? Google News has something like over 20,000 news publishers that it lists, so relatively few are using ACAP. ACAP also positions itself as (I’ve bolded some key parts):

an open industry standard to enable the providers of all types of content (including, but not limited to, publishers) to communicate permissions information (relating to access to and use of that content) in a form that can be readily recognized and interpreted by a search engine (or any other intermediary or aggregation service), so that the operator of the service is enabled systematically to comply with the individual publisher’s policies.

Well, anyone with a web site is a publisher, and there are millions of web sites out there. Hundreds of millions, probably. Virtually no publishers use ACAP.

Even ACAP Backers Don’t Use ACAP Options

Of course, there’s no incentive to use ACAP. After all, none of the major search engines support it, so why would most of these people do so. OK, then let’s look at some people with a real incentive to show the control that ACAP offers. Even if they don’t yet have that control, they can still use ACAP now to outline what they want to do.

Let’s start with the ACAP file for the Irish Independent. Don’t worry if you don’t understand it, just skim, and I’ll explain:

##ACAP version=1.0

# Allow all

User-agent: *

Disallow: /search/

Disallow: /*.ece$

Disallow: /*startindex=

Disallow: /*from=*

Disallow: /*service=Print

Disallow: /*action=Email

Disallow: /*comment_form

Disallow: /*r=RSS

Sitemap: http://www.independent.ie/sitemap.xml.gz

# Changes in Trunk

ACAP-crawler: *

ACAP-disallow-crawl: /search/

ACAP-disallow-crawl: /*.ece$

ACAP-disallow-crawl: /*startindex=

ACAP-disallow-crawl: /*from=*

ACAP-disallow-crawl: /*service=Print

ACAP-disallow-crawl: /*action=Email

ACAP-disallow-crawl: /*comment_form

ACAP-disallow-crawl: /*r=RSS

OK, see that top part? Those are actually commands using the robots.txt syntax. They exist because if a search engine doesn’t understand ACAP, the robots.txt commands serve as backup. Basically those lines tell all search engines not to index various things on the site, such as print-only pages.

Now the second part? This is where ACAP gets to shine. It’s where the Irish Independent — which is part of the media group run by ACAP president Gavin O’Reilly — gets to express what they wish search engines would do, if they’d only recognize all the new powers that ACAP provides. And what do they do? EXACTLY the same blocking that they do using robots.txt.

So much for demonstrating the potential power of ACAP.