ACAP – The strawman proposals


In a couple of hours i’m going to attend the “International Conference on the Conclusion of the ACAP Pilot” at AP’s premises here in NYC. (This has given me the chance to see some of my AP friends :-) ).

For those of you who don’t know ACAP, here is what the website says:

Following a successful year-long pilot project, ACAP (Automated Content Access Protocol) has been devised by publishers in collaboration with search engines to revolutionise the creation, dissemination, use, and protection of copyright-protected content on the worldwide web.

ACAP is set to become the universal permissions protocol on the Internet, a totally open, non-proprietary standard through which content owners can communicate permissions for access and use to online intermediaries.

IMHO, ACAP is first and foremost a much-needed “neutral ground” where publishers can meet, exchange ideas, start joint lobbying efforts, etc. Don’t get me wrong, i really think that such a place was missing, and hence i more or less talked my employer, the German news agency dpa, into becoming an ACAP member.

I also think that there is a lack of a “standard” way to communicate commercial content rights to end-users, search engines, etc. My primary reason for joining was therefore to have a closer look at the technology being developed, in order to judge whether it would help to close this gap. Hence i have already written a couple of times about ACAP on this blog.

ACAP – the promise

The following is stated on the invitation to the conference:

As ACAP reaches the final phase of its 12-month pilot, representatives of the
publishing and online community will be showcasing the successful development of
the new, open standard through which the owners of content published on the World
Wide Web can provide permissions information (relating to access and use of their
content) in a form that can be recognised and interpreted automatically, so that
search engine operators and other online intermediaries are enabled systematically
to comply with policies established by content owners.

ACAP will allow publishers, broadcasters and indeed any other publisher of content
on the network to express their individual access and use policies in a language that
search engine robots and similar automated tools can read and understand.

This conference will demonstrate beyond all doubt, the need for ACAP and the
potential disaster for the global publishing industry should it fail to embrace new
technology to protect its future.

Big words. So, later today a lot of publishing bigwigs will be at the conference, things will be announced, politicians are going to speak, and “i will be one of the few attendees who actually cared to read the technical documents”.

I’ve taken the time to read the:

  • Strawman proposals (part I and II), and the
  • Usage definitions

Unfortunately i had only the September documents with me. So i wasn’t able to check (until now) whether there are any significant differences in the October version and/or the final versions, which have just been put up some 30 hours ago.

Too bad that there are neither documents highlighting the edits between versions nor any easy way to run a diff on them. (Hint: there is a reason why RFCs are still plain text.)

So, to make it very clear: what follows is based on my loose acquaintance with the project (i.e. i attended the first conference in London) and a one-time thorough reading of the September draft specs (on a flight, while having a terrible headache). So i might be terribly wrong. If so, please tell me.

ACAP – My verdict

While being very successful at organising a common platform for publishers, ACAP fails big time to convince me that the proposed technical solution is actually going to solve the problems stated above.

And while i’m the first to agree that the publishing industry has to embrace new technologies in order to avoid potential disaster, i think ACAP carries more of a backward-oriented “let’s protect our territory” attitude than a forward-looking “let’s explore new worlds” kind of thinking.

Single use case only

On the one hand, it is too focused on a single use case: telling search engines what they are allowed to do with content residing on websites.
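To give a flavour of what that looks like: in the drafts the permissions are expressed as extensions to robots.txt. The sketch below is reconstructed from my reading of the September documents, so please take the exact field names as my approximation and not as normative syntax:

    # Classic REP part, understood by every crawler today
    User-agent: *
    Disallow: /internal/

    # ACAP extensions (field names as i remember them from the September draft)
    ACAP-crawler: *
    ACAP-allow-crawl: /news/
    ACAP-disallow-crawl: /archive/
    ACAP-allow-index: /news/
    ACAP-allow-present: /news/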

What about readers? They also want to know what they are allowed to do with the content they are reading. Are they allowed to put it on their personal website, on their blog?

What about content not residing as HTML/XHTML pages on websites? News agencies still deliver their content mainly via wires (via satellite or FTP). What about RSS and Atom feeds, the standard content delivery formats in “the developed countries” of the internet? What about content reuse in Facebook / OpenSocial apps?

What about images, audio, video and a way to embed the rights into the original data?

Update: In the talks at the conference it became clear that there is some kind of roadmap for the other use cases, especially the syndication use case, which is the one i’m most interested in in my daily profession. This use case is going to be based on an XML format (which itself is to be based on the ONIX proposals). Hopefully this will be drastically reduced in complexity (see my earlier posts on ONIX). If everything works out well, i might even contribute to that format.

Wrong granularity

Instead of looking at these broader issues, ACAP focuses on ways to define, at a very fine level of granularity, what search engines are allowed to do with the content residing on webpages. This allows the creation of very complex permission sets. Permission sets whose computational complexity makes them practically (maybe even theoretically) impossible for search engines to implement. Especially the permissions / restrictions defined on the present verb are very fine-grained and lead to very complex renderings, which, given the presumably striking differences between the permissions set by different publishers, would lead to visual disaster on the search result pages.
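To illustrate the point (and this is purely my own illustration, not verbatim ACAP syntax): a publisher might want to qualify the present permission per resource group roughly like this:

    # Purely illustrative, not verbatim ACAP syntax: qualified "present"
    # permissions for one resource group of one publisher
    ACAP-allow-present-snippet: /politics/ max-length=100
    ACAP-allow-present-thumbnail: /politics/ max-width=80 max-height=80
    ACAP-allow-present-link: /politics/ time-limit=7d
    ACAP-disallow-preserve: /politics/

Now picture a result page on which every hit may come from a different publisher with a different set of such qualified permissions: the engine has to evaluate them per URL, and every single result may end up with its own snippet length, thumbnail size, caching behaviour and so on.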

Bad technical quality of the specs

I tried to read the documentation the same way i read students’ term papers or colleagues’ research papers back in my teaching / researching days. That means: trying very hard to understand what was written, scribbling remarks and question marks when i didn’t get something on first reading, checking for completeness of the presentation, looking out for contradictions, checking the self-containedness of all necessary information, etc. You know the drill.

And i have to say that i can’t remember a term paper, research paper or thesis that, even in its earliest version, has been of such bad technical quality as the September ACAP documentation. Maybe i’m getting old and don’t remember correctly, and there have been some, but definitely not very many.

So i hope for an improvement in the final documentation, because in the September release the documentation fails miserably to fulfil its self-proclaimed primary requirement:

Fundamentally, ACAP requires consistent and unambiguous interpretation of all its
permissions.

Update: I have now had the chance to look at the 1.0 specs and things definitely look better. In talks with the project participants it also became clear that (as usual) the docs had to be pushed out in a rush and that other, more refined documentation is on its way. I was also able to get an answer to the basic question whether the model is permissive or restrictive, a piece of information that is missing from the documents. It is permissive, like the model of robots.txt. But this may change with a 2.0 version, when ACAP is no longer so closely intertwined with the REP.
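For readers not familiar with the REP: permissive means that, as in plain robots.txt, everything that is not explicitly disallowed is allowed. A minimal example:

    # Permissive default, as in plain robots.txt
    User-agent: *
    Disallow: /archive/
    # /news/ is not mentioned anywhere, so crawling /news/ is allowed

A restrictive model would turn this around: nothing would be permitted unless an explicit allow rule matched.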

Where to go from here

A year ago, directly after the ACAP announcement, i wrote the following on this blog:

As often noticed, there is already a de facto standard protocol (the robots exclusion protocol) which is machine-readable and which tells search engines which content (not) to spider. So if a newspaper wants a search engine not to index its pages, all it has to do is include an appropriate robots.txt file. Furthermore, there are also machine-readable means (e.g. the creative commons license framework) for automatically communicating the terms under which content can be used.

Unfortunately the robots exclusion protocol is not an “official” standard (e.g. one blessed by the W3C), and the “creative commons” framework doesn’t cover ways to list exceptions to the various restrictions imposed by a license, or to ease the waiving of those restrictions by (semi-)automatically obtaining permission from the rightsholder.

So there definitely is room for improvement on both. As long as ACAP builds on these lightweight and broadly accepted standards, i’m interested in it. It might be useful and it might even be used.

Looking at these sentences today, i have to say that at least ACAP tried to build on the REP. But with their stated goals broadened to a solution for the whole publishing industry, a REP-based solution is definitely not enough (see above).

And they completely neglected Creative Commons, IMHO a major mistake.

Creative Commons – the better ACAP?

CC tries to define the common use cases on a scale from “All rights reserved” to “Public domain”, leaning clearly towards the half where more rights are granted than reserved. The rights publishers typically have in mind are traditionally in the other half, which would make for a perfectly complementary fit.

And years and years of development have gone into supporting tools for CC, search engine enhancements supporting CC, etc., not to mention all the work that has gone into adapting the licenses to local jurisdictions.

In addition, publishers will sooner or later publish CC-licensed content themselves, so they have to know CC and implement it in their processes anyway.
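The machine-readable side of CC is also trivially easy to deploy: the license is expressed as an ordinary link carrying rel="license", which is exactly what CC-aware crawlers and the search engines’ CC search features pick up. A minimal, purely illustrative page fragment:

    <!-- illustrative fragment: the rel="license" link is the machine-readable bit -->
    <p>Article text …</p>
    <p>
      <a rel="license" href="http://creativecommons.org/licenses/by-nc/3.0/">
        Some rights reserved: Creative Commons Attribution-NonCommercial 3.0
      </a>
    </p>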

Hence, IMHO, building on top of CC would definitely have been the better way to create ACAP. But i guess this way was politically not feasible for the publishing industry.

Update: I was happy to hear that ACAP is going to talk to Creative Commons soon and also recognizes that Creative Commons is especially interesting for the non-search-engine use cases.

PS: I’ll try to write a second post with a technical critique of the September ACAP documentation. But since i’m leaving for a 2½-week holiday tomorrow, i’m not sure whether this is going to happen soon.