Newspapers and syndication – Part I: ACAP ASAP?


Originally I had planned to write this week about my experiences with the NYT Reader (not too bad) and the Microsoft Max feed reader (really bad).

But then on Monday I read that the World Association of Newspapers (WAN), together with the European Publishers Council (EPC), the International Publishers Association (IPA) and the European Newspaper Publishers’ Association (ENPA), is setting up a new project called ACAP (Automated Content Access Protocol).

This was the final trigger to write a series of articles summarizing my views on the issue of newspapers and syndication. This is part one, a more technical view focusing on ACAP. The second part will be a more general one, looking at the attitude of newspapers and other mainstream media (MSM) towards lightweight syndication.

ACAP – The protocol everybody “has just been waiting for”

From the WAN press release:

“The new project, called ACAP (Automated Content Access Protocol), is an automated system which allows online content providers to systematically provide information about access and use of their content to news aggregators and others on the web. The information, provided in a form that can be recognised and interpreted by search engine “crawlers”, will tell search engine operators and other users under what terms they can use the content.

ACAP will be a technical solutions framework that will allow publishers worldwide to express use policies in a language that the search engine’s robot “spiders” can be taught to understand.

“This system is intended to remove completely any rights conflicts between publishers and search engines. Via ACAP, we look forward to fostering mutually beneficial relationships between publishers of original content and the search engine operators, in which the interests of both parties can be properly balanced,” said Gavin O’Reilly, President of the World Association of Newspapers, one of the partners in the project.

“Importantly, ACAP is an enabling solution that will ensure that published content will be accessible to all and will encourage publication of increasing amounts of high-value content online,” he said. “This industry-wide initiative positively answers the growing frustration of publishers, who continue to invest heavily in generating content for online dissemination and use.””

The following Reuters article sheds some more light on the intentions of the newspapers:

“The pilot program stems from the huge popularity of search engines, which automatically return search results from newspapers, magazines and books, and usually link back to a publication’s own Web site for users to read a whole item.

Many publishers feel, however, that the search engines are becoming publishers themselves by aggregating, sometimes caching and occasionally creating their own content.

In one example of how ACAP would work, a newspaper publisher could grant search engines permission to index its site, but specify that only select ones display articles for a limited time after paying a royalty.

The cost of the project, known as the Automated Content Access Protocol, was not disclosed, though the publishers have budgeted 310,000 pounds ($583,700) to seek advice from third-party experts.”

Given the fact that the WAN’s press release starts with:

“In the week that Belgian publishers won their case against Google for illegally publishing content without prior consent”

and ends with:

“ACAP will be presented in more detail at the forthcoming Frankfurt Book Fair on 6th October and will be launched officially by the end of the year.”

one could get the impression that the press release was rushed out of the door two weeks early in order to ride on the momentum generated by the ruling.

Hence no further technical information about ACAP is available yet. Let’s do some guessing about what it could look like.

Current de facto standards and their room for improvement

As has often been noted, there is already a de facto standard protocol (the robots exclusion protocol) which is machine readable and tells search engines which content (not) to spider. So if a newspaper wants a search engine not to index its pages, all it has to do is include an appropriate robots.txt file. Furthermore, there are also machine-readable means (e.g. the Creative Commons license framework) for automatically communicating the terms under which content can be used.
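
For illustration, a minimal robots.txt, with a made-up host and paths, that lets crawlers index a newspaper’s site but keeps them out of a paid archive could look like this:

    # http://www.example-newspaper.com/robots.txt (hypothetical)
    User-agent: *           # rules for all crawlers
    Disallow: /archive/     # do not spider the paid archive

    User-agent: BadBot      # a specific crawler, barred entirely
    Disallow: /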

Unfortunately, the robots exclusion protocol is not an “official” standard (e.g. one blessed by the W3C), and the Creative Commons framework doesn’t cover ways to list exceptions to the various restrictions imposed by a license, or to ease the waiving of those restrictions by (semi-)automatically obtaining permission from the rights holder.
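
For comparison, the machine-readable part of a Creative Commons license is, in its simplest form, just a link relation embedded in the content page (the license URL is a real CC deed, the page is hypothetical):

    <!-- in the HTML of an article page: points crawlers and tools
         at the human- and machine-readable license terms -->
    <a rel="license"
       href="http://creativecommons.org/licenses/by-nc-nd/2.5/">
      Some rights reserved
    </a>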

So there definitely is room for improvement on both. As long as ACAP builds on these lightweight and broadly accepted standards, I’m interested in it. It might be useful, and it might even be used.

Will ACAP be the usual “heavyweight” type of standard defined by the news industry?

Given my experiences with “news industry standards”, I fear that ACAP is going to be a bloatware standard. A standard that tries to foresee even the remotest possibilities. A standard that tries to be self-contained. A standard that is rooted in B2B environments, fully buzzword-enabled, supporting an MDA for building a SOA based on SOAP/WSDL, discoverable via UDDI, you name it. All the buzzwords that make systems integrators happy, and implementations complex and costly, so that only the “serious” players in the marketplace can actually afford to implement and host it.

This fear is reinforced by the fact that WAN et al. announce that “This system is intended to remove completely any rights conflicts between publishers and search engines.”

Heavyweight vs. Lightweight Protocols

The ACAP announcement instantly reminded me of the Information and Content Exchange protocol (ICE). The first time I heard about this protocol, and the bright future it promised for everybody (especially content providers), was in 1998. But I never heard of anybody actually using it, so I lost track of it. ACAP provided a reason to revisit the site:
The latest news on the ICE homepage is the announcement of ICE 2.0 (dated August 2004):

“Unlike lightweight syndication protocols, we have designed ICE 2.0 to support industrial-strength content syndication. ICE 2.0 is the only XML-based Web syndication protocol that provides for subscription management, verification of delivery, and scheduled delivery in both push and pull modes.

Lightweight syndication protocols, such as RSS, have proven quite useful for the distribution of links-free content, but remain limited in their ability to enforce business rules. ICE is the protocol for syndicators who distribute ‘valued content’ intended to generate a revenue stream or who require guaranteed delivery in a secure environment.

Development of the ICE2 specification was an open industry activity. ICE 2.0 was released for comment during the News Standards Summit, co-sponsored by IDEAlliance, IFRA (INCA-FIEJ Research Association), IPTC (International Press Telecommunications Council), NAA (Newspaper Association of America), and OASIS. Here, ICE 2.0 was introduced to more than 75 major players in the news industry and comments were solicited.”

So similar organizations tried to define a standard for a similar task in a similar context. (For the record: I’ve just checked the ICE 2.0 specs; the business agreement is out of scope of the standard and has to be negotiated manually.)

Hence I consider it a reasonable approach to use the development of ICE 2.0 vs. lightweight protocols over the past two years as the basis for a prediction of how ACAP vs. lightweight protocols will fare:

ICE 2.0, on the one hand, remained irrelevant (if there are B2B niches where it is used, I’d like to hear about them). Lightweight syndication protocols, on the other hand, really took off big-time, creating a whole ecosystem around them:

  • Starting with the first podcast clients, available in September 2004, and using enclosures to wrap audio files, podcasts and videocasts have become a standard way to distribute audio and video news on the internet; see the enclosure sketch after this list. (BTW: in contrast to ICE’s wording, they have never been “links-free”; enclosures have been around, AFAIR, since 2001.)
  • With Atom, a lightweight syndication format has been standardized by the IETF, with the Atom publishing API following in the not-so-distant future and already usable and used. Every blog and every reasonable “Web 2.0” system, as well as a lot of enterprise systems, use RSS and Atom heavily as the de facto standard for fine-granular syndication of all kinds of content, not only news.
  • Google is using Atom as the base of its GData protocol, which in turn is the interchange and syndication format used within Google as well as for the interaction with Google’s customers.
  • Feed readers like NewsGator, Pluck, Sage, … as well as Web 2.0 start pages like Netvibes, Pageflakes, … and the portal pages of the GYM (Google/Yahoo/Microsoft) triumvirate are used by an ever increasing number of web-savvy users and opinion leaders to select and aggregate the content they want to read. OPML (see my recent posting on Grazr) and Netvibes tabs (unfortunately I have had no time to write about my current favourite start page) make it easy to share not only the content but the aggregations themselves; see the OPML sketch below the list. I’m sure that future products will bring this usability and usage from the early adopters to the mass market.
  • Advertisements in feeds are starting to gain traction; the first marketers (e.g. FeedBurner) are already there.
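
The enclosure mechanism referenced in the first item above really is this simple; a minimal sketch of an RSS 2.0 item (feed and file URLs are made up):

    <item>
      <title>Episode 12: Newspapers and syndication</title>
      <link>http://podcast.example.org/episode-12</link>
      <!-- the enclosure points podcast clients at the audio file;
           length is the file size in bytes -->
      <enclosure url="http://podcast.example.org/episode-12.mp3"
                 length="23456789" type="audio/mpeg"/>
    </item>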
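
And sharing an aggregation via OPML is similarly trivial; a minimal sketch of a reading list (the feeds are made up):

    <?xml version="1.0" encoding="UTF-8"?>
    <opml version="1.1">
      <head><title>My newspaper reading list</title></head>
      <body>
        <!-- one outline element per subscribed feed -->
        <outline text="Example Times" type="rss"
                 xmlUrl="http://example-times.com/rss.xml"/>
        <outline text="Sample Herald" type="rss"
                 xmlUrl="http://sample-herald.com/atom.xml"/>
      </body>
    </opml>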

IMHO, lightweight (or call them simple) protocols generally are more successful when it comes to the “general” internet. Other examples are REST vs. SOAP as the underpinnings of generally accessible APIs, or XML-RPC vs. SOAP as the basis for exposing APIs to blogs and wikis, …
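
To make the weight difference concrete: fetching the same (hypothetical) article resource looks roughly like this in the two styles. REST is a plain HTTP GET; SOAP wraps the same question in an envelope that both sides need tooling to produce and parse (endpoint and operation names are made up):

    REST:

    GET /articles/42 HTTP/1.1
    Host: api.example.com

    SOAP (1.1):

    POST /soap/endpoint HTTP/1.1
    Host: api.example.com
    Content-Type: text/xml; charset=utf-8
    SOAPAction: "getArticle"

    <?xml version="1.0"?>
    <soap:Envelope
        xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <getArticle xmlns="http://api.example.com/ns">
          <articleId>42</articleId>
        </getArticle>
      </soap:Body>
    </soap:Envelope>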

Summary

Given the “historical” evidence, it is likely that the ACAP standard will be a heavyweight, B2B-oriented, systems-integrator-pleasing type of standard. And hence it is also very likely that it will not gain any traction.

I may be wrong in suspecting a heavyweight protocol. But then I’m going to be pleasantly surprised.

Now on to the general theme of newspapers and syndication.

Update (Oct. 10th): After getting some more information about ACAP, I decided to do a “Part Ib: ACAP – the ‘details’” prior to Part II.