By: Tom Tague

Tom Tague — Fri, 25 Sep 2009 17:36:28 +0000

Gerd:

I’m not going to respond to every point you make – but I will make a few general comments and observations. Of course you should take these as my interpretations – unless revised, the Calais TOS document itself is the correct reference.

DocumentCloud is a big deal for us. It not only leverages the technology we’ve deployed – but it serves a greater social good of supporting journalism and the free and open exchange of ideas and information.

It’s also unique. Issues of integrity of information, confidentiality and transparency will be integral to the success of the DocumentCloud project. That being said – I’m certain that OpenCalais, the DocumentCloud team and the many publishers involved will be having many discussions about how to accomplish these goals. Those discussions may well lead to a relationship model that is unique to the DocumentCloud project.

A few specific points.

Yes, OpenCalais does retain the metadata. Contrary to any rumor you might have heard, we do not retain, and never have retained, any original content or claimed any rights to it. It’s your content. Period. No exceptions. Ever.

When you talk about our use of the metadata it’s important to make a basic distinction. OpenCalais retains metadata at two levels: the document and what I’ll refer to as “atomized” metadata – they’re two very different things.

Document level metadata is – obviously – all of the metadata associated with a specific document. We consider this metadata to be particularly confidential and never expose it to any other OpenCalais user. The only mechanism for another user to gain access to this metadata is by the content submitter sharing a secret key – specifically a GUID – with someone else. If they don’t share it their metadata is never exposed to other OpenCalais users.

You ask a fair question regarding what we’re doing with that metadata and the honest answer is – not much. At some point we’ll probably conduct some experiments such as looking for trends in co-occurrence of mentions of companies and other statistical examination – but that’s all that’s on the horizon at this point. When we reach some to-be-determined size threshold, there may be some interesting statistical insights we can glean that will be of value to us.

Atomic metadata on the other hand is widely shared. Let’s talk about what it is and how it’s shared. If you send us an article about, for example, mining – we’ll create the document level RDF (which will identify some companies and probably a lot of other things) and store it away. We’ll then break the RDF up and extract entities from it – for example “Metalline Mining Company”. Those specific entities are then published in our Linked Data ecosystem at a unique URL. Here’s an example: http://bit.ly/1V0x7J. As anyone familiar with the Linked Data standard knows, this is the first step toward enhancing the value of your content assets using Linked Data resources.
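
As a rough illustration of the two levels described above, here is a minimal Python sketch. It is not OpenCalais code: the `MetadataStore` class, the `entity_url` helper, the `example.org` base URL and the entity type labels are all hypothetical, and deriving the entity URL from a name-based UUID is an assumption made only so that the same entity always lands at the same address.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical base URL; the real Linked Data store uses its own addresses.
BASE_URL = "http://example.org/entity/"


def entity_url(kind, name):
    """Return a stable public URL for an entity (assumed scheme, for illustration)."""
    return BASE_URL + str(uuid.uuid5(uuid.NAMESPACE_URL, f"{kind}/{name}"))


@dataclass
class MetadataStore:
    # Document level: full metadata, reachable only with the secret GUID.
    documents: dict = field(default_factory=dict)
    # Atomized level: individual entities, published at public URLs.
    entities: dict = field(default_factory=dict)

    def submit(self, extracted):
        """Store the document-level metadata privately and publish its entities."""
        doc_guid = str(uuid.uuid4())  # secret key handed back to the submitter only
        self.documents[doc_guid] = list(extracted)
        for kind, name in extracted:
            # The same entity from different documents maps to one shared URL.
            self.entities[entity_url(kind, name)] = (kind, name)
        return doc_guid

    def document_metadata(self, doc_guid):
        """Return a document's metadata, or None if the GUID is unknown."""
        return self.documents.get(doc_guid)
```

In this sketch, two submissions that both mention “Metalline Mining Company” produce two separate private document records but only one shared entity entry, and nothing in the entity entry points back to either document.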

As far as metadata retrieval – yes your interpretation is too strict. This isn’t about metadata extraction – all we ask is basically that you leave the OpenCalais GUIDs as is so that users are pointed back to our Linked Data store rather than some copy. That’s the only way we can ensure that the Linked Data references generated from OpenCalais are of high quality. Linked Data is a great thing – but we may end up in a situation with lots of dead or outdated links lying around – and we’d like to avoid that for OpenCalais users.

As far as bulk processing – DocumentCloud is clearly – to an enormous extent – adding value beyond scraping, tagging and republishing. They’re empowering more effective journalism. We’re absolutely good with their use of the service.

While I know I haven’t addressed each and every point you made I hope I’ve conveyed our general intentions and approach. OpenCalais has always striven to be transparent in our motivations, terms of service and privacy policies. With DocumentCloud we’ll continue with that transparent position while ensuring the OpenCalais service supports the unique needs of a large journalistic consortium.

Regards,

Tom Tague

By: Tweets that mention DocumentCloud and OpenCalais: Some Questions at ≈ Relations -- Topsy.com

Fri, 25 Sep 2009 09:14:27 +0000

[…] This entry was mentioned on Twitter by KP Frahm. KP Frahm said: Reading btw: DocumentCloud and OpenCalais: Some Questions at ≈ Relations "Tag Cloud" http://bit.ly/IKqR6 […]

Comments on: DocumentCloud and OpenCalais: Some Questions
