PicApp: Access to rights-cleared high-quality editorial photos of news for everyone

Political Parties React To Federal Elections
Via the WordPress.com deal (via RWW) I learned about PicApp (shame on me that I didn’t learn about them earlier).

So what is PicApp? Basically they are trying to overcome an obstacle that until now (at least IMHO) has hindered the proliferation of user-generated journalism: access to rights-cleared, high-quality editorial photos of breaking news.

The deal is the following. Everybody gets access to 20+ million photos from Getty, Corbis and a number of other high-quality news agencies. After signing up you are allowed to post these photos on your blog (see above). In exchange PicApp overlays the image with a viral element (that will also include ads). In my opinion, the overlay could be a bit less intrusive but it is still ok.

There is also a WordPress plugin that makes it easy to search for photos and include them in self-hosted WordPress blogs. It is a little bit buggy but a new version is already under way. All in all it took me about 2 minutes from learning about PicApp until I had the image from above included in the first draft of this post. Not too shabby.

DocumentCloud and OpenCalais: Some Questions

Recently another Knight News Challenge winner announced the availability of an open source version of its code. This time it is DocumentCloud, a joint effort of the NYT and ProPublica (not sure why these bigshots need grant money to do these things, but that is another story). It is open-sourcing CloudCrowd, which claims: parallel processing for the rest of us.

Yesterday it announced another two dozen high-profile content partners (Nieman Lab’s view on this) as well as a partnership with Thomson Reuters’ OpenCalais (DocumentCloud blog post):

This morning we’re excited to announce a partnership with Thomson Reuters, which is contributing its OpenCalais service to DocumentCloud. OpenCalais uses natural language processing to extract information from documents, instantly identifying and tagging the relevant people, places, companies, facts and events. This will make it easy for readers and journalists to explore connections between documents and across the full collection of source materials.

I’m very excited to use DocumentCloud / CloudCrowd but I have a couple of questions regarding the OpenCalais Terms of Service. Since I’m not sure when my comment will make it through moderation, I’m reposting the questions here:

Can you (and Thomson Reuters) please clarify whether you are using the public free version of OpenCalais? If so, it would be very helpful to get your reading of the terms of service. Until now the terms have been the reason that I’m very hesitant to use OpenCalais for tasks at my news org.

Since I would very much love to use OpenCalais and DocumentCloud, it would be very helpful for me to get more information on your interpretation of the following parts of the terms:

1. As far as I understand, the terms of service of the public version allow Reuters to keep and use the metadata (with some rumours that the full text is part of the metadata):

“You understand that Thomson Reuters will retain a copy of the metadata submitted by you or that generated by the Calais service. By submitting or generating metadata through the Calais service, you grant Thomson Reuters a non-exclusive perpetual, sublicensable, royalty-free license to that metadata.”

Since Reuters somehow has to refinance the operation of OpenCalais, I’m basically fine with that clause, but it would be interesting to know which types of services and products they are sublicensing the metadata to.

2. IMHO the terms make it difficult, at the very least, to use other means of metadata extraction (e.g. homegrown NLTK / GATE jobs, the MetaCarta API, Inxight, Empolis, … etc.) or to offer this metadata as part of your own API, e.g. the NYT API.

“# If you syndicate, publish or otherwise transmit any content containing, enhanced by or derived from Calais-generated metadata you will use your best efforts to incorporate the correct Calais-provided Globally Unique Identifier (GUID) in that content. You specifically agree not to attach incorrect GUIDs to your content with any intent to mislead, spam, spoof, phish or otherwise deceive downstream consumers of your content.

# You will not use any metadata or GUIDs produced by Calais to create a metadata retrieval service similar to Calais. To ensure the quality of metadata for all Calais users we want to maintain a single verifiable metadata storage location.”

I read these clauses such that, for example, the NYT API (which is, among other things, a metadata retrieval service for people and places) is not allowed to use the public OpenCalais service as part of its processing. Is my interpretation too strict? I’m basically talking about OpenCalais as a preprocessing step whose results would be curated by human beings.

BTW: The last sentence of that quote looks very strange to me given the “Linked Open Data” initiative, including Freebase, DBpedia etc., which all provide their own GUIDs.

3. To me, it looks as if DocumentCloud is in direct violation of one clause of the terms:

“You will not do bulk processing where you are adding minimal value beyond adding Calais metadata to the content. For example – if you are a webcrawler you should not send everything to Calais before sending it to your users.”

Since DocumentCloud is all about bulk processing: Was this clause waived for DocumentCloud, including all uses of DocumentCloud outside the original partners’ installation (e.g. systems derived from DocumentCloud, GitHub clones, …)? Or does it mean that I cannot do only metadata annotation in a DocumentCloud job but have to do some other things in the same job too?

I hope that most if not all of these questions have already been asked and answered by the various content partners and it’s easy for you to answer them.

Quotes from: What The Future Will Look Like For Journalists | paidContent

Interesting piece from Jim Spanfeller, the CEO of Forbes.com, on paidContent. Some quotes:

But I firmly believe that in the future we will need more professional journalists than we have today and they will be as valued—or perhaps even more highly valued—than they were 10 years ago.

Will these professionals work for the same institutions that they work for now? More likely no then yes. Certainly some of our current journalistic enterprises will survive and thrive but only the ones that make the transition to a “now economy” that demands “entwined content,” or stories told in prose, video and data all at the same time. The majority of the current kings of content don’t understand these changes or perhaps they do but feel helpless to respond to them.

But the idea of a “scoop” having great value is gone. In an internet-enabled world, a scoop lasts for only a very fleeting period of time. The real value is the insight about that scoop. And because the web is multimedia, video will be extremely important too.

The world has changed, yes, but at the end of the day, people are still, well, people. They still have a need to know what is going on around them and how it may affect them. We have the tools to meet these needs, but unfortunately most of the legacy distributors of news have not been able to use them. Either they are too overwhelmed by the destruction of their current models or they are too leveraged with debt—or, in some cases, both— to see the opportunities within all the change.

What The Future Will Look Like For Journalists | paidContent.