≈ Relations

Random Rants and Ramblings about Media and/or Technology

Archive for the ‘Noteworthy’ Category

More on tablets


Like (nearly) everybody else working in media, I am discussing the implications of tablets nearly every day at work. Here’s the management summary of my current state of mind:

  • Tablets will be game changers both for personal media consumption and personal computing (if executed right).
  • Neither re-enacting / emulating existing media (especially in the newspaper and magazine space) nor re-enacting existing personal computing user experiences will do the trick. “Enhanced editions” of all kinds were one of the most talked about topics at O’Reilly’s Tools of Change for Publishing conference in late February.
  • Most media companies have a long way to go until they are tablet-ready. And that means with respect to enabling their existing processes and content, not with respect to building exciting apps. The former is a necessary precondition for the latter.
  • It is nearly impossible to develop for tablets without a physical device, because good applications will be based on making the experience as physical as possible. And that means integrating sensors of all kinds as well as direct manipulation interfaces.

Apple’s Human Interface Guidelines

Carefully reading the iPad Human Interface Guidelines is presumably the best thing you can do to prepare for developing iPad applications. Thankfully uxmag published an overview of these guidelines first. Hence I’m feeling quite confident that I can publish them over here without getting sued by Apple (technically even these guidelines are under NDA and only available to members of the iPhone / iPad developer program):

Support All Orientations

Your application should encourage people to interact with iPad from any side by providing a great experience in all orientations. The reason is that people don’t view the device as having a default orientation, because they don’t pay much attention to the minimal device frame and they’re unconcerned with the location of the Home button.

Enhance Interactivity (Don’t Just Add Features)

The best iPad applications give people innovative ways to interact with content while they perform a clearly defined, finite task. Resist the temptation to fill the large screen with features that are not directly related to the main task. In particular, you should not view the large iPad screen as an invitation to bring back all the functionality you pruned from your iPhone application.

Flatten Your Information Hierarchy

Although you don’t want to pack too much information into one screen, you also want to prevent people from feeling that they must visit many different screens to find what they want. In general, focus the main screen on the primary content and provide additional information or tools in an auxiliary view, such as a popover.

Reduce Full-Screen Transitions

Instead of swapping in a whole new screen when some embedded information changes, update only the areas of the user interface that need it. When you perform fewer full-screen transitions, your application has greater visual stability, which helps people keep track of where they are in their task.

Enable Collaboration and Connectedness

Think of ways people might want to use your application with others. Expand your thinking to include both the physical sharing of a single device and the virtual sharing of data.

Add Physicality and Heightened Realism

Whenever possible, add a realistic, physical dimension to your application. The more true to life your application looks and behaves, the easier it is for people to understand how it works and the more they enjoy using it.

Delight People with Stunning Graphics

The high-resolution iPad screen supports rich, beautiful, engaging graphics that draw people into an application and make the simplest task rewarding.

De-emphasize User Interface Controls

Help people focus on the content by designing your application UI as a subtle frame for the information they’re interested in. Downplay application controls by minimizing their number and prominence. Consider creating custom controls that subtly integrate with your application’s graphical style. In this way, controls are discoverable, but not too conspicuous.

Minimize Modality

iPad applications should allow people to interact with them in nonlinear ways. Modality prevents this freedom by interrupting people’s workflow and forcing them to choose a particular path.

Rethink Your Lists

Consider a more real-world vision of your application. For example, on iPhone, Contacts is a streamlined list, but on iPad, Contacts is an address book with a beautifully tangible look and feel.

Consider Multifinger Gestures

The large iPad screen provides great scope for multifinger gestures, including gestures made by more than one person.

Consider Popovers for Some Modal Tasks

If you use modal views to enable self-contained tasks in your iPhone application, you might be able to use popovers instead.

Restrict Complexity in Modal Tasks

People appreciate being able to accomplish a self-contained subtask in a modal view, because the context shift is clear and temporary. But if the subtask is too complex, people can lose sight of the main task they suspended when they entered the modal view.

Downplay File-Handling Operations

Although iPad applications can allow people to create and manipulate files and share them with a computer (when the device is docked), this does not mean that people should have a sense of the file system on iPad.

Ask People to Save Only When Necessary

People should have confidence that their work is always preserved unless they explicitly cancel or delete it. If your application helps people create and edit documents, make sure they do not have to take an explicit save action.

Start Instantly

iPad applications should start as quickly as possible so that people can begin using them without delay.

Always Be Prepared to Stop

Like iPhone applications, iPad applications stop when people press the Home button to open another application.

Adobe’s and HP’s (lack of) vision

Meanwhile Adobe and HP have joined forces and published a video previewing Adobe’s software running on an HP Slate. Not surprisingly they are focusing on the “full web” experience that is enabled by running a “real” operating system with “real” web sites and “real” applications (based on Adobe AIR).

Meaning: a cluttered OS built for a different kind of system (PC/laptop), using a different user interface metaphor (desktop/mouse), showing websites optimized for both.

IMHO especially the photoshop.com demo in the video below clearly shows the lack of vision and the violation of the “De-emphasize User Interface Controls” guideline from above.

PS.: More dissemination of the video (as well as an alternative take on it) can be found at CrunchGear.

Written by gkamp

March 9th, 2010 at 8:41 am

Posted in Noteworthy


Abendblatt and the “Google hole” and “Googlebot hole”

3 comments

At the moment the tweets are pouring in with hints on how the Abendblatt can still be read for free. I pointed this out in my last post, too.

However, the gloating that is partly being poured out there often also shows ignorance of the situation. Hence a short explanation and my assessment.

The Google hole and the Googlebot hole are old acquaintances. Anybody who has ever seriously wanted to read the Wall Street Journal knows at least the Google hole. In the following I want to briefly explain the reasons for these holes, that the Abendblatt could easily plug them, and that it is basically only a question of time, or rather of commercial judgment, whether and when they will be closed.

The Google hole and First Click Free

The Google hole exists because publishers and other content providers (at least those that are halfway in their right mind) do not want to do without the traffic from Google Search and Google News.

To make this possible for paid content as well, there is Google’s First Click Free policy. It is essentially an exception to the general “cloaking” rule, which states that end users and the Google crawler (which identifies itself as Googlebot) must not be served / shown different versions of a page.

Until December 1st, the First Click Free policy required that every first click coming from a search results page / Google News be shown the same page that Googlebot had seen. Only links leading away from that page were allowed to point to pages behind the paywall.

As of December 1st, Google changed this policy as one of its concessions to the content providers. Since that date there is a restricted First Click Free policy: after a certain number of clicks per day coming from Google pages, content providers may put even the page shown in response to such a first click behind the paywall. However, at least 5 clicks per day must remain free.
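To make the restricted rule a bit more tangible, here is a minimal sketch of such a per-day meter. It is only an illustration under assumptions of mine, not what any publisher actually runs: the class name `FirstClickFreeMeter`, the visitor id (e.g. a cookie value) and the very crude "is this a Google referrer" check are all hypothetical, and the counts live in memory only.

```python
from collections import defaultdict
from datetime import date
from urllib.parse import urlparse

FREE_CLICKS_PER_DAY = 5  # the minimum number of free clicks named in the policy


class FirstClickFreeMeter:
    """Counts Google-referred first clicks per visitor and day (in memory only)."""

    def __init__(self, free_clicks=FREE_CLICKS_PER_DAY):
        self.free_clicks = free_clicks
        self._counts = defaultdict(int)

    @staticmethod
    def _from_google(referrer):
        # Assumption: any host with a "google" label counts (www.google.de,
        # news.google.com, ...). Referrers can be forged, so this is a heuristic.
        host = urlparse(referrer or "").hostname or ""
        return "google" in host.split(".")

    def allow_free(self, visitor_id, referrer, today=None):
        """Return True if this request should bypass the paywall."""
        if not self._from_google(referrer):
            return False
        key = (visitor_id, today or date.today())
        if self._counts[key] >= self.free_clicks:
            return False
        self._counts[key] += 1
        return True
```

Even this toy version shows where the extra load comes from: every Google-referred request needs a lookup and an update of per-visitor state.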

Implementing the changed policy is up to the publishers (which is also the only technically sensible option). Anybody with halfway decent technical knowledge knows that this is not trivial and, in particular, puts additional load on the systems. Hence it is not surprising that the Abendblatt has not implemented the changed policy yet.

In the end it is also an economic question whether the additional effort pays off at all. In my assessment, however, the credibility aspect would outweigh that, and I expect that Axel Springer will do it as well. Since Abendblatt and Berliner Morgenpost now run on the same technical platform, they only have to do it once.

I expect this gap to be closed in the course of this year, in January at the very latest.

The Googlebot hole

A second gap the tweets point out is the “Googlebot” hole. Here the browser pretends to be the Google crawler. Since the crawler is supposed to see the content in full (see above), the complete content is delivered.

Here I am surprised, though, that the Abendblatt has not closed this gap yet. Google itself describes the procedure on its webmaster pages. It consists of a so-called reverse DNS lookup, which determines whether an IP address (which accompanies every request) actually belongs to the googlebot.com domain, optionally followed by a normal (forward) DNS query that verifies that the name returned in the first step also resolves to the given IP address.

This is necessary because Google does not publish the IP address ranges of the machines that perform the crawl. If it did, filtering Googlebot requests on these addresses would be trivial. As it is, the whole thing involves considerable effort and cost. In addition, the two DNS requests delay the delivery of the pages.
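The two-step check can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not Google's reference code: the function name and the injectable lookup parameters are hypothetical (they only exist so the logic can be tried without a network), and a real deployment would cache results, since both lookups cost time on every uncached request.

```python
import socket

GOOGLEBOT_SUFFIX = ".googlebot.com"  # domain named in the procedure above


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse DNS lookup on the IP, then forward lookup on the returned name."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror: no PTR record
        return False
    if not host.lower().rstrip(".").endswith(GOOGLEBOT_SUFFIX):
        return False
    try:
        # The forward lookup guards against forged PTR records.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A request that merely sends a Googlebot user agent string but comes from an address failing this check would then get the paywalled version of the page.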

By the way, Google also recently introduced a dedicated crawler name for the Google News crawler (as a further concession to the content providers). It is called Googlebot-News.

Written by gkamp

December 16th, 2009 at 11:40 am

Going places – Status and example

one comment

D-Ticker Screenshot

Ed. note: Another autumn-cleaning action. This time the post has been sitting here in a draft state for at least 15 months. Time to get it out of the door.

It’s part of a mini series called “Going places” about geocoding news at the source. Prior installments of this series can be found here. They explain, for example, what places of news, places within news and scopes are.

Current status of geocoded news at dpa-infocom

For approx. 15 months we have been geocoding both places of news and places within news in our regional online wires. Right now we are geocoding:

  • scopes as places of news and
  • (generalized) addresses as places within news.

Representing geocodes within NITF

Since we are a news agency, IPTC formats are more or less the de facto standard for delivering news to our customers :-( Being the online and mobile subsidiary, we deliver our wires as NITF. Hence we had to find a way to fit this information into this format.

In order to minimize the hassle for us as well as our customers we had to stay within the bounds of the NITF format as much as possible.

Since push delivery (mostly via FTP :-( ) is still the main delivery model in the news industry, there is also a need to include as much information about the scopes of news as possible. Offering only a pointer (e.g. to a RESTful API) to additional information would only be used by our most advanced customers :-( .

Hence we chose to use the language constructs for describing locations already provided in that format as much as possible and only resort to other means when there was no means for describing this information at all.

The NITF format (see NITF Documentation) provides at least two ways of representing location information:

  • evloc : Event location. Where an event took place (as opposed to where the story was written).
  • location : Significant place mentioned in an article. Used to normalize locations.

The first question to ask is why there are two different tags for geocoding locations (i suspect the standardisation process being responsible for that). Looking at the DTD definitions for both evloc and location, one can notice that they both try to describe the same information, but the location tag is actually the better and more detailed way of doing so.

Since we already used the evloc tag for denoting the country where the event primarily took place (i.e. some inverse locus like information ) we had every reason to only use the location tag for the locations of the news as well as the location in the news.

One might also note by looking at the DTD that apparently the NITF standardisation body didn’t consider the scope of the news to be a location and didn’t include any means to include any actual geographic data (e.g. points, lines, polygons, …).

But luckily an arbitrary number of locations can be included via the location tag, unfortunately only allowed in the head section of the document. The DTD of NITF then allows an arbitrary number of country, state, region, city and sublocation tags.

In order to be able to unambiguously represent the hierarchy we restricted this to a single occurrence of the tags country, state, region and city as well as up to two sublocation tags.

Before going into detail, below is an example of a news story that has both scopes and addresses. I guess it is the best way to explain our approach and to describe some of the problems we had / have to navigate. (Location-relevant part highlighted.)

Example

<?xml version="1.0" encoding="UTF-8"?>
<!-- DOCTYPE nitf PUBLIC "-//IPTC-NAA//DTD NITF-XML 3.0//EN" "nitf.dtd" -->
<nitf xmlns:georss="http://www.georss.org/georss">
<head>
<title>Bayern München II schlägt Karlsruhe 3:1</title>
...
<docdata>
<identified-content>
<location class="scope">
<region region-code="09184000" code-source="AGS">München
	<georss:point>11.5725580365 48.1379548096</georss:point>
</region>
<state state-code="09000000" code-source="AGS">Bayern
	<georss:point>11.5725580365 48.1379548096</georss:point>
</state>
<country iso-cc="DEU">Deutschland</country>
</location>
<location class="scope">
<city city-code="09162000" code-source="AGS">München
	<georss:point>11.5725580365 48.1379548096</georss:point>
</city>
<state state-code="09000000" code-source="AGS">Bayern
	<georss:point>11.5725580365 48.1379548096</georss:point>
</state>
<country iso-cc="DEU">Deutschland</country>
</location>
<location class="scope">
<city city-code="08212000" code-source="AGS">Karlsruhe
	<georss:point>8.40437796821 49.0092142029</georss:point>
</city>
<state state-code="08000000" code-source="AGS">Baden-Württemberg
	<georss:point>9.17871582656 48.7750805322</georss:point>
</state>
<country iso-cc="DEU">Deutschland</country>
</location>
<location class="address">
Grünwalder Stadion, Grünwalder Straße, München, Germany
	<georss:point>11.566936 48.101078</georss:point>
<city>München</city>
<region>München</region>
<state>Bayern</state>
<country iso-cc="DEU">Deutschland</country>
</location>

</identified-content>
</docdata>
</head>
<body>
...
</body>
</nitf>

I’ve chosen this story because it is about a soccer game, an example scenario I used in my last post. So we encoded three scopes and one address.

Since it is a third league game, the editors chose to select only administrative regions covering the cities on a county level. One city (Munich) is actually divided into two counties, hence the sum of three counties.

Had it been a premier league game, there most likely would have been only a single scope, the whole of Germany, whereas a second league game would presumably be encoded with some states.

The address represents the location of the stadium where the soccer game took place.
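For readers on the receiving end of the wire, here is a small sketch of how the scopes and addresses could be pulled out of such a document with Python's standard library. It is an illustration only: the function name `read_locations` and the trimmed `SAMPLE` document are mine, and the code assumes the `class="scope"` / `class="address"` convention and the coordinate order used in the example above (longitude first, whereas GeoRSS Simple itself specifies latitude first).

```python
import xml.etree.ElementTree as ET

GEORSS = "{http://www.georss.org/georss}"

# Trimmed-down stand-in for the story above (one scope, one address).
SAMPLE = """<nitf xmlns:georss="http://www.georss.org/georss">
<head><docdata><identified-content>
<location class="scope">
<city city-code="08212000" code-source="AGS">Karlsruhe
  <georss:point>8.40437796821 49.0092142029</georss:point>
</city>
<state state-code="08000000" code-source="AGS">Baden-Württemberg</state>
<country iso-cc="DEU">Deutschland</country>
</location>
<location class="address">
Grünwalder Stadion, Grünwalder Straße, München, Germany
  <georss:point>11.566936 48.101078</georss:point>
<city>München</city>
</location>
</identified-content></docdata></head>
</nitf>"""


def read_locations(nitf_xml):
    """Return (scopes, addresses) found in the location tags of an NITF string."""
    root = ET.fromstring(nitf_xml)
    scopes, addresses = [], []
    for loc in root.iter("location"):
        if loc.get("class") == "scope":
            # Take the most specific coded element: city, then region, then state.
            for tag in ("city", "region", "state"):
                el = loc.find(tag)
                if el is not None and el.get(tag + "-code"):
                    scopes.append((tag, el.get(tag + "-code"), (el.text or "").strip()))
                    break
        elif loc.get("class") == "address":
            point = loc.find(GEORSS + "point")
            lon, lat = (float(v) for v in point.text.split())
            addresses.append(((loc.text or "").strip(), lon, lat))
    return scopes, addresses
```

Calling `read_locations(SAMPLE)` yields one scope tuple for Karlsruhe and one address tuple for the stadium.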

So let’s have a closer look at the example’s representation.

Representing scopes

<location>
<region region-code="09184000" code-source="AGS">München
	<georss:point>11.5725580365 48.1379548096</georss:point>
</region>
<state state-code="09000000" code-source="AGS">Bayern
	<georss:point>11.5725580365 48.1379548096</georss:point>
</state>
<country iso-cc="DEU">Deutschland</country>
</location>

Remarks:

  • NITF already provides attributes called xxxx-code and code-source for all possible subtags of location, and since we are primarily using the official German coding scheme for administrative regions called “Amtlicher Gemeindeschlüssel” (short: AGS), it is natural to encode it the way we do.
  • The AGS is actually a hierarchical coding scheme (2 digits: state, 1 digit: sub-state level (“Regierungsbezirk”), 2 digits: county, 3 digits: city/town), hence we could do away with the state tag, but we decided to be as explicit as we could be.
  • Since we were using the three-letter variant of ISO 3166 for the evloc tag, we decided to use this variant for the iso-cc attribute of the country tag as well.
  • Since we introduced scopes first and some customers wanted to place markers on their maps, they asked for coordinates. Hence we chose to include the “official” coordinates of the admin region, as given in another GIS dataset we bought, and chose to use a simple georss:point tag for doing so.
    • We were not allowed to distribute the geometries of the admin regions as part of our licensing deal (yes, you have to buy this data in Germany), and the geometries would have used far too much bandwidth for sending them within the wire.
    • In hindsight I would like to remove the coordinates from scope items since they are a) highly redundant, b) not always available and c) a constant source of discussion about what an appropriate representative coordinate for a geographic extent might be.
  • We currently use other coding schemes on a sub-city level for some cities (also official coding schemes, issued by the city government). But since these are hard to come by on a national level, we are currently considering alternatives.
  • We are also considering extending the geocoding to our non-regional, i.e. national and international, wires. Here we are looking into using the ISO 3166-2 coding scheme and the NUTS 3 coding scheme for the European Union.
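Because the AGS is hierarchical, the enclosing state can be derived from any code. The following sketch splits an eight-digit AGS into its levels. The function names are hypothetical, and the zero-padded state code simply mirrors the state-code values in the example above.

```python
def parse_ags(ags):
    """Split an 8-digit AGS into its hierarchy levels."""
    if len(ags) != 8 or not ags.isdigit():
        raise ValueError("AGS must consist of exactly 8 digits: %r" % ags)
    return {
        "state": ags[0:2],         # Bundesland
        "district": ags[2:3],      # Regierungsbezirk
        "county": ags[3:5],        # Kreis / kreisfreie Stadt
        "municipality": ags[5:8],  # Gemeinde (000 for the county itself)
    }


def state_code(ags):
    """Zero-padded code of the enclosing state, as used in the state-code attribute."""
    return parse_ags(ags)["state"] + "000000"
```

For the county code 09184000 from the example, `state_code` returns 09000000, which is exactly the redundancy we decided to keep explicit in the state tag.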

Representing Addresses

<location>
Grünwalder Stadion, Grünwalder Straße, München, Germany
	<georss:point>11.566936 48.101078</georss:point>
<city>München</city>
<region>München</region>
<state>Bayern</state>
<country iso-cc="DEU">Deutschland</country>
</location>

Remarks:

  • Addresses are provided by the editor.
  • The level of detail (exact address, street level, district or city) presented is an editorial decision based on data protection regulations.
  • The address is then geocoded by Google Maps Premier, and the resulting coordinates and the address returned by the geocoder are shown to the editor, who validates them.
  • This information, together with a label for the address, is encoded into an NITF location tag in the form: label, address
  • The returned coordinates are also encoded into a georss:point tag.
  • region, state and country are taken from the respective fields of the structured response of the Google geocoder. Hence their spelling might differ from the respective official names. We chose not to do point-in-polygon queries to harmonize them because this would have meant running a spatially enabled database, e.g. PostgreSQL/PostGIS.
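The encoding step described in these remarks could look roughly like the sketch below. It is not our production code: the function name and its parameters are made up for illustration, and the values are assumed to come from the editor's label and the validated geocoder response.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def build_address_location(label, address, lon, lat, city, region, state, country, iso_cc):
    """Build an NITF location element for an address in the form 'label, address'."""
    loc = ET.Element("location", {"class": "address"})
    loc.text = "%s, %s" % (label, address)
    point = ET.SubElement(loc, "{%s}point" % GEORSS_NS)
    point.text = "%s %s" % (lon, lat)  # same coordinate order as in the wire example
    for tag, value in (("city", city), ("region", region), ("state", state)):
        ET.SubElement(loc, tag).text = value
    ET.SubElement(loc, "country", {"iso-cc": iso_cc}).text = country
    return loc


stadium = build_address_location(
    "Grünwalder Stadion", "Grünwalder Straße, München, Germany",
    "11.566936", "48.101078", "München", "München", "Bayern", "Deutschland", "DEU",
)
print(ET.tostring(stadium, encoding="unicode"))
```

Passing the coordinates as strings keeps the geocoder's precision untouched on the way into the wire.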

Customer Uses

I just want to give some quick examples of how our customers use the geocodes in the wire.

First an iPhone App that uses the address coordinates to put the news on the map. A typical news map:

D-Ticker Screenshot

At the other end of the range is the way Germany’s biggest tabloid, Bild, is using the scope information for automatically sorting the news into its different regional portals. The following screenshots show how content is sorted into three different regional portals within the state of North Rhine-Westphalia. News with a scope of the whole state shows up in all three portals, whereas news with a scope of only one or more counties is sorted into the regional portals that contain these counties (better: the AGS codes of these counties).
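The sorting rule can be written down in a few lines. This is a sketch of the rule as described, not Bild's actual implementation: the portal names and the counties assigned to each portal in `PORTALS` are hypothetical and far from complete.

```python
# Hypothetical portal definitions: portal name -> AGS codes of the counties it covers.
PORTALS = {
    "duesseldorf": {"05111000"},
    "koeln": {"05315000"},
    "ruhrgebiet": {"05113000", "05913000"},
}


def portals_for_scopes(scope_codes, portals=PORTALS):
    """Return the portals a story belongs to, given the AGS codes of its scopes."""
    matched = set()
    for code in scope_codes:
        is_state = code.endswith("000000")  # e.g. 05000000 for the whole state
        for name, counties in portals.items():
            if code in counties:
                matched.add(name)
            elif is_state and any(c[:2] == code[:2] for c in counties):
                # A state-wide scope reaches every portal inside that state.
                matched.add(name)
    return sorted(matched)
```

A story scoped to 05000000 ends up in all three portals, while a story scoped to 05315000 only shows up in the Cologne portal.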

Bild Regional Ruhrgebiet Bild Regional Köln Bild Regional Düsseldorf

Next steps

I’m planning to catch up with other aspects of geocoding at dpa in the coming days so that I’m finally able to start writing about new ideas :-)

Written by gkamp

October 21st, 2009 at 5:39 pm

Posted in Noteworthy


DocumentCloud and OpenCalais: Some Questions

2 comments

Recently another Knight News Challenge winner announced the availability of the open source version of its code. This time it is DocumentCloud, a joint effort of NYT and ProPublica (not sure why these big shots need grant money to do these things, but that is another story). It is open-sourcing CloudCrowd, which claims: Parallel processing for the rest of us.

Yesterday it announced another two dozen high-profile content partners (Nieman Lab’s view on this) as well as a partnership with Thomson Reuters’ OpenCalais (DocumentCloud blog post):

This morning we’re excited to announce a partnership with Thomson Reuters, which is contributing its OpenCalais service to DocumentCloud. OpenCalais uses natural language processing to extract information from documents, instantly identifying and tagging the relevant people, places, companies, facts and events. This will make it easy for readers and journalists to explore connections between documents and across the full collection of source materials.

I’m very excited to use DocumentCloud / CloudCrowd, but I have a couple of questions regarding the OpenCalais Terms of Service. Since I’m not sure when they’ll make it through moderation, I’m reposting them here:

Can you (and Thomson Reuters) please clarify whether you are using the public free version of OpenCalais? If so, it would be very helpful to get your reading of the terms of service. Until now the terms have been the reason why I’m very hesitant to use OpenCalais for tasks at my news org.

Since I would very much love to use OpenCalais and DocumentCloud, it would be very helpful for me to get more information on your interpretation of the following parts of the terms:

1. As far as I understand, the terms of service of the public version allow Reuters to keep and use the metadata (with some rumour that the full text is part of the metadata).

“You understand that Thomson Reuters will retain a copy of the metadata submitted by you or that generated by the Calais service. By submitting or generating metadata through the Calais service, you grant Thomson Reuters a non-exclusive perpetual, sublicensable, royalty-free license to that metadata.”

Since Reuters somehow has to refinance the operation of OpenCalais, I’m basically fine with that clause, but it would be interesting to know to which types of services and products they are sublicensing the metadata.

2. IMHO the terms make it at least difficult to use other means of metadata extraction (e.g. homegrown NLTK / GATE jobs, the MetaCarta API, Inxight, empolis, …) or to offer this metadata as part of your own API, e.g. the NYT API.

“# If you syndicate, publish or otherwise transmit any content containing, enhanced by or derived from Calais-generated metadata you will use your best efforts to incorporate the correct Calais-provided Globally Unique Identifier (GUID) in that content. You specifically agree not to attach incorrect GUIDs to your content with any intent to mislead, spam, spoof, phish or otherwise deceive downstream consumers of your content.

# You will not use any metadata or GUIDs produced by Calais to create a metadata retrieval service similar to Calais. To ensure the quality of metadata for all Calais users we want to maintain a single verifiable metadata storage location.”

I read these clauses such that e.g. the NYT API (as, among other things, a metadata retrieval service for people, persons and places) is not allowed to use the public OpenCalais service as part of its processing. Is my interpretation too strict? I’m basically talking about OpenCalais as a preprocessing step where the results would be curated by human beings.

BTW: The last sentence of that quote looks very strange to me given the “Linked Open Data” initiative, including Freebase, DBpedia etc., which all provide their own GUIDs.

3. To me it looks like DocumentCloud is in direct violation of one clause of the terms:

“You will not do bulk processing where you are adding minimal value beyond adding Calais metadata to the content. For example – if you are a webcrawler you should not send everything to Calais before sending it to your users.”

Since DocumentCloud is all about bulk processing: Was this clause waived for DocumentCloud (including all uses of DocumentCloud outside the original partners’ installation, e.g. systems derived from DocumentCloud, GitHub clones, …)? Or does it mean that I cannot do only metadata annotation in a DocumentCloud job but have to do some other things in the same job too?

I hope that most if not all of these questions have already been asked and answered by the various content partners and it’s easy for you to answer them.

Written by gkamp

September 25th, 2009 at 6:52 am

(Centralized) URL shorteners considered harmful

one comment

In late April I started writing this post, and it has been lying around unfinished all these months. The impetus for the post was a number of discussions I had about whether URL shorteners are a good thing or not. (My position was along the lines of this BoingBoing post by Cory Doctorow.)

Given the (recent) developments with Digg, cli.gs and tr.im, I finally decided to finish the post(s). Here is the short version (Update: I started to elaborate on the points below in a follow-up post):

  1. Centralized URL shorteners are considered harmful.
  2. Serious content providers simply cannot rely on them for quality of service reasons.
  3. The need for short URLs within Twitter is arbitrarily imposed by Twitter, and can and should easily be changed by Twitter.
  4. There are cases where content providers have a real need for short URLs. But short in those cases means around 30 chars in the worst case (newspaper columns), not 15.
  5. Most content providers can already provide this kind of short URL using a second-level domain of 10 – 15 chars and the ID of the content within their CMS (or a Base62 encoded random ID).
  6. rev=”canonical” and/or rel=”alternate shorturl” are ways to announce that short URL to others who might be interested in your content. Use them and ask for support of this in your CMS (or build it).
  7. Real-time statistics are another (if not the) reason why centralized URL shorteners are used by many people. This kind of statistics should already be available in your CMS. If not, you’re not really a serious content provider.
  8. If you really need real-time statistics for other people’s content, either use a redirecting URL on your own domain (as has been done for 10+ years) or
  9. run your own little URL shortener that is fully under your control, only writable by your staff and integrated / integratable into your CMS.
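To show how little is needed for point 5, here is a minimal Base62 sketch. The alphabet order (digits, lowercase, uppercase) is my own choice, and the function names and the example.org domain are placeholders, not part of any CMS.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(number):
    """Encode a non-negative integer (e.g. a CMS content ID) as Base62."""
    if number < 0:
        raise ValueError("only non-negative IDs can be encoded")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text):
    """Inverse of base62_encode."""
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)
    return number


def short_url(domain, content_id):
    """Short URL on the provider's own domain."""
    return "http://%s/%s" % (domain, base62_encode(content_id))
```

Five Base62 characters already cover more than 900 million IDs, so a domain of 10 to 15 chars plus the encoded ID stays well below the 30 chars mentioned above.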

While thinking about how to solve the real-time statistics dilemma, the following bold / crazy idea was born:

What about the content providers opening up their access statistics themselves instead of having bit.ly etc. do it for them?

This would make sure that all people can see the complete picture and not only some potentially twisted fragment provided by a URL shortener service. Most content providers already have a most emailed, best rated etc. section on their pages. It would also provide much needed transparency in a link economy as well as a decentralized solution for an apparent need of the people using the web. They want to know about how they contribute to the success of

Opening up the statistics like bit.ly and tr.im are doing it is IMHO the right way:

  • Everybody sees the summarized results for the page (The decision of the level of detail provided is up to the content provider)
  • Authenticated users can see the traffic that came from their registered  domains
  • Access via the web site and an API
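A toy version of these two views, just to make the idea concrete. Everything here is an assumption of mine: the class `PageStats`, the in-memory counter, and the notion that the set of verified domains comes from some separate ownership check.

```python
from collections import Counter
from urllib.parse import urlparse


class PageStats:
    """Referrer statistics for a single page (in memory only)."""

    def __init__(self):
        self._hits = Counter()

    def record(self, referrer):
        """Count one page view, keyed by the referring domain."""
        domain = (urlparse(referrer or "").hostname or "").lower()
        self._hits[domain or "(direct)"] += 1

    def public_summary(self):
        """What everybody may see; the level of detail is the provider's decision."""
        return {"total": sum(self._hits.values())}

    def for_owner(self, verified_domains):
        """What an authenticated user sees: traffic from their verified domains only."""
        return {d: n for d, n in self._hits.items() if d in verified_domains}
```

The same two methods could back both the web page and an API endpoint.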

Some first implementation ideas:

  • Authentication via OpenID / OAuth
  • Domain ownership verification like Google is doing for Google Apps etc.
  • Encapsulation of the whole thing into plugins for blog software and CMSs

What do you think?

Written by gkamp

August 12th, 2009 at 9:04 am

Posted in Noteworthy
