Going Places – Scopes and other locations

Rem: First I have to apologize for not writing this post earlier. I was busy getting phase 2 (geocoding of places within the story) out of the door, and then I had some holiday, then went to Where2.0 and Wherecamp etc.

But expect a number of articles tagged goingplaces this week.

Earlier Parts of this miniseries: Part I: Adding geographic metadata to news at the source, Part II: Great news from Adrian Holovaty, Part III: Early Experiments, Part IV: Places in News stories vs. Places of News stories

Before showing some examples of what our geocoded stories look like, I’ll first introduce the current definitions of the various terms we use and the reasons why we use these definitions. This post builds on top of my last post in particular, so you might want to read that one first.

Warning: This may be boring stuff for some of you, and you might ask yourself why this guy is trying so hard to find kind of formal definitions. Actually, doing this is fascinating stuff for me (I couldn’t help it; blame my former life in AI and ontology building).

In addition, I think it is absolutely essential to have an idea of how and why you are adding geocodes to your content before you do so, especially when you are a professional news organization and reseller. So here we go:

Geocoded News Stories

The first thing to do is to define what a geocoded news story is. Please note that none of the following definitions attempt to fulfill all the criteria a mathematician has in mind when hearing the term definition; rather, they aim to create a common understanding of what is meant, i.e. they are more like glossary or dictionary entries.

A geocoded news story is a news story that has at least one location attached as accompanying metadata. Locations include locations of the news story as well as locations in the news story. In case the story is a complex news story composed of a number of parts (e.g. multiple texts and/or multiple images, or a multimedia news story consisting of text, image, video, audio, etc.), the locations of the story as a whole are the (multi-)set union of the locations of its parts.
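A minimal sketch of this definition, assuming each part simply carries a list of location names (the `Part` type and the names are hypothetical). The locations of the whole story are the multiset union of the locations of its parts, which Python's `collections.Counter` models directly.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Part:
    """One part of a complex news story (text, image, video, audio, ...)."""
    kind: str
    locations: list[str] = field(default_factory=list)


def story_locations(parts: list[Part]) -> Counter:
    """Multiset union of the locations attached to all parts of a story."""
    total: Counter = Counter()
    for part in parts:
        total.update(part.locations)
    return total


def is_geocoded(parts: list[Part]) -> bool:
    """A story is geocoded if at least one location is attached to it."""
    return sum(story_locations(parts).values()) > 0


parts = [
    Part("text", ["Hamburg", "Berlin"]),
    Part("image", ["Hamburg"]),
    Part("video"),
]
print(story_locations(parts))  # Counter({'Hamburg': 2, 'Berlin': 1})
print(is_geocoded(parts))      # True
```

Keeping the multiplicities (instead of a plain set) preserves the information that "Hamburg" is attached to two parts.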

Please note that a news story can have multiple scopes and that not all scopes have to be in the same hierarchy. It is only required that every hierarchy in itself is a hierarchical partition of a clearly defined geographical extent.

It is perfectly fine to have one scope contained in a hierarchy denoting, e.g., the administrative divisions of Germany as defined by the so-called “Amtlicher Gemeindeschlüssel (AGS)” (Bundesländer ~ states, Regierungsbezirk (no equivalent in the US), Kreis ~ county, Stadt/Gemeinde ~ city/town/village) and another scope belonging to a second hierarchy, e.g. one denoting the administrative subdivisions of a certain city (such as the boroughs and districts of Hamburg), the neighbourhoods of Hamburg as defined by some company or community, the zip codes of Germany, etc.
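To make this concrete, here is a minimal sketch of how a story’s scopes could each point into their own hierarchy, so that the hierarchies stay loosely coupled. The `Scope` type, the hierarchy identifiers, the scope ids and the level numbers are hypothetical placeholders, not actual AGS keys.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    hierarchy: str  # which hierarchical partition this scope belongs to
    scope_id: str   # identifier, unique only within its hierarchy (hypothetical)
    name: str
    level: int      # 1 = top level scope of that hierarchy


def scopes_by_hierarchy(scopes: list[Scope]) -> dict[str, list[Scope]]:
    """Group a story's scopes by hierarchy; no cross-hierarchy relation is needed."""
    grouped: dict[str, list[Scope]] = defaultdict(list)
    for scope in scopes:
        grouped[scope.hierarchy].append(scope)
    return dict(grouped)


story_scopes = [
    Scope("ags", "ags:hamburg", "Hamburg", 2),
    Scope("hamburg-districts", "hh:altona", "Altona", 2),
]
for hierarchy, scopes in scopes_by_hierarchy(story_scopes).items():
    print(hierarchy, [s.name for s in scopes])
```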

I also think that being able to add metadata describing a geographic extent to which content is deemed relevant would benefit all kinds of content: tweets (e.g. please notify only my friends in the City of San Francisco that I’m coming to town, because I’m only there for 2 hours and other friends in California wouldn’t make it in time), blog posts (since they are basically news stories) and even Wikipedia entries.
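A minimal sketch of the tweet example, assuming each friend’s last known position is available as a (lat, lon) pair and approximating the scope by its bounding box (which ignores boxes crossing the antimeridian). The `BBox` type, the `recipients` helper and all coordinates are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Bounding box in degrees; a rough stand-in for a scope's extent."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # Does not handle boxes that cross the antimeridian.
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def recipients(friends: dict[str, tuple[float, float]], scope: BBox) -> list[str]:
    """Names of the friends whose (lat, lon) position lies within the scope."""
    return sorted(name for name, (lat, lon) in friends.items() if scope.contains(lat, lon))


city_scope = BBox(10.0, 20.0, 11.0, 21.0)  # synthetic coordinates
friends = {"alice": (10.5, 20.5), "bob": (12.0, 20.5), "carol": (10.9, 20.1)}
print(recipients(friends, city_scope))  # ['alice', 'carol']
```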

Another way of looking at scopes is as hints of what to expose at which zoom level on a map. For doing so you don’t need complex calculations; it is sufficient to add some information about (or access to) the bounding box of the scope.
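One way to turn a scope’s bounding box into a zoom level is sketched below. It assumes a Web Mercator tile map with 256-pixel tiles (as used by many slippy maps) and a fixed viewport size. The function name and default parameters are hypothetical. It returns the largest integer zoom level at which the whole bounding box still fits into the viewport.

```python
import math


def mercator_y(lat_deg: float) -> float:
    """Web Mercator y coordinate (in radians) for a latitude in degrees."""
    lat = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + lat / 2))


def zoom_for_bbox(min_lat: float, min_lon: float, max_lat: float, max_lon: float,
                  width_px: int = 512, height_px: int = 512,
                  tile_px: int = 256, max_zoom: int = 18) -> int:
    """Largest zoom at which the bbox fits into a width_px x height_px viewport."""
    # Fraction of the whole world map covered by the box in each direction.
    lon_frac = (max_lon - min_lon) / 360.0
    lat_frac = (mercator_y(max_lat) - mercator_y(min_lat)) / (2 * math.pi)
    zooms = [float(max_zoom)]
    # At zoom z the world is tile_px * 2**z pixels wide (and high).
    if lon_frac > 0:
        zooms.append(math.log2(width_px / (tile_px * lon_frac)))
    if lat_frac > 0:
        zooms.append(math.log2(height_px / (tile_px * lat_frac)))
    return max(0, min(max_zoom, math.floor(min(zooms))))


print(zoom_for_bbox(-0.5, -0.5, 0.5, 0.5))  # 9
```

A client could then, for example, show a scope’s stories only from that zoom level onwards. This is one possible design, not the only one.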

Hierarchical partitions

Since my early experiences I don’t believe in grand unified theories / ontologies that try to capture the whole model of a domain. I rather believe in sets of small, very domain-specific ontologies. The notion of a hierarchical partition for a certain extent originates from this belief: it encapsulates the partonomy relationships for localities of a coherent set of types.

From an engineering point of view the notion of a hierarchical partition also allows us to loosely couple the different hierarchies.

So what is a hierarchical partition of a defined geographic extent? And why do we care? To answer the second question first:

We somehow have to explain what we are doing to our customers (and ourselves).
If we happen to come up with a definition that has a nice set of properties, we might be able to use algorithms that take advantage of these properties. The following also shows how our understanding / definition of what a hierarchical partition is has evolved over time.

I first interpreted hierarchical partition in the pure mathematical sense, i.e.:

for any given point in the plane within the defined geographic extent there is exactly one corresponding scope on each level of the hierarchy
for any given scope (other than the top level scope) there is exactly one predecessor wrt. this hierarchy.
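To make these two rules testable, here is a minimal sketch that discretizes the geographic extent into grid cells and checks both criteria. Discretizing into cells is a simplifying assumption, since real scopes would be polygons. The `Scope` type, the cell model and the scope names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    level: int                # 1 = top level scope
    parents: tuple[str, ...]  # predecessors wrt. this hierarchy
    cells: frozenset          # extent, discretized into (x, y) grid cells


def is_strict_partition(extent: frozenset, scopes: dict[str, Scope]) -> bool:
    """Strict rules: exactly one scope per point and level, exactly one predecessor."""
    levels = {s.level for s in scopes.values()}
    for level in levels:
        for cell in extent:
            hits = [sid for sid, s in scopes.items() if s.level == level and cell in s.cells]
            if len(hits) != 1:  # 0 means a hole, more than 1 an overlap
                return False
    # Every scope except the top level scope needs exactly one predecessor.
    return all(len(s.parents) == 1 for s in scopes.values() if s.level != 1)


extent = frozenset({(0, 0), (1, 0)})
scopes = {
    "top": Scope(1, (), extent),
    "a": Scope(2, ("top",), frozenset({(0, 0)})),
    "b": Scope(2, ("top",), frozenset({(1, 0)})),
}
holey = {k: v for k, v in scopes.items() if k != "b"}  # leaves a hole at level 2
print(is_strict_partition(extent, scopes))  # True
print(is_strict_partition(extent, holey))   # False
```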

But looking at the administrative divisions of Germany, I recognized that this is actually not the case and that the first criterion has to be relaxed. This stems from the following facts:

There are counties denoting cities (so-called Stadtkreise) that are not represented in the “city/town” level of the AGS hierarchy, i.e. there are “holes” at this level. This fact could be worked around by also adding these counties to the city level of the hierarchy.
Some states do not have so-called “Regierungsbezirke”; they eliminated this level at some time in the past. Hence there are also holes at this level.

The following changes of the rules would take care of these facts:

for any given point in the plane within the defined geographic extent there is at least one corresponding scope in some level of the hierarchy
for any given point in the plane within the defined geographic extent there is at most one corresponding scope on every level of the hierarchy
for any given scope (other than the top level scope) there is exactly one predecessor wrt. this hierarchy.
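The relaxed rules can be checked with the same kind of grid-cell model, which is again a hypothetical simplification. The example models a state without Regierungsbezirke whose county hangs directly below the state. The scope names are placeholders, not real AGS entries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    level: int                # 1 = top level scope
    parents: tuple[str, ...]  # predecessors wrt. this hierarchy
    cells: frozenset          # extent, discretized into (x, y) grid cells


def is_relaxed_partition(extent: frozenset, scopes: dict[str, Scope]) -> bool:
    """Relaxed rules, which allow holes on individual levels."""
    for cell in extent:
        levels_hit = [s.level for s in scopes.values() if cell in s.cells]
        if not levels_hit:                           # rule 1: at least one scope somewhere
            return False
        if len(levels_hit) != len(set(levels_hit)):  # rule 2: at most one per level
            return False
    # Rule 3: exactly one predecessor, except for the top level scope.
    return all(len(s.parents) == 1 for s in scopes.values() if s.level != 1)


extent = frozenset({(0, 0), (1, 0)})
scopes = {
    "country": Scope(1, (), extent),
    "state1": Scope(2, ("country",), frozenset({(0, 0)})),
    "state2": Scope(2, ("country",), frozenset({(1, 0)})),
    "rb1": Scope(3, ("state1",), frozenset({(0, 0)})),      # state2 has no Regierungsbezirk
    "county1": Scope(4, ("rb1",), frozenset({(0, 0)})),
    "county2": Scope(4, ("state2",), frozenset({(1, 0)})),  # skips level 3
}
print(is_relaxed_partition(extent, scopes))  # True
```

Under the strict rules this example would fail because of the hole at level 3.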

So I thought that this definition was sufficient to also cover the administrative subdivisions of other countries. But when validating this definition against the administrative divisions of the United States, I learned that New York City is an aggregate of 5 counties of the state of New York, each county being coterminous with a borough of New York City. Taking care of that, and hopefully preparing ourselves for other “strange” cases, we end up with the following definition of a hierarchical partition:

A hierarchical partition p of scopes of a geographic extent e

is a directed acyclic graph (DAG) with the following properties:

There is a single source s_top (the top level scope) with a geographic extent that is coterminous with e (interpreting coterminous as “having matching boundaries”)
every scope has a property denoting its level in the hierarchy with the top level scope having the level 1
for any given point x in e there is at least one corresponding scope s(x) at some level in the DAG
for every scope that has more than one successor, the union of the geographic extents of its successors is coterminous with the geographic extent of this scope
for every scope that has more than one predecessor, the union of the geographic extents of its predecessors is coterminous with the geographic extent of this scope
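A minimal sketch that checks these properties on the New York example. It uses the same simplifying grid-cell assumption as a stand-in for real geometries, so “coterminous” becomes set equality. Acyclicity is checked with Python’s standard `graphlib.TopologicalSorter`. The cell layout and the scope named "rest", which stands in for all other counties of the state, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class Scope:
    level: int
    parents: tuple[str, ...]  # predecessors in the DAG
    cells: frozenset          # extent, discretized into grid cells


def union(ids, scopes: dict[str, Scope]) -> frozenset:
    """Union of the extents of the given scopes."""
    return frozenset().union(*(scopes[i].cells for i in ids))


def is_hierarchical_partition(extent: frozenset, scopes: dict[str, Scope]) -> bool:
    if any(p not in scopes for s in scopes.values() for p in s.parents):
        return False
    try:  # the graph must be acyclic
        tuple(TopologicalSorter({i: s.parents for i, s in scopes.items()}).static_order())
    except CycleError:
        return False
    sources = [i for i, s in scopes.items() if not s.parents]
    if len(sources) != 1:  # a single source s_top ...
        return False
    top = scopes[sources[0]]
    if top.level != 1 or top.cells != extent:  # ... at level 1, coterminous with e
        return False
    if not extent <= union(scopes, scopes):  # every point lies in some scope
        return False
    children: dict[str, list[str]] = {i: [] for i in scopes}
    for i, s in scopes.items():
        for p in s.parents:
            children[p].append(i)
    for i, s in scopes.items():
        if len(children[i]) > 1 and union(children[i], scopes) != s.cells:
            return False
        if len(s.parents) > 1 and union(s.parents, scopes) != s.cells:
            return False
    return True


boroughs = ["New York", "Kings", "Queens", "Bronx", "Richmond"]  # counties of NYC
state = frozenset((x, 0) for x in range(6))
scopes = {"NY State": Scope(1, (), state)}
for x, county in enumerate(boroughs):
    scopes[county] = Scope(2, ("NY State",), frozenset({(x, 0)}))
scopes["rest"] = Scope(2, ("NY State",), frozenset({(5, 0)}))  # all other counties
scopes["NYC"] = Scope(3, tuple(boroughs), frozenset((x, 0) for x in range(5)))
print(is_hierarchical_partition(state, scopes))  # True
```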

Rem:

This definition is definitely not perfect in its formulation, but some of you might help me improve it. It might also be better to start with a poset-based definition and switch to a graph-based definition when introducing additional relations, e.g. topological relations describing adjacency etc.
It might be helpful to distinguish between hierarchical partitions and leveled hierarchical partitions, the difference between the two being whether or not the scopes are assigned levels.
Why is it important to have the level information, you might ask? It is necessary in order to transcribe the semantics
In order not to lose the two stricter definitions, the first one is defined as a strict partition hierarchy, whereas the second is a partition hierarchy.
I haven’t found the time to look deeper into this, but it seems likely that a (leveled) hierarchical partition has already been given a name somewhere in mathematics. If someone out there happens to know where to look (computational geometry?), I would love to know about it.
It also seems to be the case that if you add an additional level with a single node s_bottom that is the successor of every leaf node in the hierarchical partition, you get a lattice. Maybe some of the lattice properties and knowledge / algorithms for lattices might prove helpful.
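Whether adding s_bottom really yields a lattice can at least be checked on small examples. The following brute-force sketch uses hypothetical helper names and assumes the hierarchy is given as a map from each scope to its predecessors, with the top level scope as the greatest element. It adds s_bottom below every leaf and then tests that every pair of nodes has a least upper bound and a greatest lower bound. It is a sanity check, not a proof, and it assumes the input is acyclic. The last example is a general DAG (not necessarily a valid hierarchical partition) for which the construction does not give a lattice.

```python
from __future__ import annotations

from itertools import combinations

Parents = dict[str, tuple[str, ...]]  # scope -> its predecessors


def with_bottom(parents: Parents, bottom: str = "s_bottom") -> Parents:
    """Add a single node below every leaf (every node without successors)."""
    has_successor = {p for ps in parents.values() for p in ps}
    leaves = tuple(sorted(n for n in parents if n not in has_successor))
    return {**parents, bottom: leaves}


def is_lattice(parents: Parents) -> bool:
    """Brute force: every pair needs a least upper and a greatest lower bound."""
    up: dict[str, frozenset] = {}

    def ancestors(node: str) -> frozenset:  # the node itself plus everything above it
        if node not in up:
            up[node] = frozenset({node}).union(*(ancestors(p) for p in parents[node]))
        return up[node]

    nodes = list(parents)
    for n in nodes:
        ancestors(n)
    down = {n: frozenset(m for m in nodes if n in up[m]) for n in nodes}
    for a, b in combinations(nodes, 2):
        upper = up[a] & up[b]
        lower = down[a] & down[b]
        # least upper bound: an upper bound lying below all other upper bounds
        if not any(upper <= up[u] for u in upper):
            return False
        # greatest lower bound: a lower bound lying above all other lower bounds
        if not any(lower <= down[g] for g in lower):
            return False
    return True


tree: Parents = {"top": (), "A": ("top",), "B": ("top",), "A1": ("A",), "A2": ("A",)}
diamond: Parents = {"top": (), "A": ("top",), "B": ("top",), "C": ("A", "B"), "D": ("A", "B")}
print(is_lattice(tree))                  # False: A and B have no lower bound
print(is_lattice(with_bottom(tree)))     # True
print(is_lattice(with_bottom(diamond)))  # False
```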

After this very extensive coverage of scopes, I will just briefly introduce the current definitions of loci and places of production. This is mostly because right now there are only some ideas of what definitions of these should look like.

A locus of a story

is a geographic name contained in a set of geonames of a defined geographic extent,
representing the / a smallest area wrt. the above hierarchy where the events of this story are happening / have happened / are going to happen
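A minimal sketch of picking loci. It assumes, hypothetically, that both the event locations and the extents of the candidate geographic names are available as sets of grid cells, and it measures “smallest” by the number of cells. It returns all names whose extent covers every event location and is as small as possible. That is why it returns a list: there may be more than one smallest area. The gazetteer entries are made up.

```python
from __future__ import annotations


def loci(event_cells: frozenset, gazetteer: dict[str, frozenset]) -> list[str]:
    """Names whose extent covers all event cells and is of minimal size."""
    covering = {name: ext for name, ext in gazetteer.items() if event_cells <= ext}
    if not covering:
        return []
    smallest = min(len(ext) for ext in covering.values())
    return sorted(name for name, ext in covering.items() if len(ext) == smallest)


gazetteer = {
    "Country": frozenset((x, y) for x in range(3) for y in range(3)),
    "Mountain range": frozenset({(0, 0), (0, 1), (1, 1)}),  # a natural feature
    "Town": frozenset({(0, 0)}),
}
print(loci(frozenset({(0, 0)}), gazetteer))          # ['Town']
print(loci(frozenset({(0, 0), (1, 1)}), gazetteer))  # ['Mountain range']
```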

Rem.:
Initially a locus was also defined as being part of a hierarchical partition. This has the advantage of being able to unambiguously describe the locus within that hierarchy (at least at each level). But while this property is important for scopes (in fact it is the main purpose of scopes), for loci being able to use the names that are typically used, e.g. for natural features like mountain ranges, is more important than being unambiguous.

A place of production of a story

is a location where the news story (or parts of it) were produced (e.g. written by the author, edited by the editor, …)
describing as exactly as possible the geographic position of the production (e.g. using geographic coordinates, addresses, …)

Locations within news stories

A location (with)in a (news) story is a location directly or indirectly mentioned in the news story itself. These locations are typically not geographic names but rather addresses, street segments, blocks, or POIs. Not surprisingly, locations within news stories are typically more specific (in the sense of having a smaller geographic extent) than locations of news stories.

What’s next?

In the next post I’ll sketch the current status and finally give you some examples.

June 2, 2008

admin

Eds. Notes, Noteworthy

goingplaces, locations, maps, news, scopes
