r. alexander miłowski, geek

Alex Miłowski

GeoJSON to the Rescue (or not)!

This is the fourth entry in my series on my PhD dissertation titled Enabling Scientific Data on the Web. In this entry, we will explore GeoJSON as an alternate approach to geospatial scientific data.

What is GeoJSON?

GeoJSON is a format developed for “encoding a variety of geographic data structures”. It is feature-oriented, just like KML, and can replace the use of KML in many, but not all, Web applications. The encoding conforms to “standard” JSON syntax, with an expected structure and set of property names.

A GeoJSON Object Containing Two San Francisco Landmarks
{ "type": "FeatureCollection",
  "features": [
      {"type": "Feature",
       "properties": {
          "name": "AT&T Park",
          "amenity": "Baseball Stadium",
          "description": "This is where the SF Giants play!"
       },
       "geometry": {
          "type": "Point",
          "coordinates": [-122.389283, 37.778788 ]
       }
      },
      {"type": "Feature",
       "properties": {
          "name": "Coit Tower"
       },
       "geometry": {
          "type": "Point",
          "coordinates": [ -122.405896, 37.802266 ]
       }
      }
  ]
}

A GeoJSON object typically starts with a feature collection, and each feature is a tuple of a geometry object, an optional identifier, and a properties object. The geometry object describes a point, a line, a polygon, an array of such objects, or a collection of mixed geometry objects.
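As a rough illustration of that structure, the following Python sketch parses a small feature collection and pulls each feature apart into its identifier, geometry, and properties. The `describe` helper is a name invented here for illustration, and the sample data is a trimmed copy of the Coit Tower feature above; only the standard `json` module is assumed.

```python
import json

# Sample data: a trimmed copy of the Coit Tower feature from the example above.
doc = json.loads("""
{ "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Coit Tower"},
     "geometry": {"type": "Point", "coordinates": [-122.405896, 37.802266]}
    }
  ]
}
""")


def describe(feature):
    """Return (id, geometry type, coordinates, properties) for one feature.

    Hypothetical helper, not part of any GeoJSON library.
    """
    geometry = feature.get("geometry") or {}
    return (
        feature.get("id"),                # the identifier is optional
        geometry.get("type"),             # e.g. "Point"
        geometry.get("coordinates"),      # longitude first, then latitude
        feature.get("properties") or {},  # any JSON object
    )


summaries = [describe(feature) for feature in doc["features"]]
```

Using `.get` for the identifier reflects that it is optional: a feature without an `id` simply yields `None` in the first slot.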

The “properties” property of the feature is any JSON object value. In the example shown above, it defines a set of metadata for each point that describes a location in San Francisco. If the property names match the expectations of the consuming application, they may affect the rendering (e.g. a map marker might be labeled with the feature name). There is no standardization of what the “properties” property may contain other than that it must be a legal JSON object value.

GeoJSON at the USGS

The US Geological Survey (USGS) publishes many different feeds of earthquakes around the world as GeoJSON. Each feature is a single point (the epicenter) accompanied by an extensive set of properties that describe the earthquake. The property definitions are given on the USGS website, but their use is not standardized.

An Earthquake Feed Example
{"type":"FeatureCollection",
 "metadata":{
     "generated":1401748792000,
     "url":"http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/significant_day.geojson",
     "title":"USGS Significant Earthquakes, Past Day",
     "status":200,
     "api":"1.0.13",
     "count":1
  },
  "features":[
     {"type":"Feature",
      "properties":{
         "mag":4.16,
         "place":"7km NW of Westwood, California",
         "time":1401676603930,
         "updated":1401748647446,
         "tz":-420,
         "url":"http://earthquake.usgs.gov/earthquakes/eventpage/ci15507801",
         "detail":"http://earthquake.usgs.gov/earthquakes/feed/v1.0/detail/ci15507801.geojson",
         "felt":3290,
         "cdi":5.4,
         "mmi":5.36,
         "alert":"green",
         "status":"reviewed",
         "tsunami":null,
         "sig":806,
         "net":"ci",
         "code":"15507801",
         "ids":",ci15507801,",
         "sources":",ci,",
         "types":",cap,dyfi,focal-mechanism,general-link,geoserve,losspager,moment-tensor,nearby-cities,origin,phase-data,scitech-link,shakemap,",
         "nst":100,
         "dmin":0.0317,
         "rms":0.22,
         "gap":43,
         "magType":"mw",
         "type":"earthquake",
         "title":"M 4.2 - 7km NW of Westwood, California"
      },
      "geometry":{
         "type":"Point",
         "coordinates":[-118.4911667,34.0958333,4.36]
      },
      "id":"ci15507801"
    }
  ]
}

It is quite easy to see that when this data is encountered outside of the context of the USGS, the property names have little meaning and no syntax that identifies them as belonging to the USGS.

Out with the Old, in with the New

Just replacing KML's XML syntax and legacy structures from Keyhole with a JSON syntax doesn't address much other than making it easier for JavaScript developers to access the data. There are plenty of mapping tool kits, written in JavaScript, that can readily “do things” with GeoJSON data with minimal effort, and that is generally a good thing. Many can also consume KML, so we haven't necessarily improved access.

The format is still oriented towards map features. If you look at the example above, you'll see that the non-geometry information overwhelms the feature information. If you want to process just the properties, you need to enumerate all the features and then extract (access) the data. Because parsing JSON yields a native data structure, GeoJSON makes this a bit easier than KML, which is an obvious win for this format.
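To make the "enumerate, then extract" step concrete, here is a minimal Python sketch. The feed text is a heavily trimmed copy of the earthquake example above, and `property_column` is a hypothetical helper name, not something the USGS or any GeoJSON library provides.

```python
import json

# Trimmed copy of the earthquake feed shown earlier (most properties omitted).
feed_text = """
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "properties": {"mag": 4.16, "place": "7km NW of Westwood, California"},
    "geometry": {"type": "Point", "coordinates": [-118.4911667, 34.0958333, 4.36]},
    "id": "ci15507801"}
 ]}
"""

feed = json.loads(feed_text)


def property_column(collection, name):
    """Collect one named property from every feature (None where it is absent)."""
    return [
        (feature.get("properties") or {}).get(name)
        for feature in collection["features"]
    ]


magnitudes = property_column(feed, "mag")
places = property_column(feed, "place")
```

Even in this small form, the geometry has to be stepped over to reach the values of interest, which is the map-feature orientation at work.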

Remember that we are still looking at scientific data sets, and scientists love to make tables of data. The USGS earthquake feed is a table of data that happens to have two columns of geospatial information (the epicenter) and 26 other columns of data. Yet we are forced into a map-feature view of this data set by the choice of GeoJSON.
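The table hiding inside the feed can be recovered with a short script. The sketch below, using only Python's standard `csv` and `io` modules, flattens each feature into one row with longitude and latitude columns followed by the properties. The helper names `to_rows` and `to_csv` are invented for this example, the sample keeps only three of the 26 properties, and it assumes no property is itself named `id`, `longitude`, or `latitude`.

```python
import csv
import io

# Sample with values copied from the feed above; only three properties are kept.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "ci15507801",
            "properties": {"mag": 4.16, "magType": "mw", "sig": 806},
            "geometry": {
                "type": "Point",
                "coordinates": [-118.4911667, 34.0958333, 4.36],
            },
        }
    ],
}


def to_rows(collection):
    """Flatten each point feature into one dictionary per row."""
    rows = []
    for feature in collection["features"]:
        # GeoJSON positions list longitude first, then latitude;
        # any further values in the position are ignored here.
        longitude, latitude = feature["geometry"]["coordinates"][:2]
        row = {"id": feature.get("id"), "longitude": longitude, "latitude": latitude}
        row.update(feature["properties"])
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows as CSV, with columns in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_rows(collection)
table = to_csv(rows)
```

The point of the exercise is that nothing in GeoJSON itself tells a consumer which columns to expect; the script discovers them from whatever keys happen to be present.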

Keep in mind that the OGC says this about KML:

We could say almost the same thing about GeoJSON, except that it doesn't say what to do with the properties. GeoJSON only implies that the features are rendered as map features and that the properties are then displayed somehow. That “somehow” is left up to the website developer to code in JavaScript.

Does JSON-LD Help?

GeoJSON is fine for what it does and doesn't do, but it probably shouldn't be used to exchange scientific data. It lacks any ability to standardize what data to expect for each feature, and such standardization isn't the purview of the good folks who developed it. We might be able to place something in the value of the “properties” property to facilitate syntactic recognition of specific kinds of data.

One new thing that I am considering exploring is a mixed model where the “properties” object value is assumed to be a JSON-LD object. This allows the data to have much richer annotation and opens the door to standardization. Unfortunately, this is still on my “TODO list”.
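Purely as a sketch of what such a mixed model might look like, and not a worked-out design, the Python below adds a JSON-LD `@context` to a feature's properties so that the short names map to full IRIs. The `example.org` vocabulary is a placeholder invented for this example (it is not a real USGS namespace), and `annotate` is a hypothetical helper.

```python
import copy
import json

# Hypothetical vocabulary: example.org is a placeholder, not a real USGS namespace.
EARTHQUAKE_CONTEXT = {
    "mag": "http://example.org/earthquake#magnitude",
    "place": "http://example.org/earthquake#place",
}


def annotate(feature, context):
    """Return a copy of the feature whose properties carry a JSON-LD @context."""
    annotated = copy.deepcopy(feature)  # leave the caller's feature untouched
    properties = annotated.get("properties") or {}
    annotated["properties"] = {"@context": context, **properties}
    return annotated


feature = {
    "type": "Feature",
    "id": "ci15507801",
    "properties": {"mag": 4.16, "place": "7km NW of Westwood, California"},
    "geometry": {"type": "Point", "coordinates": [-118.4911667, 34.0958333, 4.36]},
}

annotated = annotate(feature, EARTHQUAKE_CONTEXT)
serialized = json.dumps(annotated, indent=2)
```

A GeoJSON consumer that ignores unknown property names would still see an ordinary feature, while a JSON-LD-aware consumer could, in principle, recognize `mag` by its IRI regardless of where the data came from.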

What is next?

I'm just about done with formats for scientific data. There are many, many more formats out there and they suffer from many of the same pitfalls. Up next, I want to address what it means to be on the Web, address some architecture principles, and describe some qualities we want for Web-oriented scientific data.