r. alexander milowski, geek

Welcome to the web home of Alex Milowski. Here you'll find information about me, some of the software I've written, and the projects in which I participate. You'll find a variety of mathematics, technology, papers, and presentations on this website written or contributed to by me. If you have any questions or comments, please don't hesitate to contact me.

Recent Entries

Microdata in Green Turtle

After being asked how hard it would be to add Microdata, at least in some minimal way, to Green Turtle I decided to find out.  Since I already have a bunch of infrastructure, this didn't feel like a hard thing to add.  If I did add support, at least experimentally, microdata and RDFa could use the same API and scripting support.

Of course, this requires understanding how to interpret microdata and generate a triples graph from it.  There are some pathological cases, such as untyped items, that are just impossible to map.  For example:

<div itemscope>
<p itemprop="name">Alex Milowski</p>
</div>

Is it someone's attempt at encoding a person?  Does it have any interchange semantics over the Web?  Probably not, so I just ignore things that don't map easily.

Meanwhile, properly encoded schema.org items should map quite well. In the context of a schema.org item type, the property names map in a uniform way and the triples graph is easy to generate. Other well-structured namespaces will map as well (e.g. those with hash names or URIs structured the same way as schema.org).
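
A minimal sketch of that uniform mapping, assuming property names resolve against the item type's namespace (everything up to the last "#" or "/"). The function names and the plain-object item shape are hypothetical and are not Green Turtle's actual implementation:

```javascript
// Sketch only: resolves a microdata property name against its item type.
// propertyURI and itemToTriples are illustrative names, not Green Turtle API.
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

function propertyURI(itemType, name) {
  // Names that are already absolute URIs are used as-is.
  if (/^[a-z][a-z0-9+.-]*:/i.test(name)) return name;
  // Untyped items have no vocabulary to resolve against.
  if (!itemType) return null;
  // Hash namespaces: keep everything up to and including "#".
  const hash = itemType.indexOf("#");
  if (hash >= 0) return itemType.slice(0, hash + 1) + name;
  // Slash namespaces (like schema.org): keep everything up to the last "/".
  const slash = itemType.lastIndexOf("/");
  return itemType.slice(0, slash + 1) + name;
}

function itemToTriples(subject, item) {
  const triples = [];
  // Untyped items are ignored, as described above.
  if (!item.type) return triples;
  triples.push([subject, RDF_TYPE, item.type]);
  for (const [name, value] of Object.entries(item.properties)) {
    triples.push([subject, propertyURI(item.type, name), value]);
  }
  return triples;
}

const typed = itemToTriples("_:b0", {
  type: "http://schema.org/Person",
  properties: { name: "Alex Milowski" }
});
const untyped = itemToTriples("_:b1", { properties: { name: "Alex Milowski" } });
```

Here the typed item yields a type triple plus one property triple, while the untyped item yields nothing.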

Let's get down to the details:

  1. You can do this right now with Green Turtle 1.0.
  2. I've implemented an experimental processor.
  3. You can use that right now too!

Here's how:

  1. Until I package this implementation somehow, you'll need to include three JavaScript files: URI.js, RDFaGraph.js, and Microdata.js
  2. Register a processor with Green Turtle for Microdata:
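    <script type="text/javascript">
    // Register a Microdata processor under the name "microdata".
    GreenTurtle.implementation.processors["microdata"] = {
       process: function(node,options) {
          // Find the owning document (the node itself if it is the document).
          var owner = node.nodeType===Node.DOCUMENT_NODE ? node : node.ownerDocument;
          // No data object on the document means there is no graph to merge into.
          if (!owner.data) {
             return false;
          }
          // Add the Microdata triples to the document's existing graph.
          var processor = new GraphMicrodataProcessor(owner.data.graph);
          processor.process(node);
          return true;
       }
    };
    </script>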
    

That's it!  Green Turtle runs after the document has loaded, so any registered processors will run against the document.  In this example, the Microdata triples are merged into the same graph as the RDFa.  Beware, there be dragons here!
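
To illustrate the registration pattern outside a browser, here is a rough sketch with a stand-in registry and a fake document object. The `implementation` object, `runProcessors`, and the array-based graph are all hypothetical and only mimic the shape of the snippet above; they are not Green Turtle's internals:

```javascript
// Hypothetical registry shaped like the snippet above; not Green Turtle's internals.
const implementation = { processors: {} };

// A processor returns true if it handled the node and false otherwise.
implementation.processors["microdata"] = {
  process(node, options) {
    if (!node.data) return false;
    // Stand-in for real microdata processing: add one triple to the shared graph.
    node.data.graph.push(["_:item", "http://schema.org/name", "Alex Milowski"]);
    return true;
  }
};

// Runs every registered processor against the node and reports which ones handled it.
function runProcessors(impl, node, options) {
  const handled = [];
  for (const name of Object.keys(impl.processors)) {
    if (impl.processors[name].process(node, options)) handled.push(name);
  }
  return handled;
}

// Fake document whose graph already holds one triple, as if RDFa had been processed.
const fakeDocument = {
  data: { graph: [["_:rdfa", "http://schema.org/url", "http://example.org/"]] }
};
const handled = runProcessors(implementation, fakeDocument, {});
```

After the run, the single graph holds both the pre-existing triple and the one added by the microdata processor, which is the merging behavior described above.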

An example of the above is available here.

Now, the question remains as to what I should do with this.  I'm not a huge fan of Microdata but embracing users of Microdata and giving them a way to co-exist or transition to RDFa feels like a good idea.

The questions that need to be answered are:

  1. Do I integrate this into Green Turtle as a standard feature?
  2. Should it be enabled by default?
  3. If I don't integrate it, how should it be packaged for ease of use?

It is important to understand that there is code being duplicated. Both URI.js and RDFaGraph.js are already packaged with Green Turtle but they are hidden from general use so as not to clutter the global namespace. As a result, you unfortunately need to include them again, and something needs to be done to make this easy to distribute.

Scientific Measurements in Schema.org

The physical sciences use the idea of a quantity to measure the world around us.  While that might seem simple, basing a measurement on a system of units that can be quantified, measured accurately, and verified isn't exactly simple.  Nevertheless, a great deal of time and effort has converged into the International System of Units (SI).

One of the good things about this system is that you can combine a base unit with a prefix to get a different scale of measurement (e.g. m (meter) versus km (1000 meters), cm (1/100 of a meter), etc.).  This allows a whole scheme of measurements where the scale is known along with a base unit without constructing separate names or symbols.
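
As a small illustration of prefix and base unit composition, the sketch below scales a prefixed value back to its base unit. The `SI_PREFIXES` table and `toBaseUnit` function are hypothetical names, and only a handful of prefixes are listed:

```javascript
// A few SI prefixes and their scale factors relative to the base unit.
const SI_PREFIXES = new Map([
  ["k", 1e3],   // kilo
  ["", 1],      // no prefix
  ["c", 1e-2],  // centi
  ["m", 1e-3]   // milli
]);

// Converts a prefixed measurement to the base unit, e.g. 5 km to 5000 m.
function toBaseUnit(value, prefix, baseUnit) {
  if (!SI_PREFIXES.has(prefix)) {
    throw new Error("Unknown SI prefix: " + prefix);
  }
  return { value: value * SI_PREFIXES.get(prefix), unit: baseUnit };
}

const distance = toBaseUnit(5, "k", "m"); // { value: 5000, unit: "m" }
```

The prefix and the base unit are passed separately because a symbol like "m" is ambiguous on its own (milli versus meter).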

In an attempt to align my weather data efforts with schema.org's types, I looked into what was available in either the current types or proposals for representing quantities.  The most obvious candidate is QuantitativeValue, which has value and unitCode properties.  The unfortunate part of this is that unitCode is a UN/CEFACT code and, if you look at the codes in their XML Schemas, you'll see all kinds of odd units and codes that aren't a good match for SI units or their commonly used symbols.  Yet, this type came from valid uses in the Good Relations vocabulary and, in that context, the use of UN/CEFACT makes sense.

To further complicate this issue, there are generic properties in the schema.org types of height, width, depth, and weight and one would expect, going forward, to use these in non-UN/CEFACT contexts.  At this point, there are two choices: make a new class for quantities or modify QuantitativeValue.  If you ignore the odd name of QuantitativeValue over Quantity, I believe the right choice is to modify it and add properties that can capture SI units and scientific quantities in a better way.

This is where QUDT (Quantities, Units, Dimensions, and Data Types in OWL and XML) comes into play.  While I highly recommend reading this specification, the results come down to some basic classes and properties that are very useful:

  1. qudt:QuantityValue - the actual class for specifying a value that has the numeric value, symbol, and unit defined.
  2. qudt:symbol - the unit symbol property (a string).
  3. qudt:unit - the unit defined as a qudt:Unit (essentially a URI).

QUDT has the interesting property that it defines a vocabulary of SI units as well as non-SI units that can be mapped.  Each instance of qudt:Unit has a name (a URI) and defines the necessary property values needed to map from that unit to some base SI unit.  For example, the Celsius temperature unit is defined against the Kelvin temperature SI base unit.
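
A rough sketch of what such a mapping amounts to, assuming each unit carries a multiplier and an offset relative to its SI base unit. The record layout and the field names `multiplier` and `offset` are illustrative and are not taken from the QUDT vocabulary itself:

```javascript
// Hypothetical unit records; the field names are illustrative, not QUDT property names.
const units = {
  Kelvin: { symbol: "K", multiplier: 1, offset: 0 },
  DegreeCelsius: { symbol: "°C", multiplier: 1, offset: 273.15 }
};

// Maps a value in the given unit to the SI base unit (Kelvin for temperature),
// assuming the convention base = value * multiplier + offset.
function toBaseSI(value, unit) {
  return value * unit.multiplier + unit.offset;
}

const freezing = toBaseSI(0, units.DegreeCelsius); // about 273.15 K
```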

The result is simply that schema.org can use the names (the URIs) of the units defined by QUDT without necessarily incorporating the whole QUDT hierarchy.  If you want the full graph, you can incorporate the QUDT vocabulary and things should work out well.

My proposal to modify QuantitativeValue is simple:

  1. Add a unitSymbol property as a string to capture the common string label of the unit.  This is distinct from unitCode, which will remain a UN/CEFACT code.
  2. Add a unit property that is a URI that names the unit.  This allows the QUDT vocabulary to be used directly.
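
To make the proposal concrete, here is a sketch of a temperature reading carrying both proposed properties, written as a plain JavaScript object. The unitSymbol and unit properties are the additions proposed above, not existing schema.org properties; the "@type" key, the QUDT unit URI, and the UN/CEFACT code shown are assumptions for illustration:

```javascript
// Sketch of a QuantitativeValue with the two proposed properties added.
const reading = {
  "@type": "http://schema.org/QuantitativeValue",
  value: 22.5,
  unitCode: "CEL",    // UN/CEFACT code (assumed here), unchanged by the proposal
  unitSymbol: "°C",   // proposed: the common string label of the unit
  unit: "http://qudt.org/vocab/unit#DegreeCelsius" // proposed: a URI naming the unit (assumed QUDT URI)
};

// With unitSymbol available, a display label needs no lookup of the unit code.
function label(quantity) {
  return quantity.value + " " + quantity.unitSymbol;
}

const text = label(reading);
```

A consumer that only needs a human-readable label can use unitSymbol directly, while one that wants the full unit definition can follow the unit URI.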

Now the problem is getting this through whatever schema.org process exists.  As I understand the process, the intent is to take what people are using (what works) and try to incorporate that wholesale. My proposal isn't quite that because it is a small modification to an existing class taken from Good Relations.  That said, it is important to get quantities right so the class can be reused everywhere there is a need for measurement.

Green Turtle 1.0 Released!

I've just released 1.0 of Green Turtle.  My implementation now passes all the tests in the W3C conformance test suite for both RDFa 1.1 and Turtle.

There is also a great new feature that integrates Turtle handling directly into the API and allows for automatic processing of embedded Turtle.  This allows data providers to use RDFa or embedded Turtle directly in their documents and Green Turtle will do the right thing!

Also, this release supports RDFa in HTML as specified in the latest draft and passes all the relevant tests for that processing mode.

Finally, the API has been adjusted to allow direct access to representation parsers (e.g. Turtle).  This feature allows a developer to retrieve or construct triples via a representation like Turtle and then merge them into the current document's data.  As such, all the nice features of the RDFa API are available to the developer for Turtle as well.
