What is the Subject Origin?

2013-03-12

RDFa allows annotations of a subject (identifier) to exist in multiple locations within a document. When a user tries to retrieve elements by this subject identifier, which element is returned? Currently, the RDFa API says that the document.getElementsBySubject() API method returns every element origin in the document that identifies the subject via @about, @resource, @src, or @href.

For example, consider this document using RDFa:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title></title>
    </head>
    <body vocab="http://www.example.com/">
       <div about="_:ex1">
          <span property="a">v1</span>
       </div>
       <div about="_:ex1">
          <span property="b">v2</span>
       </div>
       <div resource="_:ex1">
          <span property="c">v3</span>
       </div>
       <div about="_:ex1" typeof="T">
          <span property="d">v4</span>
       </div>
       <div resource="_:ex1" typeof="T">
          <span property="e">v5</span>
       </div>
    </body>
</html>

This example generates the triples:


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix e: <http://www.example.com/> .
<origin.xhtml> <http://www.w3.org/ns/rdfa#usesVocabulary> <http://www.example.com/> .
_:ex1 rdf:type e:T ;
      e:a "v1" ;
      e:b "v2" ;
      e:c "v3" ;
      e:d "v4" ;
      e:e "v5" .

What is the subject origin?

Five div elements use the subject _:ex1 .
Each child span element generates a different property.
Two separate elements type the subject as http://www.example.com/T .

With the current RDFa API, all five of these div elements should be returned by document.getElementsBySubject("_:ex1") and two of them by document.getElementsByType("http://www.example.com/T"). Also, each property is generated by the descendants of a different subject origin element.
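
To make those counts concrete, here is a minimal sketch that models the five div elements as plain objects rather than real DOM nodes. The functions below are hypothetical stand-ins for the RDFa API methods, not their actual implementation, and the subject rule is deliberately simplified to @about, then @resource.

```javascript
// Plain-object stand-ins for the five div elements in the example above.
// A real implementation would walk the DOM; this sketch only needs the
// RDFa attributes on each element.
const divs = [
  { name: "div1", about: "_:ex1" },
  { name: "div2", about: "_:ex1" },
  { name: "div3", resource: "_:ex1" },
  { name: "div4", about: "_:ex1", typeof: "T" },
  { name: "div5", resource: "_:ex1", typeof: "T" },
];

// The vocabulary declared by @vocab on the body element.
const VOCAB = "http://www.example.com/";

// Simplified assumption: the subject set by an element is @about, else
// @resource. @href, @src and the full RDFa 1.1 sequence are left out.
function subjectOf(el) {
  if (el.about !== undefined) return el.about;
  if (el.resource !== undefined) return el.resource;
  return null;
}

// Hypothetical stand-in for document.getElementsBySubject().
function getElementsBySubject(elements, subject) {
  return elements.filter((el) => subjectOf(el) === subject);
}

// Hypothetical stand-in for document.getElementsByType(); the @typeof
// term is resolved against the vocabulary.
function getElementsByType(elements, type) {
  return elements.filter(
    (el) => el.typeof !== undefined && VOCAB + el.typeof === type
  );
}

const originsForEx1 = getElementsBySubject(divs, "_:ex1");
const typedAsT = getElementsByType(divs, "http://www.example.com/T");

console.log(originsForEx1.length); // 5
console.log(typedAsT.map((el) => el.name)); // [ 'div4', 'div5' ]
```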

Obviously, this particular example is quite pathological. That said, the ability to have subject annotations in different places within a single document is a good feature that is useful when the content doesn’t follow a tree structure. As such, in practice, something like this will happen for good reasons.

In contrast, it probably isn’t a good idea to type a subject in different locations. The resulting annotation graph (i.e. the triples) is the same, so repeating the type is unnecessary: the same thing is being said twice. Alas, it is possible and is likely to happen in some document somewhere on the Web.

What does this mean for an easier API?

There are some hard bits here in that getting subject and typed element origins always needs to return an array of elements. This means that for simple annotations where there is only one subject/typed element origin, the API has a cardinality mismatch. I’m not sure what the right answer is but always having to de-reference an array is unfortunate.
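
The mismatch can be shown with a small sketch. Here getOrigins is a stub standing in for document.getElementsBySubject(), and singleOrigin is a hypothetical helper that is not part of the RDFa API; it is only one possible shape for a convenience accessor, not a recommendation.

```javascript
// Stub standing in for document.getElementsBySubject(): it always returns
// an array, even when the document has a single origin for the subject.
function getOrigins(subject) {
  const origins = {
    "_:single": [{ name: "div1" }],
    "_:many": [{ name: "div1" }, { name: "div2" }],
  };
  return origins[subject] || [];
}

// The simple case still needs an index to reach the one element.
const only = getOrigins("_:single")[0];
console.log(only.name); // div1

// Hypothetical helper (not part of the RDFa API): return the element when
// there is exactly one origin, otherwise null so that the caller has to
// handle the zero-origin and multi-origin cases explicitly.
function singleOrigin(subject) {
  const found = getOrigins(subject);
  return found.length === 1 ? found[0] : null;
}

console.log(singleOrigin("_:single").name); // div1
console.log(singleOrigin("_:many")); // null
```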

Further, if we want to have an RDFa API object accessible on the element origin, as Green Turtle does in a limited way, then this object/element pair needs to have three properties:

1. The element must be the origin of a single subject.
2. The same RDFa API object must be accessible from all subject origins.
3. The same subject properties (i.e. subset of the annotation graph) must be available.

I believe that (1) is satisfied by the way subjects are generated from the RDFa attributes in section 7.5 of the RDFa 1.1 Core specification. As such, the same object can be presented via the API to the consumer for that particular subject. This also helps satisfy (2).

The third part is essentially acknowledging that this object is a “jumping off point” where a consumer is likely to access any number of properties of the subject. As such, you should get all the properties regardless of whether they are actually specified on that particular element or its descendants. That is, from a usability perspective, it doesn’t make sense to restrict it to those derived from the descendants.

Further, generating only the subset of properties exhibited by that element is more computationally expensive. The regular element DOM will already tell you the authored properties if you just look at the descendants’ use of the RDFa attributes. As such, authoring tools can determine this in better ways.
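
As a rough illustration of reading the authored properties straight from the markup, the sketch below walks a plain-object stand-in for an element tree. The tree shape and the authoredProperties function are assumptions made for this example, and the rules are heavily simplified: only @about, @resource and @property are considered, and @rel, @rev, @href, @src and @typeof are ignored.

```javascript
// Plain-object stand-in for a subject origin and its descendants:
// one direct property, one property nested in a wrapper element, and
// one nested element that starts a different subject.
const origin = {
  attrs: { about: "_:ex1" },
  children: [
    { attrs: { property: "a" }, children: [] },
    {
      attrs: {},
      children: [{ attrs: { property: "b" }, children: [] }],
    },
    {
      attrs: { about: "_:other" },
      children: [{ attrs: { property: "z" }, children: [] }],
    },
  ],
};

// Collect the @property values authored for the subject of `el`.
// A descendant with @about belongs to another subject and is skipped.
// A descendant with @resource may carry a property of this subject,
// but its own children describe the new subject, so the walk stops there.
function authoredProperties(el) {
  const found = [];
  for (const child of el.children) {
    if (child.attrs.about !== undefined) continue;
    if (child.attrs.property !== undefined) found.push(child.attrs.property);
    if (child.attrs.resource === undefined) {
      found.push(...authoredProperties(child));
    }
  }
  return found;
}

console.log(authoredProperties(origin)); // [ 'a', 'b' ]
```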

What next?

My proposal is that every subject origin have a data property that is the RDFa API object. The id property of this object returns the subject URI. Further methods on this object should allow access to the subject properties (i.e. subset of the annotation graph).
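
A minimal sketch of that proposal might look like the following, again using plain objects in place of DOM elements. Only the data and id properties come from the proposal above; the SubjectData class, the getValues method and the attachData function are hypothetical names chosen for this example.

```javascript
// Triples from the example, reduced to property/value pairs per subject.
const graph = {
  "_:ex1": { a: ["v1"], b: ["v2"], c: ["v3"], d: ["v4"], e: ["v5"] },
};

// Plain-object stand-ins for two of the subject origins in the example.
const subjectOrigins = [
  { name: "div1", subject: "_:ex1" },
  { name: "div3", subject: "_:ex1" },
];

// Hypothetical RDFa API object: one instance per subject.
class SubjectData {
  constructor(id, properties) {
    this.id = id; // the subject identifier
    this.properties = properties; // every property of the subject
  }

  // Hypothetical accessor; the method names are still an open question.
  getValues(property) {
    return this.properties[property] || [];
  }
}

// Cache so that every origin of a subject shares the same object.
const dataBySubject = new Map();

function attachData(el) {
  if (!dataBySubject.has(el.subject)) {
    dataBySubject.set(
      el.subject,
      new SubjectData(el.subject, graph[el.subject] || {})
    );
  }
  el.data = dataBySubject.get(el.subject);
}

subjectOrigins.forEach(attachData);

const [first, second] = subjectOrigins;
console.log(first.data === second.data); // true
console.log(first.data.id); // _:ex1
console.log(first.data.getValues("e")); // [ 'v5' ]
```

Both origins expose the same object, and that object reports property e even though it is authored under a different div, which is the behaviour the three requirements above ask for.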

The next problem is how to make accessing properties and values easier for scripting.