A New Entrant into the Lion’s Den
Exactly one month ago I wrote in The Shaky Semantics of the Semantic Web, “The time is now and sorely needed to get the issues of representation, resources and reference cleaned up once and for all.”
The piece was prompted by growing rumblings on semantic Web mailing lists and elsewhere about semantic Web terminology, plus concerns that lack of clarity was opening the door for re-branding or appropriating the semantic Web ‘space.’ I observed these issues were “complex and vexing boils just ready to erupt through the surface.”
My own post was little noticed, but the essential observations, I think, were correct. In the past month the rumblings have become a distinct growl, and aspects of the debate are now coming into direct focus. I think the perspective has (thankfully) shifted from not wanting to re-open the arcanely named “httpRange-14” debate of three years past (a less technical explanation is in Wikipedia on its role in bringing the concept of “information resource” to the Web) to perhaps finally lancing the boil.
A Determined Protagonist
Many of us monitor multiple mailing lists; they seem to have their own ebb and flow, most often quiet, but sometimes rat-a-tat-tat furious. In and of itself, it is fascinating to see which topics and threads catch fire while others remain fallow.
One mail list that I monitor is the W3C’s Technical Architecture Group, in essence the key deliberation body for technical aspects of the Web. Key authors of the Web such as Tim Berners-Lee, Roy Fielding and many, many others of stature and knowledge either are on the TAG or participate in its deliberations. The TAG’s public mailing list is immensely helpful to learn about technical aspects of the Web and to get a bit of early warning regarding upcoming issues. The W3C and its TAG are exemplars of open community process and governance in the Internet era.
I assume many hundreds monitor the TAG list; most, like me, comment rarely or not at all. The matters can indeed be quite technical and there is much history and well-thought rationale behind the architecture of the Web.
Xiaoshu Wang has recently been a quite active participant. English is not Xiaoshu’s native language, but because of his passion he has nonetheless been a determined protagonist in probing the basis and rationale behind the use of resources, representations and descriptions on the Web. These are difficult concepts under the best of circumstances, made all the harder by language differences and by the special technical senses that the TAG has adopted in its prior deliberations.
These concerns were first and most formally expressed in a technical report, URI Identity and Web Architecture Revisited, by Xiaoshu and colleagues in November 2007.
My layman’s explanation of Xiaoshu’s concerns is that the earlier httpRange-14 decision to establish a technical category of “information resources” begs and leaves open the question of the inverse — what has been called a “non-information resource” — and actually violates prior semantics and understandings of what should be better understood as representations.
A Respected Interlocutor
This discussion arose in relation to the Uniform Access to Descriptions [1], a thread begun by Jonathan Rees of the TAG to assemble use cases related to HttpRedirections-57, a proposal to standardize the description of URI things, such as documents, by rejuvenating the link header. Because of its topic, discussion of httpRange-14 was discouraged since putatively the core definition of “information resource” was not at issue.
However, after the introduction of a most interesting pre-print, In Defense of Ambiguity [2], co-author Harry Halpin perhaps inadvertently opened the door to the httpRange-14 question again. Then Xiaoshu began submitting and commenting in earnest, and Stuart Williams of the TAG, in particular, was helpful and patient in drawing out and articulating the points.
My observation is that Xiaoshu was never advocating a change in the basic or current architecture of the Web, but perhaps that was not readily apparent. Again, the frailty of human communications, compounded by differences in language and perspective, has been much in evidence.
Pat Hayes, the editor of the excellent RDF Semantics W3C recommendation, then intervened as interlocutor for Xiaoshu’s basic positions. Many, many others, notably including Berners-Lee and Fielding, have also joined the fray. The entire thread [4] is worth reading and studying.
Since Xiaoshu has publicly endorsed Hayes’ interpretation, here are some important snippets from Pat’s articulation [3]:
There simply is no other word [than ‘represents’] that will do. And the size, history and, I’m sorry, but scholarly and intellectual authority of the community which uses a wider sense of ‘represent’ so greatly exceeds the AWWW [W3C Web] community that I don’t think you can reasonably claim possession of such a basic and central term for such a very narrow, arcane and special (and, by the way, under-defined) sense.
If AWWW had used a technical word in a new technical way, then this would likely have been harmless. Mathematics re-used ‘field’ without getting confused with agriculture. But the AWWW/semantics clash over the meaning of ‘represent’ is harmful because the senses are not independent: the AWWW usage is a (very) special case of the original meaning, so it is inherently ambiguous every time it is used; and, still worse, we need the broader meaning in these very discussions, because the TAG has decreed that URIs can denote anything: so we are here discussing semantics in a broad sense whether we like it or not. And if the word ‘represent’ is to be co-opted to be used only in one very narrow sense, then we have no word left for the ordinary semantic sense. To adopt a usage like this is almost pathological in the way it is likely to generate confusion (as it already has, and continues to do so, in spades.)
The way we name Web pages is a special case of this picture, where the ‘storyteller’ is the same thing as the resource. Things that can be their own storytellers fit nicely within current AWWW, with its official understanding of words like ‘represent’. (In fact, capable of being ones own storyteller might be a way to define ‘information resource’.) But the nice thing about this picture [as presented by Xiaoshu] is that other kinds of resource, which do not fit at all within the AWWW – things that aren’t documents, ‘non-information resources’ – also fit within it; still, ironically, using the AWWW language, but with a semantic rather than AWWW sense of ‘represent’.
Right now, the semantic web really does not have a coherent story to tell about how it works with non-information resources, other than it should use RDF (plus whatever is sitting on it in higher levels) to describe them; which says nothing, since RDF can describe anything. URIs in RDF are just names, their Web role as http entities semantically irrelevant. Http-range-14 connects access and denotation for document-ish things, but for other things we have no account of how they should or should not be related, or what anything a URI might access via http has got to do with what it denotes.
The way that the three participants (denoted-thing, URI-name and Web-information-resource ‘storyteller’) interact must be basically different when the denoted-thing isn’t an information resource from when it is. All that being suggested here is that there is an account that we could give about this, one that works in both cases and which fits the language of AWWW quite, er, nicely.
A person exists and has properties entirely separate from the Web. Many people have nothing to do with the Web in their entire lives. People are not Web objects. And when the URI is being used in an RDF graph to refer to a person, the fact that it starts with http: is nothing more than a lexical accident, which has no bearing whatever on the role of the URI as a name denoting a person.
I think this particular shoe is on the other foot. If you can actually say, clearly enough to prevent continual trails of endless email debate, what AWWW actually means by ‘represent’, then I’d be delighted if you would use some technical word to refer to that elusive notion. But the word ‘represent’ and its cognates has been a technical word in far larger and more precisely stated forums for over a century; and since the day that Web science has included the semantic web, AWWW has taken an irrevocable step into the same academy. You are using the language of semantics now. If you want to be understood, you have to learn to use it correctly.
All it would do is move the responsibility of deciding what a URI denotes from a rather messy and widely ill-understood distinction based on http codes, to a matter of content negotiation. This would allow phenomena which violate http-range-14, but it would by no means insist on such violations in all cases. In fact, if we were to agree on some simple protocols for content negotiation which themselves referred to http codes, it could provide a uniform mechanism for implementing the http-range decision.
Moreover, this approach would put ‘information resources’ on exactly the same footing as all other things in the matter of how to choose representations of them for various purposes, a uniformity which means little at present but is likely to increase in value in the future.
But right now, for the case where a URI is understood to denote something other than an information resource, we have a completely blank slate. There is nothing which tells our software how to interoperate in this case. Our situation is not a kind of paradise of reference-determination from which Xiaoshu and I are threatening to have everyone banished. Right now for the semantic web, things are about as bad as they can get.
. . . we, as a society, can use [the conventions we decide] for whatever we decide and find convenient. The Web and the Internet are replete with mechanisms which are being used for purposes not intended by their original designers, and which are alien to their original purpose. For a pertinent example, the http-range-14 decision uses http codes in this way. That isn’t what http codes are for.
I have repeated much of this material because I believe it to be of wide import to the semantic Web’s development and future. Obviously, for better understanding, the full thread [4], plus its generous sprinkling of excellent prior documents and discussions, is highly recommended.
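For readers who have not followed the original decision, here is a minimal sketch in Python of the rule that Hayes refers to when he says that httpRange-14 "uses http codes" to connect access and denotation. It follows my reading of the TAG's 2005 resolution: a 2xx response to a GET means the URI identifies an information resource, a 303 means it could identify any resource, and a 4xx means the nature of the resource is unknown. The function name and the returned labels are my own, purely for illustration, and the final fallback for other codes is an assumption, since the resolution does not address them.

```python
def httprange14_inference(status_code: int) -> str:
    """Return what a client may infer about the resource a URI identifies,
    given the HTTP status code of a GET on that URI.

    Hypothetical helper: the name and the labels are illustrative only.
    """
    if 200 <= status_code < 300:
        # 2xx: the URI identifies an information resource (document-ish thing).
        return "information resource"
    if status_code == 303:
        # 303 See Other: the URI could identify any resource at all,
        # including a person or another non-document thing.
        return "any resource"
    if 400 <= status_code < 500:
        # 4xx: the nature of the resource is unknown.
        return "unknown"
    # Assumption: the resolution says nothing about other codes.
    return "not covered by the resolution"


if __name__ == "__main__":
    for code in (200, 303, 404, 500):
        print(code, "->", httprange14_inference(code))
```

The point of the sketch is how little it says: the only signal about what a URI denotes comes from a status code, and for anything that is not a document the answer is simply "any resource." That is the blank slate Hayes describes.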
My Take
There are certainly technical aspects to this debate that go well beyond my ken. I strongly suspect there are edge cases for which more complicated technical guidance is warranted.
And, it is true, I have been selective in which sides of this debate I am highlighting and therefore supporting. This is not accidental.
Some in this debate have claimed the need to conform to existing doctrine in order to ensure interoperability or the integrity of software systems. From my different perspective, as someone desiring to help build a market by extending reach into the broader public, that argument is false. Let’s take the existing architecture we have, but make our best-practice recipes simple, our language clear, and our semantics correct. How can we really promote and grow the semantic Web when our own semantics are so patently challenged?
Our community faces a challenge of poor terminology and muddled concepts (or, perhaps more precisely, concepts defined in relation to the semantic Web that are not in conformance with standard understandings). My strong suspicion is that we risk at present over-specification and just plain confusion in the broader public.
This mailing list debate is hugely important, informative and thought-provoking. Xiaoshu deserves thanks for his courage and tenacity in engaging this debate in a non-native language. Pat Hayes deserves thanks for trying to capture the arguments in language and terminology more easily understandable to the rest of us, and for adding his own considerable experience to the debate. And many of the mailing list regulars deserve sincere thanks for being patient and engaged in allowing the nuances of these arguments to unfold.
From my standpoint there is real pragmatic value to these arguments, which would bring the terminology and semantics of the semantic Web into better understood and more easily communicated usage, all without affecting or changing the underlying architecture of the Web. (Or so, from my naïve viewpoint, the argument seems to suggest.)
So long as the semantic Web’s practitioners still number in the hundreds, and those with nuanced understanding of these arcane matters likely only in the scores, the time is ripe to get the language and concepts right. Doing so can help our enterprise reach millions, and much more quickly.