Change Proposal for ISSUE-120

From WHATWG Wiki
Revision as of 06:54, 19 January 2011 by Hixie (Intentional misimplementation)

Summary

Simplify the specification by removing features that are documented to be confusing to users.

Rationale

The premise of this rationale, which is argued in detail below, is that mechanisms that bind arbitrary strings ("prefixes") to other arbitrary strings ("bases"), which can then be used in conjunction with a third set of arbitrary strings ("values") to form identifiers ("terms") that are never explicitly stated in the source, are a language design anti-pattern in the context of technology intended for broad Web deployment (e.g. in text/html).

In the context of RDFa, there are a number of mechanisms to define the mappings of "prefixes" and "bases": the prefix="" attribute, the profile="" attribute (which has an additional layer of indirection), and for legacy reasons the xmlns="" attribute. This change proposal proposes to remove all three of these features based on the same rationale. (At a high level, these features are essentially equivalent, being little more than syntactic sugar for each other.) As a result, the places where RDFa accepts a "value" that could have used one of these arbitrary "prefixes" to create a "term" can no longer do so, and are limited to either giving the "term" directly, or using a predefined syntax with known prefixes (specifically, the empty prefix ":foo" which is short for "http://www.w3.org/1999/xhtml/vocab#foo", and the bnode syntax "_:foo").
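
As a rough illustration of the difference, the following Python sketch contrasts expansion through an author-defined prefix mapping with the restricted behaviour described above. The function names and the treatment of unrecognised values are assumptions made for this example; they are not taken from the RDFa specification or from any implementation.

```python
# Illustrative sketch only: function names and fallback behaviour are hypothetical.

XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def expand_with_prefixes(value, prefixes):
    """Expand "prefix:reference" using author-defined mappings.

    The resulting term never appears in the source; it only exists once
    the "base" bound to the prefix is joined with the reference.
    """
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value  # no known prefix: left untouched in this sketch


def expand_restricted(value):
    """Expand a value when the set of arbitrary mappings is always empty."""
    if value.startswith("_:"):
        return value  # bnode syntax is kept as-is
    if value.startswith(":"):
        return XHTML_VOCAB + value[1:]  # the predefined empty prefix
    return value  # anything else has to be the term itself


print(expand_with_prefixes("a:b", {"a": "http://example.com/"}))
print(expand_restricted(":foo"))
print(expand_restricted("http://example.com/b"))
```

In the restricted version, the only way to obtain the term "http://example.com/b" is to write it out in full in the source.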

Why arbitrary prefix mechanisms are bad

Copy and paste

Copy-and-paste of the source becomes very brittle when two separate parts of a document are needed to make sense of the content. Copy-and-paste is how the Web evolved, so I think it is important to keep it functional and easy.
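
The brittleness can be sketched in a few lines of Python. The helper name `expand` and the three mapping dictionaries are hypothetical and stand in for whatever prefix declarations happen to surround a fragment in each document.

```python
# Illustrative sketch only: all names and URLs here are hypothetical.

def expand(value, prefixes):
    """Return the term for "prefix:reference", or None if the prefix is unbound."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return None


copied_value = "a:b"  # the part of the markup that the author copied

source_document = {"a": "http://example.com/"}        # where it was copied from
empty_document = {}                                    # pasted here: no declaration
other_document = {"a": "http://example.org/other/"}    # pasted here: a clash

print(expand(copied_value, source_document))
print(expand(copied_value, empty_document))
print(expand(copied_value, other_document))
```

The copied text is identical in all three cases, but it only means what the author intended in the first one.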

Cognitive difficulty

Fundamentally, prefixes are an indirection model. Indirection models are very, very hard for people to understand. However, arbitrary prefixes have proved even harder to understand than most indirection mechanisms. The most widely known arbitrary prefix mechanism on the Web is the XML namespaces feature, which is very similar to the three prefix mechanisms in RDFa. It can thus be used as a case study for the problem:

As far back as 2004, Micah wrote "As the author of an O'Reilly book on XForms, I can report that 90% of the technical questions from readers involve confusion related to namespaces".

Parand Darugar has said similar things: "Experience shows XML namespaces can be a common cause of confusion and a major complicating factor in XML adoption."

Derek Denny-Brown, who had been the lead developer of MSXML and System.Xml: "If there is any one of the W3C's family of XML specifications, that has caused me the most grief, XML Namespaces is probably it."

Maciej has also said things to this effect: "Namespaces are an example of the Fundamental Software Engineering Error, which is that something too terrible to actually use can be fixed by adding a level of indirection. Sometimes that is true but software engineers try to do it even when it clearly is not."

Questions about namespaces come up again and again, over many years:

Prefixes are notoriously hard for implementors to get right:

(This covers bugs by such vendors as Sun, Google, Yahoo!, MySpace — and these aren't just bugs that don't affect end-users, like forgetting to quote attributes in HTML.)

Prefixes are notoriously hard for implementors to document:

Prefixes have been notoriously hard for even people in the standards community to understand:

Dynamic changes

Arbitrary prefixes in dynamically changing content (like HTML) are even worse because they require that an observing software agent track not only the value that it is concerned about, but also all possible ways for the value's prefixes to change meaning. So for instance, here:

 <test prefixes="a=http://example.com/">
  <foo>
   <bar>
    <baz content="a:b"/>
   </bar>
  </foo>
 </test>

...if a software agent wants to see when <baz>'s content="" attribute changes to include the value http://example.net/b, it has to not only watch the content="" attribute, but also the prefixes="" attribute of all ancestor elements up the tree, just in case they redefine the prefix "a" to mean "http://example.net/".
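
A minimal Python sketch of this situation follows. The `Element` class and `resolve` function are hypothetical stand-ins for a DOM and an RDFa-style resolver, assuming that the nearest ancestor declaring a prefix wins.

```python
# Illustrative sketch only: Element and resolve are hypothetical, not a real DOM API.

class Element:
    def __init__(self, name, parent=None, prefixes=None, content=None):
        self.name = name
        self.parent = parent
        self.prefixes = prefixes or {}  # mappings declared on this element
        self.content = content


def resolve(element):
    """Resolve element.content by walking up the tree to find the prefix."""
    prefix, sep, reference = element.content.partition(":")
    if not sep:
        return element.content
    node = element
    while node is not None:
        if prefix in node.prefixes:
            return node.prefixes[prefix] + reference
        node = node.parent
    return element.content  # prefix never declared: left untouched here


# The tree from the example above.
test = Element("test", prefixes={"a": "http://example.com/"})
foo = Element("foo", parent=test)
bar = Element("bar", parent=foo)
baz = Element("baz", parent=bar, content="a:b")

before = resolve(baz)

# A script redefines the prefix on an ancestor; <baz> itself is not touched.
bar.prefixes["a"] = "http://example.net/"

after = resolve(baz)

print(before, after, baz.content)
```

An agent that only observed <baz> would see no change at all, even though the resolved value moved from one site to another.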

Arbitrary prefix mechanisms are unnecessary

In a usability study for microdata, it was discovered that authors in fact have no difficulty dealing with straight URLs rather than shortening them with prefixes:

Intentional misimplementation

One argument that is frequently cited for keeping the xmlns="" feature (though not the prefix="" and profile="" features, which are new) is that Google implements it. However, Google's implementation of xmlns="" in RDFa is intentionally crippled to work around authoring mistakes; it only recognises some well-known prefixes. Google's experience in fact has been that Web designers are frequently unable to deploy RDFa on major sites without mistakes, in part due to the prefix mechanisms (but also in part due to other complexities in the RDFa format). Google's implementation, and Google's experience with getting large sites to deploy RDFa, argue against arbitrary prefix mechanisms.

Quoting Othar Hansson, lead developer for Google's RDFa work: "we will also deviate from the standard [...] we expect that some webmasters will forget the xmlns attribute entirely".

Details

This change proposal describes changes to this draft, relative to the specification as it stood on January 11th 2011: http://dev.w3.org/html5/rdfa/

  • In "2.2 RDFa Processor Conformance", add to the first bullet point an exception for features overridden by the Extensions section.
  • In the "3. Extensions to RDFa Core 1.1" section, add a section requiring that when processing CURIEs in attributes on elements in the HTML namespace, the set of mappings from prefixes to URIs must always be treated as empty.
  • In the "3. Extensions to RDFa Core 1.1" section, add a section requiring that the "prefix" attribute not be used in conforming documents and add a section requiring that user agents ignore prefix="" attributes on elements in the HTML namespace.
  • In the "3. Extensions to RDFa Core 1.1" section, add a section requiring that the "profile" attribute not be used in conforming documents and add a section requiring that user agents ignore profile="" attributes on elements in the HTML namespace.
  • In the "3. Extensions to RDFa Core 1.1" section, remove all the sections that refer to the "xmlns:" attributes and replace them with a single section saying that HTML+RDFa does not use the "xmlns:" attributes and that user agents must not let their processing model be affected by "xmlns:" attributes in no namespace (such as those found in text/html).
  • Update examples and other text accordingly, at the editor's discretion.
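
To make the effect of these changes concrete, here is a small Python sketch that lists the attributes a processor would disregard under this proposal. It uses the standard library's html.parser module; the class and function names are hypothetical, and the sketch does not check namespaces, so it simply assumes every element it sees is in the HTML namespace.

```python
# Illustrative sketch only: a simplified scan, not a conformance checker.
from html.parser import HTMLParser


class IgnoredAttributeFinder(HTMLParser):
    """Collect (element, attribute) pairs that would no longer define prefixes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        for name, _value in attrs:
            if name in ("prefix", "profile") or name.startswith("xmlns:"):
                self.found.append((tag, name))


def find_ignored_attributes(markup):
    finder = IgnoredAttributeFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


sample = (
    '<div xmlns:a="http://example.com/" prefix="a: http://example.com/">'
    '<span property="a:b">text</span>'
    "</div>"
)
print(find_ignored_attributes(sample))
```

With the mappings treated as empty, the value "a:b" on the span in this sample would no longer expand to "http://example.com/b".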

Impact

Positive Effects

...

Negative Effects

...

Conformance Class Changes

...

Risk

...

References