Generic Metadata Mechanisms
There have been some requests for introducing generic metadata mechanisms into HTML5.
To help determine what we would need to add, and whether it is worth adding anything, we have to come to an understanding of what the goals and requirements of such a proposal are.
Please document arguments with links to supporting research or links to other wiki pages detailing the anecdotal evidence for or against particular aspects of the goals and requirements.
What is the problem we are trying to solve?
Nobody has yet answered this; please see http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-September/016186.html for commentary and advice on how to fill this in.
Who faces this problem?
This section needs to be much, much more detailed. Who exactly faces the problem we're trying to solve? Name specific communities, organisations, companies, etc.; show how they are "suffering" today and how they are currently working around the problem.
Requirements: If we assume that we are going to address this need, what do we need to provide?
Please demonstrate the reasoning behind each requirement, along with examples of how the requirements could be addressed.
A machine-readable and standardized way to apply semantic properties (metadata) to DOM elements in HTML5 and probably XHTML.
It must be possible to disambiguate these properties when a property name has multiple definitions.
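As a rough illustration of what disambiguation could look like, the following sketch qualifies each property name with a vocabulary prefix and expands it to a globally unique identifier. The prefixes, the example.org URIs and the expand function are assumptions made up for this example; they are not part of any proposal listed on this page.

```python
# Hypothetical vocabularies: the prefixes and URIs are illustrative only.
VOCABULARIES = {
    "book": "http://example.org/vocab/book#",
    "job": "http://example.org/vocab/job#",
}


def expand(qualified_name: str) -> str:
    """Expand 'prefix:name' into a globally unique identifier."""
    prefix, sep, name = qualified_name.partition(":")
    if not sep or prefix not in VOCABULARIES:
        raise ValueError(f"unknown or missing vocabulary prefix: {qualified_name!r}")
    return VOCABULARIES[prefix] + name


# The same short name "title" maps to two distinct identifiers.
book_title = expand("book:title")
job_title = expand("job:title")
```

An unqualified name such as "title" is rejected here; a real mechanism might instead fall back to a default vocabulary.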
Finding or defining meaning
We should be able to find or define an "authoritative" meaning for an abstract concept like "title" (e.g. book title, job title, person's title, land deed, etc.).
The metadata could be read by UAs and other tools to perform actions that would not be possible without "knowing" what type of thing, quantity, unit or quality an element represents.
The DOM has to be consistent between HTML and XHTML representations. If it isn't, then migrating between the two becomes non-trivial, especially for scripting.
Ease of deployment
The syntax has to be something that Web authors can easily deploy. If authors can't deploy this, then it won't get critical mass and won't matter.
One could argue that tools will be used to deploy this, that it'll mostly be used by big sites like Facebook, and that thus individual authors don't matter, but this kind of argument ("the tools will save us") has been repeatedly shown to not work, because in practice the tools have to be hand-authored too, and so the complexity is just moved to other people.
It has to have a way to include it inline, so that it is quicker for non-professional developers to use and adopt. Also, putting metadata in the same location as content could prevent errors in updates or copying.
It has to have both a way to abstract it from the HTML and a way to include it inline, like JS or CSS, because ...
Where possible the proposal should be resistant to temporary or permanent unavailability of an authoritative source (i.e., vocabulary provider). This could be achieved, for example, through a P2P or DNS-like mechanism, or by not relying on external sources (e.g. in the way that SSL certificates are checked).
Without this, temporary outages or overloading of an authoritative source of metadata definitions would cause failures. Such resilience may also make the mechanism more resistant to hostile takeover or shutdown of the authority.
Note however that distributing an authoritative source may make it less authoritative.
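One simple way to tolerate outages, sketched below, is to keep a local copy of each definition and fall back to it when the source cannot be reached. This is only one of the options mentioned above (it is not the P2P or DNS-like variant), and resolve_definition, offline_fetch and the sample definition are hypothetical names invented for this example.

```python
from typing import Callable, Dict, Optional


def resolve_definition(
    term: str,
    fetch: Callable[[str], str],
    cache: Dict[str, str],
) -> Optional[str]:
    """Return the definition of `term`, preferring the authoritative source.

    `fetch` stands in for a network lookup and may raise OSError on outage.
    """
    try:
        definition = fetch(term)
    except OSError:
        # Source unreachable: fall back to the last copy we saw, if any.
        return cache.get(term)
    cache[term] = definition  # refresh the local copy
    return definition


def offline_fetch(term: str) -> str:
    """Simulate an outage of the vocabulary provider."""
    raise ConnectionError("authoritative source unavailable")


cache = {"title": "The name of a creative work"}
result = resolve_definition("title", offline_fetch, cache)
```

Note that a cached copy may be stale, which is one concrete form of the "less authoritative" trade-off.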
The proposal should allow metadata and authoritative sources to be reused across elements, pages and sites, because web developers are more likely to use something that does not require repetitively typing the same data.
Multilingual and Multicultural
Not all concepts can be expressed properly in English. A proposal should allow metadata for non-English languages and for concepts that are specific to other cultures.
Authority and Security
Since a potential use of metadata appears to be enabling future features of UAs and other tools, it follows that this opens the end-user to additional risks. For example, could a page author or hijacker feed a virus to a tool by falsely claiming it to be another type of data? Could harm be caused when a metadata authority is hijacked by a group in order to deliberately mislead or blackmail?
In addition, could metadata be used for unintended purposes such as spying on or annoying users?
With these risks in mind, should there be standard mechanisms for securing metadata and verifying its source (such as signing certificates, encryption or white/black lists)?
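To make the verification idea concrete, here is a minimal sketch that signs a metadata payload and checks the signature before trusting it. It uses a shared-secret HMAC from the Python standard library purely for brevity; this is a stand-in for, not an implementation of, the certificate-based signing mentioned above, and the key and payload are made up for the example.

```python
import hashlib
import hmac


def sign(metadata: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature for the metadata payload."""
    return hmac.new(key, metadata, hashlib.sha256).hexdigest()


def verify(metadata: bytes, key: bytes, signature: str) -> bool:
    """Check the signature using a constant-time comparison."""
    return hmac.compare_digest(sign(metadata, key), signature)


# Hypothetical key and payload, for illustration only.
key = b"shared-secret"
payload = b"type: book; title: Moby-Dick"
signature = sign(payload, key)

trusted = verify(payload, key, signature)
tampered = verify(b"type: executable; title: Moby-Dick", key, signature)
```

A deployed mechanism would more likely rely on public-key signatures, so that tools can verify metadata without holding a secret.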
Related Proposals, Research and Discussions
- WHATWG Discussions
- W3C Semantic Web Interest Group (SWIG)
- W3C SWIG Mailing List Archive
- GRDDL (Transformations of XHTML to RDF)
- RDFa vs. CRDF (Cascading RDF Proposal)
- Embedded RDF Wiki
- RDF in HTML (Embedded RDF Examples)
- Wikipedia page on Semantic Web
- What are Microformats? (microformats.org)
- Friend of a Friend Project (FOAF)
- Dublin Core Metadata Initiative (DCMI)
Inline (as multiple attributes)
Multiple new metadata attributes such as in RDFa.
- Pro: Reasonably simple to add to spec.
- Con: Dependent on changes to HTML spec for future changes to metadata spec.
- Con: Would probably require a different syntax for block or external version of same metadata (makes it hard to move).
- Con: Requires documentation and standardization in the HTML spec rather than through a separate document and standards body.
- Con: More potential for attribute name collisions with future HTML attributes.
- Con: Appears to make metadata reuse difficult.
Inline (in a single attribute)
One metadata attribute with complex content (such as the style attribute).
- Pro: New properties can be added without changing the HTML spec.
- Pro: Changing properties does not affect the DOM.
- Pro: The properties are grouped together.
- Pro: Requirements very similar to style="" and onclick="".
- Con: Requires new metadata format to be created.
- Con: Makes it harder to select individual property/value pairs through CSS or DOM scripting. (Might require dedicated APIs... Ugh.)
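The kind of dedicated API this option might need can be sketched as a small parser that splits a style-like attribute value into property/value pairs. The "name: value; name: value" syntax and the parse_metadata_attribute function are assumptions for illustration; no such format has been defined.

```python
def parse_metadata_attribute(value: str) -> dict:
    """Parse a hypothetical style-like metadata string into a dict."""
    properties = {}
    for declaration in value.split(";"):
        # Split on the first colon only, so values may contain colons.
        name, sep, prop_value = declaration.partition(":")
        if not sep or not name.strip():
            continue  # skip empty or malformed declarations
        properties[name.strip()] = prop_value.strip()
    return properties


parsed = parse_metadata_attribute(
    "type: book; title: Moby-Dick; author: Herman Melville"
)
```

This sketch does not handle semicolons inside values; a real format would need quoting or escaping rules, which is part of the cost of creating a new metadata format.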
Block or external metadata
- Pro: Does not clutter the HTML.
- Pro: Gives it more room to develop, as styling did once it was abstracted from HTML.
- Pro: May be easier to import, export, reuse, sign and translate.
- Pro: May be applied to elements that the author cannot change attributes on (e.g. dynamic, protected or generated content).
- Pro: Faster loading where external metadata can be cached.
- Con: Requires new metadata format to be created.
- Con: CSS-like targeting or use of class or id to apply metadata adds complexity/indirection.
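The indirection mentioned in the last point can be seen in a small sketch where metadata lives in a separate block keyed by id or class selectors and is looked up per element. The rule table, the selector syntax and the metadata_for function are hypothetical, and the rule that id selectors override class selectors is an assumption borrowed loosely from CSS.

```python
from typing import Dict, List, Optional

# Hypothetical external metadata block: selector -> properties.
METADATA_RULES: Dict[str, Dict[str, str]] = {
    ".author": {"role": "creator", "type": "person"},
    "#melville": {"name": "Herman Melville", "type": "novelist"},
}


def metadata_for(element_id: Optional[str], classes: List[str]) -> Dict[str, str]:
    """Collect the metadata that applies to one element."""
    merged: Dict[str, str] = {}
    for cls in classes:
        merged.update(METADATA_RULES.get("." + cls, {}))
    if element_id:
        # Assumed precedence: id rules are applied last and override class rules.
        merged.update(METADATA_RULES.get("#" + element_id, {}))
    return merged


# Stands in for an element such as <span id="melville" class="author">.
result = metadata_for("melville", ["author"])
```

The element itself carries no metadata attributes, which keeps the markup clean but means a reader has to consult the rule table to know what applies.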