Generic Metadata Mechanisms
Revision as of 13:01, 1 September 2008
There have been some requests for introducing generic metadata mechanisms into HTML5.
To help determine what we would need to add, and whether it is worth adding anything, we have to come to an understanding of what the goals and requirements are of such a proposal.
Please document arguments with links to supporting research or links to other wiki pages detailing the anecdotal evidence for or against particular aspects of the goals and requirements.
Goals
What is the problem we are trying to solve?
A machine-readable and standardized way to apply semantic properties (metadata) to DOM elements in HTML5 and probably XHTML. These properties must be capable of being disambiguated when a property name has multiple definitions. We should be able to find or define an "authoritative" meaning for an abstract concept like "title" (e.g. book title, job title, person's title, land deed, etc.). The metadata could be read by UAs and other tools to perform actions that would not be possible without "knowing" what type of thing, quantity, unit or quality an element represents.
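As a rough illustration of the disambiguation goal, the following Python sketch scopes the ambiguous name "title" by the vocabulary that defines it. The vocabulary URLs, the `VOCABULARIES` table and the `resolve` helper are all hypothetical and invented for this example; no particular syntax or authority is implied.

```python
# Hypothetical vocabularies: each maps a short property name to one
# unambiguous definition. The URLs are placeholders, not real authorities.
VOCABULARIES = {
    "http://example.org/vocab/book": {
        "title": "The name of a published work",
    },
    "http://example.org/vocab/job": {
        "title": "A person's position within an organisation",
    },
}


def resolve(vocabulary: str, name: str) -> str:
    """Return the definition of `name` as scoped by `vocabulary`."""
    try:
        return VOCABULARIES[vocabulary][name]
    except KeyError:
        raise LookupError(f"{name!r} is not defined by {vocabulary!r}") from None


# The same short name resolves to different meanings in different vocabularies.
book_title = resolve("http://example.org/vocab/book", "title")
job_title = resolve("http://example.org/vocab/job", "title")
```

The point is only that a bare name is not enough: a tool needs the pair (vocabulary, name) before it can know which "title" is meant.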
Who faces this problem?
Currently a few groups. In the future metadata may become necessary for the average "web consumer" (human or machine) to sort actual information from presentation and structural cruft. In other words, a useful tool for determining the meaning, terms of use, quality and/or authority of a piece of data inside (X)HTML.
Requirements: If we assume that we are going to address this need, what do we need to provide?
Please list each requirement in its own subsection so that arguments pro and con and links to supporting research can be included.
The DOM has to be consistent between HTML and XHTML representations
- Pro: If it isn't, then migrating between the two becomes complicated.
The syntax has to be something that Web authors can easily deploy
- Pro: If authors can't deploy this, then it won't reach critical mass and won't matter.
- Con: Tools will be used to deploy this. It will mostly be used by big sites like Facebook, so individual authors don't matter.
It has to have both a way to abstract it from the HTML and a way to include it inline
- Examples: JavaScript and CSS
- Pro: Inline, it may be quicker for non-professional developers to use and adopt.
- Pro: Abstracted, it is more flexible for professional developers, does not clutter the HTML, and has more room to develop, as style did once it was abstracted from HTML.
- Con: More complicated than having only one of the two.
Inline, it should follow the conventions of CSS and JavaScript
- JavaScript example: onclick="doSomething();doSomethingElse();foo();"
- CSS example: style="width:300px;height:200px;border:1px solid #ccc;font-size:1em;"
- Pro: The properties are grouped together and, in the case of style, use only one attribute name. This keeps the markup cleaner and better organized than creating multiple attributes directly on the elements. Compare with how we added style to elements before CSS.
- Con: Makes it harder to select individual property/value pairs through CSS or the DOM. (Might require dedicated APIs... Ugh.)
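To make the "dedicated APIs" concern concrete, here is a minimal Python sketch of what a tool would have to do to get at individual property/value pairs packed into one attribute value in the CSS style shown above. The function name `parse_inline_metadata` and the sample property names are assumptions made for this example; no attribute name or grammar has been proposed.

```python
def parse_inline_metadata(value: str) -> dict[str, str]:
    """Split 'name:value;name:value' into a dict, CSS-declaration style."""
    pairs: dict[str, str] = {}
    for declaration in value.split(";"):
        if not declaration.strip():
            continue  # tolerate a trailing ';' or empty segments
        # Split on the first ':' only, so values may themselves contain ':'.
        name, sep, raw = declaration.partition(":")
        if not sep or not name.strip():
            continue  # skip malformed declarations
        pairs[name.strip()] = raw.strip()  # a repeated name keeps the last value
    return pairs


pairs = parse_inline_metadata("title: Moby-Dick; author: Herman Melville;")
```

Even this small sketch has to make choices the requirement leaves open: a value containing ";" would be split incorrectly, and repeated names silently overwrite each other. Those are the kinds of details a dedicated API would need to pin down.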
Sustainability
Where possible, the proposal should be resistant to temporary or permanent unavailability of an authoritative source (i.e., a vocabulary provider). This could be achieved, for example, through a P2P or DNS-like mechanism.
- Pro: Metadata remains usable during a temporary outage or overload of an authoritative source of metadata definitions.
- Pro: Possibly, but not necessarily, more resistant to hostile takeover or shutdown of the authority.
- Con: Probably makes the proposal more technically complex and may be difficult, expensive or impossible to solve.
- Con: Distributing an authoritative source may make it less authoritative.
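The simplest form of this resilience is a consumer that keeps the last good copy of a vocabulary and falls back to it when the authority cannot be reached. The Python sketch below shows that idea only; it is a local cache, a much weaker stand-in for the P2P or DNS-like mechanisms mentioned above. `VocabularyResolver`, `flaky_fetch` and the URL are hypothetical names invented for the example.

```python
from typing import Callable, Dict


class VocabularyResolver:
    """Fetch vocabulary definitions, falling back to the last good copy."""

    def __init__(self, fetch: Callable[[str], Dict[str, str]]) -> None:
        self._fetch = fetch  # hypothetical network call to the authority
        self._cache: Dict[str, Dict[str, str]] = {}

    def get(self, url: str) -> Dict[str, str]:
        try:
            definitions = self._fetch(url)
        except OSError:
            # Authority unreachable: reuse the cached copy if we have one.
            if url in self._cache:
                return self._cache[url]
            raise
        self._cache[url] = definitions
        return definitions


# Simulated authority that answers once and is then unavailable.
calls = {"count": 0}


def flaky_fetch(url: str) -> Dict[str, str]:
    calls["count"] += 1
    if calls["count"] > 1:
        raise ConnectionError("authority unavailable")
    return {"title": "The name of a published work"}


resolver = VocabularyResolver(flaky_fetch)
first = resolver.get("http://example.org/vocab/book")
second = resolver.get("http://example.org/vocab/book")  # served from the cache
```

A cached copy may be stale, which is one small instance of the last con above: the further a definition travels from its source, the less certain a consumer can be that it is still the authoritative one.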
Reuse
The proposal should allow metadata and authoritative sources to be reused across elements, pages and sites.
- Pro: Web developers are more likely to use something that does not require repetitively typing the same data.
- Con: Proposal may be more complex.
- Con: Reuse may change or dilute the semantics of the metadata.
Multilingual and Multicultural
Not all concepts can be expressed properly in English. A proposal should allow metadata for foreign languages and concepts.
- Pro: Web developers are more likely to use something in their own language.
- Pro: More concepts and measures can be expressed.
- Con: Proposal would be more complex.
- Con: The author would need to correctly define the region or culture being used.
Authority and Security
Since one potential use of metadata is to enable future features of UAs and other tools, it opens the end user to additional risks. For example, could a page author or hijacker feed a virus to a tool by falsely claiming it to be another type of data? Could harm also be caused when a metadata authority is hijacked by a group seeking to deliberately mislead or blackmail?
Could metadata also be used for unintended purposes, such as spying on or annoying users?
With these risks in mind, should there be standard mechanisms for securing metadata and verifying its source (such as signing certificates, encryption or white/black lists)?
- Pro: Web users and tool vendors are more likely to enable a feature that presents minimal risk.
- Pro: More trust can be placed in the accuracy, purpose and meaning of a piece of data.
- Con: Proposal would be more complex.
- Con: Certificates themselves could be hijacked.
- Con: Certificate fees or vendor preferences could place minority groups at a disadvantage when becoming an authoritative source.
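As a sketch of what "verifying its source" might involve, the Python example below combines two of the mechanisms listed above: a whitelist of trusted authorities and a check that the metadata has not been altered. It uses an HMAC with a shared secret purely because that is available in the standard library; this is a stand-in and not certificate-based signing, which would use public-key signatures instead. The authority URL, the key and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical whitelist: authority URL -> shared secret key.
TRUSTED_AUTHORITIES = {
    "http://example.org/vocab/book": b"demo-secret-key",
}


def sign(key: bytes, payload: bytes) -> str:
    """Produce a hex HMAC-SHA256 tag for the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_trusted(authority: str, payload: bytes, tag: str) -> bool:
    """Accept metadata only from a whitelisted authority with a valid tag."""
    key = TRUSTED_AUTHORITIES.get(authority)
    if key is None:
        return False  # unknown source: reject
    expected = sign(key, payload)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


authority = "http://example.org/vocab/book"
payload = b"title: Moby-Dick"
tag = sign(TRUSTED_AUTHORITIES[authority], payload)

accepted = is_trusted(authority, payload, tag)
tampered = is_trusted(authority, b"title: something else", tag)
unknown = is_trusted("http://example.org/vocab/unknown", payload, tag)
```

The sketch also shows where the cons bite: whoever controls `TRUSTED_AUTHORITIES` decides who counts as authoritative, and a leaked key defeats the check entirely.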
Choice of format
There are already several metadata formats. In the future there may be more.
- Pro: Metadata could be directly repurposed from another system (like ID3s in a music collection) without conversion.
- Pro: A future metadata technology might become dominant in non-HTML systems (like libraries or operating systems) and become the standard for everything but the web.
- Pro: Unforeseen faults or limitations of the current system may require it to be gradually phased out in favor of something else without breaking older sites.
- Con: More complexity for UA and tool developers.
- Con: Reduces the possibility of getting a single global metadata standard (which may or may not be a good thing).
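One way a tool might cope with several formats is to dispatch on a declared format label and normalise everything to a common structure. The Python sketch below is only an assumption about how that could look: the `PARSERS` registry, the format labels "json" and "pairs", and the "name=value" line format are invented for illustration and do not correspond to any existing metadata format.

```python
import json
from typing import Callable, Dict

Parser = Callable[[str], Dict[str, str]]
PARSERS: Dict[str, Parser] = {}


def register(format_name: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser under a format label."""
    def decorator(parser: Parser) -> Parser:
        PARSERS[format_name] = parser
        return parser
    return decorator


@register("json")
def parse_json(text: str) -> Dict[str, str]:
    # Assumes the JSON document is a flat object of name/value pairs.
    return {str(k): str(v) for k, v in json.loads(text).items()}


@register("pairs")
def parse_pairs(text: str) -> Dict[str, str]:
    # 'name=value' lines; a made-up format for illustration only.
    result: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            result[name.strip()] = value.strip()
    return result


def read_metadata(format_name: str, text: str) -> Dict[str, str]:
    """Parse `text` with the parser registered for `format_name`."""
    if format_name not in PARSERS:
        raise ValueError(f"unsupported metadata format: {format_name}")
    return PARSERS[format_name](text)


from_json = read_metadata("json", '{"title": "Moby-Dick"}')
from_pairs = read_metadata("pairs", "title = Moby-Dick\nauthor = Herman Melville")
```

Every new format means another parser to write and maintain, which is the added complexity for UA and tool developers noted in the first con.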
Related Proposals, Research and Discussions
- WHATWG Discussions
- W3C Semantic Web Interest Group (SWIG)
- W3C SWIG Mailing List Archive
- GRDDL (Transformations of XHTML to RDF)
- RDFa vs. CRDF (Cascading RDF Proposal)
- Embedded RDF Wiki
- RDF in HTML (Embedded RDF Examples)
- Wikipedia page on Semantic Web
- What are Microformats? (microformats.org)
- Friend of a Friend Project (FOAF)
- Dublin Core Metadata Initiative (DCMI)