Difference between revisions of "MetaExtensions"

From WHATWG Wiki
m (Layout spaced out because keyword ran into description and seemed to reverse meaning. Named new link. Extra comma cut.)
(Added keywords page-datetime & -version, geographic-coverage, datetime-coverage, -start, -end, & -vague, author, creator, publisher, rights, addmark & addmarklocal, & MSSmartTagsPreventParsing.)

Revision as of 07:22, 20 April 2009

This page lists the allowed extension values for the name="" attribute of the <meta> element in HTML5. You may add your own values to this list, which makes them legal HTML5 metadata names. We ask that you try to avoid redundancy; if someone has already defined a name that does roughly what you want, please reuse it.

Keyword Brief description Link to more details Synonyms Status
cache Value must be "public", "private", or "no-cache". Intended as a simple way to tell user agents whether to store a copy of the document, as an alternative to HTTP/1.1's Cache-Control header for publishers who cannot modify HTTP headers. This doesn't actually work; use HTTP headers instead. none Unendorsed
keywords A comma-separated list of keywords that describe this page. Not very necessary these days; search engines use much more sophisticated means to determine the page's contents. none Unendorsed
keywords-not A comma-separated list of negative keywords that distinguish a closely-related theme from this page's true theme, to support Boolean NOT searches often more realistically than visible text can, especially when both themes share the same lexicon. W3C Bug 6609 keywords-negative; keyword-negative; negative-keywords; negative-keyword; keywords-neg; keyword-neg; neg-keywords; neg-keyword; not-keywords; not-keyword; keyword-not Proposal
description A brief phrase that describes this page. Could be used for search results or bookmarks lists. none Unendorsed
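robots A comma-separated list of operators explaining how search engine crawlers should treat the content. Possible values are "noarchive" to prevent cached versions, "noindex" to prevent indexing, and "nofollow", which works like the link rel value of the same name. This meta name is already supported by every popular search engine. Robots exclusion protocol (http://www.robotstxt.org/wc/exclusion.html#meta), Googlebot (http://www.google.com/support/webmasters/bin/answer.py?answer=61050), Yahoo! Slurp (http://help.yahoo.com/help/us/ysearch/slurp/index.html), and Ask.com Teoma (http://about.ask.com/en/docs/about/webmasters.shtml) Proposal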
page-datetime Better ranking in search engine results for recency or relevance to an event date would be aided by a standard format robots can parse. Users would save search time by not having to load many pages to find which ones are new or date-relevant. To supply a consistent and known format, the value for this keyword is a date-time expression formed in accordance with http://www.w3.org/TR/NOTE-datetime (albeit a note that's at W3C only for discussion). Any of the six levels of granularity are acceptable, such as expressing only a year. Should this keyword appear more than once within the head element, only the first one so appearing is determinative. Proposal
page-version Pages may be revised several times in a day. While date-time given to a granularity of a fraction of a second would often suffice, when a page has to be approved more than once before posting, any or no such time may be correct (without this keyword, a comment could be necessary but probably not parsable by an engine). In addition, versions regardless of date may show consecutiveness and can replace a date that must be vague. In that case, a version number may be more useful for searches and so a robot-parsable format is needed. The keyword's value is stated in ASCII digits, is any nonnegative base-10 rational number expressed as an integer or a decimal, and may be padded with any number of leading zeros to support extraction for ASCII sorting. Should this keyword appear more than once within the head element, only the first one so appearing is determinative. Proposal
geographic-coverage The author may be the best expert on the geographic relevance of the content. Leaving that to search engine analysis may be too chancy without search engine optimization, and such analysis is difficult to apply by algorithm to, e.g., historical papers and medical epidemiological studies, which may mention locales only once. The value for this keyword is a colon-, semicolon-, and comma-separated list of one or more locales. Regions in outer space, hemispheres (southern, eastern, etc.), international waters (e.g., oceans), polar regions, continents, international bodies and collections (e.g., all U.N. member nations), nations, and physical features (e.g., Mount Everest) are colon-separated; locales within one nation and only one level down, such as states, are semicolon-separated; and locales within a locale within a nation, such as cities and neighborhoods within a state within a nation, are comma-separated. (Not proposed is using a single punctuation mark, such as a comma, to separate a list of locales where no nesting is desired, because, despite the convenience in coding, it would be hard for an engine to distinguish New York the city from New York the state.) For consistency of spelling, an authority or several should be settled upon, with legal and well-known names and common abbreviations all being acceptable; none is proposed here for now (relying on a ccTLD list might be too complex to implement while still assuring coding consistency, e.g., TLDs are occasionally phased out and removed from IANA's list); promulgating lists may best be done publicly by search engine managements. Authority lists should include historical names (e.g., the Roman Empire), name differences resulting from international dispute and from domestic conflict between purported governments, and possibly some mythical names (e.g., ultima Thule). A complication is translation; e.g., "China" is not the name in Chinese. Also desirable is allowing Unicode for locales that use non-Roman alphabets, but at present that may raise technical problems, including computer security issues, that are not yet readily soluble. geography; geography-coverage; geographic; geographical; geographical-coverage Proposal
datetime-coverage The author may be the best expert on which time frame is most relevant to the content. Leaving that to search engine analysis may be too chancy without search engine optimization, which analysis is difficult to apply by algorithm to, e.g., historical papers that may focus on the 1800s but mention 1731 and 1912 unimportantly. The value for this keyword is a date or time -- not a range and not vague, for which other keywords are proposed -- in a format in accordance with http://www.w3.org/TR/NOTE-datetime (albeit a note that's at W3C only for discussion). Any of the six levels of granularity are acceptable, such as expressing only a year. Should this keyword appear more than once within the head element, only the first one so appearing is determinative. dates; date-coverage; dates-coverage; times; time-coverage; times-coverage; era Proposal
datetime-coverage-start This is identical to the keyword datetime-coverage except that it represents only the start. If this keyword is used without datetime-coverage-end (also proposed), its value is interpreted as starting a range without an end. This keyword and datetime-coverage-end may be used in the same or separate meta tags. The order of the keywords or the tags containing them doesn't matter. dates-start; date-coverage-start; dates-coverage-start; times-start; time-coverage-start; times-coverage-start; era-start Proposal
datetime-coverage-end This is identical to the keyword datetime-coverage except that it represents only the end. If this keyword is used without datetime-coverage-start (also proposed), its value is interpreted as ending a range without a start. This keyword and datetime-coverage-start may be used in the same or separate meta tags. The order of the keywords or the tags containing them doesn't matter. dates-end; date-coverage-end; dates-coverage-end; times-end; time-coverage-end; times-coverage-end; era-end Proposal
datetime-coverage-vague This is identical to the keyword datetime-coverage except that its value is not necessarily crisp. This keyword should be used only when datetime-coverage, datetime-coverage-start, and datetime-coverage-end are inappropriate, but there's no ban on using all four. Any text can be the value (e.g., Pleistocene, 1820s, Tuesdays, or before we were born). If this keyword is used with datetime-coverage, datetime-coverage-start, or datetime-coverage-end, the vague value is to be exploited along with the value/s for the other keyword/s. This keyword and datetime-coverage, datetime-coverage-start, and datetime-coverage-end may be used in the same or separate meta tags. The order of the keywords or the tags containing them doesn't matter. Should this keyword appear more than once within the head element, all are determinative. dates-vague; date-coverage-vague; dates-coverage-vague; times-vague; time-coverage-vague; times-coverage-vague; era-vague Proposal
author Searching for one page author's Web work requires a standard robot-parsable format for the information. A personal name, institutional name, or other text entry is permissible. One element or one keyword represents only one author. Multiple authors are to be represented with multiple keywords in the tag or multiple tags. Search engines may index by any component of a name, so a page author need only enter a name once in one first-last or family-given order (e.g., Chris Ng or Ng, Chris, but not requiring both). page-author Proposal
creator Searching for one content creator's work requires a standard robot-parsable format for the information. A personal name, institutional name, or other text entry is permissible. One element or one keyword represents only one creator. Multiple creators are to be represented with multiple keywords in the tag or multiple tags. Search engines may index by any component of a name, so a content creator need only enter a name once in one first-last or family-given order (e.g., Pat Thunderbird or Thunderbird, Pat, but not requiring both). content-creator Proposal
publisher Searching for one content or page publisher's work requires a standard robot-parsable format for the information. This often differs from creator or author when the publisher is an institution. An institutional name, personal name, or other text entry is permissible. One element or one keyword represents only one publisher. Multiple publishers are to be represented with multiple keywords in the tag or multiple tags, although multiple publishers are less common than multiple authors or creators; multiplicity is more likely for a legal name and a well-known name. Search engines may index by any component of a name, so a publisher need only enter a name once in one order. Proposal
rights As a page effectively appears in at least two forms, usually one as interpreted and displayed on a device and the other as source code, arguably intellectual property rights that must be asserted must be asserted in ways understandable in both contexts. For example, an uninterpreted © is a raw representation that may legally fail as part of a copyright notice to someone seeing source code and not the display, which is important when someone wants to copy source code for use elsewhere and may rely on a defense of innocent infringement. While such assertions can be made in a comment element, it may be helpful to have a tag that search engines can parse and index verbatim. The value is Unicode text, and may include standard and nonstandard notices, invocations of licenses such as GFDL and ASCAP, and any other information. ASCII text would not suffice when a name or notice legally may have to be in a non-Roman alphabet, but no alternative may yet exist in HTML5. Search engine storage may impose a length limit, but, because of legal consequences, if the value's length exceeds a given limit the search index should retain or interpret none of it but only refer to it. As for synonyms, IP, IP-rights, and IP-right are not reserved; while the abbreviation IP for 'intellectual property' is common among attorneys in the U.S., page authors will more likely be computerate, and the abbreviation may be wanted for 'Internet Protocol'. copyright; right; patent; trademark; service-mark; license; licensing; intellectual-property; intellectual-property-rights; intellectual-property-right Proposal
addmark The HTML5 mark element appears able to support insertion of markup by either the user's server or third-party intermediate servers as a way of advertising on, commenting on, restyling, or hiding website content without the website owner's knowledge or consent and perhaps without the user's knowledge or consent, either. Therefore, a page author should be able to prevent the adding of a mark element not already in a page. The addmark keyword with a value of "false" meets that need. (A value of "true" is trivial, being identical in meaning to the absence of the keyword.) Bug 6774 (http://www.w3.org/Bugs/Public/show_bug.cgi?id=6774) Proposal
addmarklocal This complements the addmark keyword, separately proposed. A page author might want to allow -- only locally -- the adding of a mark element not already in a page. The addmarklocal keyword with a value of "true" meets that need. (A value of "false" when addmark="false" is trivial, being identical in meaning to the absence of the addmarklocal keyword.) Once addmarklocal is written into a page, how it is implemented is beyond the scope of HTML; for example, a particular intranet for which the author works might look for the keyword and implement it as it sees fit. The author should recognize the inherent risk: nothing bars a nonlocal user agent or server from using addmarklocal to reverse addmark, other than the HTML5 standard as extended by this Wiki, and arguably only well-behaved user agents can be counted on to obey that. Bug 6774 (http://www.w3.org/Bugs/Public/show_bug.cgi?id=6774) Proposal
MSSmartTagsPreventParsing Microsoft introduced into Internet Explorer 6 Beta a feature that some website designers wished to preclude from applying in order to prevent public misunderstanding of their websites. The feature allowed a browser to add information but at a risk that users wouldn't know that it wasn't supplied by the website. This keyword was provided by Microsoft for those of us who wanted it. Its value was "TRUE". Microsoft spelled the keyword with some capitals and the value in all capitals, but whether capitalization was required for either is unknown; opinions vary. Microsoft has apparently removed this instruction from its website on the ground that the beta version is no longer available and is not supported, but that doesn't assure that some users aren't still using the beta browser, perhaps inadvertently. Therefore, designers may wish to continue using the keyword and value, and they are preserved here. e.g., The Register (U.K.) (http://www.theregister.co.uk/2001/06/25/web_sites_banish_those_winxp/), Univ. Oregon (U.S.) (PDF at p. 18) (http://cc.uoregon.edu/cnews/summer2001/summer2001.pdf), & John Chambers (U.S.) (job résumé near root) (http://trillian.mit.edu/~jc/demo/SmartTagsOff.html), all as accessed 4-19-09 Proposal
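
Several rows in the table above say that when a keyword appears more than once in the head element, only the first occurrence is determinative (page-datetime, page-version, datetime-coverage), while for datetime-coverage-vague all occurrences count. The following is a minimal Python sketch of how a robot might apply that rule and check a page-version value. The names MetaCollector, determinative, and parse_page_version are hypothetical, names are compared case-insensitively as an assumption, and the version pattern assumes "an integer or a decimal" means ASCII digits with an optional fractional part.

```python
from decimal import Decimal
from html.parser import HTMLParser
import re

# Keywords whose first occurrence alone is determinative, per the table.
FIRST_WINS = {"page-datetime", "page-version", "datetime-coverage"}

# Assumption: digits, optionally followed by "." and more digits.
VERSION_RE = re.compile(r"[0-9]+(\.[0-9]+)?")


class MetaCollector(HTMLParser):
    """Collects content values of <meta name=... content=...> in document order."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Assumption: keyword names are matched case-insensitively.
        name = (attr.get("name") or "").strip().lower()
        content = attr.get("content")
        if not name or content is None:
            return
        self.values.setdefault(name, []).append(content.strip())


def determinative(values, name):
    """Return the occurrences of a keyword that count under the table's rules."""
    found = values.get(name, [])
    if name in FIRST_WINS:
        return found[:1]   # only the first occurrence counts
    return list(found)     # e.g. datetime-coverage-vague: all count


def parse_page_version(text):
    """Return the version as a Decimal, or None if it is not well-formed."""
    if not VERSION_RE.fullmatch(text):
        return None
    return Decimal(text)   # leading zeros are padding and do not change the value
```

Feeding a head that declares page-version twice to MetaCollector and then calling determinative(collector.values, "page-version") yields only the first value; the zero-padded original string can still be kept for ASCII sorting while the Decimal is used for numeric comparison.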

For a keyword's "Status" to be changed to "Accepted", the proposed keyword must either have gone through the Microformats process and been approved by the Microformats community, or be defined by a W3C specification in the Candidate Recommendation or Recommendation state. If it fails to go through this process, it is "Unendorsed".
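
The page-datetime and datetime-coverage keywords in the table take values formed according to the W3C date-time note, which defines six levels of granularity, from a bare year to a complete date and time with fractional seconds and a time zone designator. As a rough sketch, the Python below classifies a value by its level and pairs datetime-coverage-start with datetime-coverage-end into a possibly open-ended range. The function names and level labels are hypothetical, and the patterns check only the shape of the value, not calendar validity.

```python
import re

# Time zone designator from the W3C note: "Z" or a +hh:mm / -hh:mm offset.
TZD = r"(?:Z|[+-][0-9]{2}:[0-9]{2})"
DATE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"

# The six levels of granularity, coarsest first. Labels are illustrative.
LEVELS = [
    ("year", r"[0-9]{4}"),
    ("month", r"[0-9]{4}-[0-9]{2}"),
    ("day", DATE),
    ("minute", DATE + r"T[0-9]{2}:[0-9]{2}" + TZD),
    ("second", DATE + r"T[0-9]{2}:[0-9]{2}:[0-9]{2}" + TZD),
    ("fraction", DATE + r"T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+" + TZD),
]


def granularity(value):
    """Return the label of the matching level, or None if the shape is wrong."""
    for label, pattern in LEVELS:
        if re.fullmatch(pattern, value):
            return label
    return None


def coverage_range(meta):
    """Build (start, end) from a dict of keyword values; None means open-ended."""
    start = meta.get("datetime-coverage-start")
    end = meta.get("datetime-coverage-end")
    for value in (start, end):
        if value is not None and granularity(value) is None:
            raise ValueError("not a W3C date-time value: " + value)
    return (start, end)
```

A value such as "1820s" matches none of the six levels, which is the case the datetime-coverage-vague keyword is meant to cover.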

For more details, see the HTML5 specification.
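
The geographic-coverage row describes three nesting levels marked by colons, semicolons, and commas, but it does not spell out a full grammar. The Python sketch below is one possible reading, not a definitive one: it assumes that within each colon-separated group the first semicolon-separated item is the top-level locale, that each later item is a comma-separated list whose first entry is a second-level locale (such as a state), and that the remaining entries are locales inside it. The function name and the dictionary keys are hypothetical.

```python
def parse_geographic_coverage(value):
    """Parse a geographic-coverage value into a nested list of locales.

    Assumed layout (one interpretation of the proposal):
        top ; region, place, place ; region : top ; region ...
    """
    coverage = []
    for top_group in value.split(":"):
        levels = [part.strip() for part in top_group.split(";")]
        top_locale = levels[0]
        if not top_locale:
            continue  # skip empty groups such as a trailing colon
        regions = []
        for region_group in levels[1:]:
            names = [n.strip() for n in region_group.split(",") if n.strip()]
            if names:
                # First name is the second-level locale; the rest sit inside it.
                regions.append({"locale": names[0], "places": names[1:]})
        coverage.append({"locale": top_locale, "regions": regions})
    return coverage
```

Under this reading, the value "United States; New York, New York, Albany; New Jersey: Antarctica" keeps New York the state and New York the city at different levels, which is the distinction the row says a single separator could not preserve.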