Rationale: Difference between revisions
m (→Modifying existing semantics: Punctuation cleanup) |
(Corrected misc issues) |
||
(43 intermediate revisions by 2 users not shown) | |||
Line 10: | Line 10: | ||
9. skip links?? | 9. skip links?? | ||
10. http://www.mail-archive.com/[email protected]/msg23220.html | 10. http://www.mail-archive.com/[email protected]/msg23220.html | ||
11. | 11. Rationale of scoped attribute (see for example http://html5doctor.com/the-scoped-attribute/) | ||
--> | --> | ||
This rationale document is supplemental to the WHATWG [ | This rationale document is supplemental to the WHATWG [https://html.spec.whatwg.org//multipage/ HTML Living Standard] specification. It is a work-in-progress. | ||
== General Rationale == | == General Rationale == | ||
=== | === In overall terms, what determines what’s in the spec? === | ||
==== Already-implemented features ==== | |||
In the past, the contents of the spec tended to be determined by browsers’ <i>de facto</i> feature set. After all, one of the main purposes of the spec is to describe reality (regardless of what the spec editor, members, and contributors think). Historically, vendors competed with one another by implementing features without regard to any specification. It’s a relatively recent phenomenon that vendors compete by their conformance to the spec. That is to say, in the past, browsers tended to dictate what was in the spec, whereas today, it's the converse. | |||
=== Using elements where scripts | Thus, the spec must include all already-implemented features. A feature may be moved to the obsolete section if there is consensus that the feature should not be used in new content. | ||
==== Member and contributor feedback ==== | |||
When adding features to the spec, the editor takes into account member and contributor feedback (via the WHATWG mailing list and IRC channel). If the editor believes inclusion of a feature is justified, he will add it to the spec. Ultimately, whether a feature remains in the spec depends on whether or not vendors (at least two) decide to implement the feature. | |||
=== One vendor, one veto === | |||
Part of the goal of the WHATWG is to document how web browsers actually handle HTML. As such, browser vendors already have veto power—by not following the standard. The W3C and WHATWG do not have any enforcement power and can only write what browsers are willing to implement. Not removing features from the HTML standard that at least one browser vendor has stated they are unwilling to implement causes the HTML spec to not accurately document reality.<ref>http://lists.w3.org/Archives/Public/public-html/2009Jul/0257.html -- Re: Codecs for <video> and <audio></ref><ref>http://lists.w3.org/Archives/Public/www-archive/2009Jul/0075.html --Formal Objection to One vendor, One Veto</ref> The veto isn’t a power that we grant browsers; it’s a right that they earn on their own by virtue of having users. The minimum market share for a veto is somewhere around 1%.<ref>http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-June/026897.html</ref> | |||
=== Why is everything else around us developing so fast, but the web is so slow to adopt anything?<ref>http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2013-October/041037.html</ref> === | |||
Because to get something adopted in a browser, you need to do the following (not always in this order): | |||
<ol> | |||
<li>Have someone design the feature. | |||
<li>Have someone write a specification for it. | |||
<li>Have some people write tests for it. | |||
<li>Have one browser implement it. | |||
<li>Have another browser implement it. | |||
<li>Have another browser implement it. | |||
<li>Have another browser implement it. | |||
<li>Have people document it. | |||
</ol> | |||
This is in contrast to “everything else,” which just needs: | |||
<ol> | |||
<li>Someone to implement it. | |||
</ol> | |||
=== Using elements where scripts “work” === | |||
In addition, arguments were made that JavaScript-based implementations of details suffer from problems and limitations. Scripting behavior may be inconsistent across browsers, or even unavailable in some contexts. Accessibility is "bolted on", allowing more opportunity for author error, even when using libraries. The data model is not exposed in a consistent way in the markup. And matching native appearance and behavior across a range of platforms may be impractical.<ref>http://lists.w3.org/Archives/Public/public-html/2010Jun/att-0659/issue-93-decision.html</ref> | In addition, arguments were made that JavaScript-based implementations of details suffer from problems and limitations. Scripting behavior may be inconsistent across browsers, or even unavailable in some contexts. Accessibility is "bolted on", allowing more opportunity for author error, even when using libraries. The data model is not exposed in a consistent way in the markup. And matching native appearance and behavior across a range of platforms may be impractical.<ref>http://lists.w3.org/Archives/Public/public-html/2010Jun/att-0659/issue-93-decision.html</ref> | ||
=== It isn | === It isn’t just about web browsers === | ||
Web browsers are not the only programs that use HTML. Sometimes elements and features are needed even when browsers won | Web browsers are not the only programs that use HTML. Sometimes elements and features are needed even when browsers won’t use them in any meaningful way. Document authoring tools, validators, search engines, screen readers, outliners, researchers, etc. all need and can use more information than a browser can. Furthermore if you provide more information than is currently used by browsers it opens up room for innovation. | ||
=== Experimenting with features === | === Experimenting with features === | ||
Line 30: | Line 56: | ||
=== Versioning the spec === | === Versioning the spec === | ||
Most authors don | Most authors don’t care about whether or not an implementation supports an entire, full specification; they just want to know “Can I use this feature in this browser?” So saying that all major implementations support much of CSS 2 to a high degree of correctness is useless for knowing if, say, the author can use display: run-in. In other words, the feature tables are what web authors would actually use in real life.<ref>http://www.mail-archive.com/[email protected]/msg23306.html</ref> | ||
<!--=== HTML5 the spec vs HTML5 the buzzword === | <!--=== HTML5 the spec vs HTML5 the buzzword === | ||
Line 36: | Line 62: | ||
--> | --> | ||
=== Modifying existing semantics === | === Modifying existing semantics === | ||
Some elements have different semantics than what HTML | Some elements have different semantics than what HTML users would expect. Semantic markup isn’t very useful if most pages use elements in a manner that conflicts with the defined semantics. For example, if a search engine treated <code>dd</code> as enclosing a term being defined, for the purposes of searching for definitions, it would not find many definitions, and it would misclassify things.<ref>http://lists.whatwg.org/htdig.cgi/help-whatwg.org/2010-October/000668.html</ref> | ||
=== What is the purpose of defining elements semantically? === | |||
Semantic definitions allow HTML processors, such as Web browsers or search engines, to present and use documents and applications in a wide variety of contexts that the author might not have considered.<ref>https://html.spec.whatwg.org//multipage/elements.html#elements</ref> | |||
Consider a Web page written by an author who only considered desktop computer Web browsers. Because HTML conveys <em>meaning</em>, rather than presentation, the same page can also be used by a small browser on a mobile phone, without any change to the page. Instead of headings being in large letters as on the desktop, for example, the browser on the mobile phone might use the same size text for the whole page, but with the headings in bold. | |||
The same page could equally be used by a blind user using a browser based around speech synthesis, which instead of displaying the page on a screen, reads the page to the user, e.g., using headphones. Instead of large text for the headings, the speech browser might use a different volume or a slower voice. | |||
Since the browsers know which parts of the page are the headings, they can create a document outline that the user can use to quickly navigate around the document, using keys for “jump to next heading” or “jump to previous heading”. Such features are especially common with speech browsers, where users would otherwise find quickly navigating a page quite difficult. | |||
Even beyond browsers, software can make use of this information. Search engines can use the headings to more effectively index a page, or to provide quick links to subsections of the page from their results. Tools can use the headings to create a table of contents (that is in fact how the table of contents of the WHATWG HTML specification is generated). | |||
This example has focused on headings, but the same principle applies to all of the semantics in HTML. | |||
=== Why is it important to stick to the semantics as defined in the spec? === | |||
Not adhering to the spec’s semantics prevents software that assumes and relies on said semantics from correctly processing the document. | |||
For example, the following document is non-conforming, despite being syntactically correct: | |||
<pre><!DOCTYPE HTML> | |||
<html lang="en-GB"> | |||
<head> <title> Demonstration </title> </head> | |||
<body> | |||
<table> | |||
<tr> <td> My favourite animal is the cat. </td> </tr> | |||
<tr> | |||
<td> | |||
<a href="http://example.org/~ernest/"><cite>Ernest</cite></a>, | |||
in an essay from 1992 | |||
</td> | |||
</tr> | |||
</table> | |||
</body> | |||
</html></pre> | |||
…because the data placed in the cells is clearly not tabular data (and the <code>cite</code> element is misused). This would make software that relies on these semantics fail. For example, a speech browser that allowed a blind user to navigate tables in the document would report the quote above as a table, confusing the user; similarly, a tool that extracted titles of works from pages would extract “Ernest” as the title of a work, even though it’s actually a person’s name, not a title. | |||
A corrected version of this document might be: | |||
<pre><!DOCTYPE HTML> | |||
<html lang="en-GB"> | |||
<head> <title> Demonstration </title> </head> | |||
<body> | |||
<blockquote> | |||
<p> My favourite animal is the cat. </p> | |||
</blockquote> | |||
<p> | |||
—<a href="http://example.org/~ernest/">Ernest</a>, | |||
in an essay from 1992 | |||
</p> | |||
</body> | |||
</html></pre> | |||
== Specific Elements == | == Specific Elements == | ||
=== | === The DOCTYPE (Document Type Declaration) === | ||
Because HTML has moved to an unversioned model, the <code>DOCTYPE</code> does not have a version number. The inclusion of a document type declaration is necessary merely for legacy browsers that will operate in quirks mode (a non-spec-compliant rendering mode) if a <code>DOCTYPE</code> is absent. | |||
=== Document metadata === | |||
==== The <code>charset</code> attribute on the <code>meta</code> element in XML documents ==== | |||
The <code>charset</code> attribute on the <code>meta</code> element has no effect in XML documents; it is only allowed in order to facilitate migration to and from XHTML.<ref>https://html.spec.whatwg.org//multipage/semantics.html#the-meta-element</ref> | |||
==== Inclusion of the <code>application-name</code> metadata <code>name</code> value ==== | |||
User agents may want to use the Web application name in UI in preference to the page’s <code>title</code>, as the title might include status messages and the like relevant to the status of the page at a particular moment in time instead of just being the name of the application.<ref>https://html.spec.whatwg.org//multipage/semantics.html#the-meta-element</ref> | |||
==== On the continued inclusion of the <code>keywords</code> metadata <code>name</code> value ==== | |||
Considering that the <code>keywords</code> value has historically been used unreliably and even misleadingly as a way to spam search engine results (i.e., to garner higher search engine rankings), why is this feature still included in the spec? Because a content management system, for example, can use the keyword information of pages within the system to populate the index of a site-specific search engine. In short, keywords have use beyond the large-scale content aggregators (e.g., Google) that pervade the Web. | |||
=== Sections === | === Sections === | ||
Line 53: | Line 142: | ||
=== Grouping content === | === Grouping content === | ||
==== <code>blockquote</code> ==== | ==== The <code>blockquote</code> element ==== | ||
===== Why is it non-conforming to place attributions and inline citations inside the <code>blockquote</code> element? ===== | |||
Because the specification does not consider attributions and inline citations to be part of a block quote proper.<ref>http://developers.whatwg.org/grouping-content.html#the-blockquote-element</ref> In other words, the <code>blockquote</code> element represents only the quote itself. | |||
=== Text-level semantics === | === Text-level semantics === | ||
Line 61: | Line 152: | ||
==== On the status of <code>image</code> ==== | ==== On the status of <code>image</code> ==== | ||
The <code>image</code> element is treated as an alternate (but invalid) name for <code>img</code>. This is because some sites (around 0.2%<ref>Email from Ian Hickson; comment in spec source</ref>) make this mistake. It is already treated as an image by most major browsers. | The <code>image</code> element is treated as an alternate (but invalid) name for <code>img</code>. This is because some sites (around 0.2%<ref>Email from Ian Hickson; comment in spec source</ref>) make this mistake. It is already treated as an image by most major browsers. | ||
Line 72: | Line 162: | ||
==== <code>textarea</code> ==== | ==== <code>textarea</code> ==== | ||
The text area defaults to soft wrapping of the text area. The attribute @wrap can have one of the following values: soft, hard, or off.<ref> | The text area defaults to soft wrapping of the text area. The attribute @wrap can have one of the following values: soft, hard, or off.<ref>https://html.spec.whatwg.org//#the-textarea-element-0</ref>. "off" is considered a non-conforming value because it appears to have no purpose other than a visual presentational effect. <ref>http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-August/022022.html</ref><ref>http://www.mail-archive.com/[email protected]/msg22660.html</ref> | ||
==== <code>meter</code> and <code>progress</code> (are not the same thing) ==== | ==== <code>meter</code> and <code>progress</code> (are not the same thing) ==== | ||
Line 94: | Line 184: | ||
=== script element === | === script element === | ||
Why the [ | Why the [https://html.spec.whatwg.org//multipage/scripting-1.html#restrictions-for-contents-of-script-elements restrictions for contents of script elements]? Why the [https://html.spec.whatwg.org//multipage/tokenization.html#script-data-double-escaped-dash-dash-state complicated parsing rules for script elements]? | ||
See http://lists.w3.org/Archives/Public/public-html-comments/2010Mar/0017.html | See http://lists.w3.org/Archives/Public/public-html-comments/2010Mar/0017.html | ||
==== @ | ==== @defer and @async ==== | ||
async tells browsers to run the script at the '''same''' time as they process the content that follows it (that is, asynchronously). | |||
defer tells browsers to run the script '''later''' and to process the following content first (browsers will not run the script until the page has finished parsing).<ref>http://www.mail-archive.com/[email protected]/msg22436.html</ref> | |||
The HTML parser has [ | === Quirks mode === | ||
The HTML parser has [https://html.spec.whatwg.org//multipage/tokenization.html#parsing-main-inbody the following] behavior difference in quirks mode: | |||
<blockquote><dl><dt>A start tag whose tag name is "table" | <blockquote><dl><dt>A start tag whose tag name is "table" | ||
Line 111: | Line 201: | ||
Why? See http://hsivonen.iki.fi/last-html-quirk/ | Why? See http://hsivonen.iki.fi/last-html-quirk/ | ||
=== | === Ignored white space before head === | ||
White space before the <code><head></code> tag is ignored. The main reason is that, given the markup | White space before the <code><head></code> tag is ignored. The main reason is that, given the markup | ||
Line 142: | Line 231: | ||
== Rejected proposals == | == Rejected proposals == | ||
=== | A “<code><comment></code>” element for marking up user comments (i.e., user compositions in response to newspaper or magazine articles, blog entries, discussion topics, status updates, images, videos, etc.) === | ||
There is | |||
==== Why isn’t there an element for user comments? (e.g., <code><comment></code>) ==== | |||
There is: <code>article</code>. | |||
==== But comments are not articles ==== | |||
Unfortunately, it is basically impossible for a single word or letter to stand for a careful description of an element’s semantics, and the element name “article” isn’t intended to carry the same meaning as its corresponding dictionary entry or any colloquial understanding of the term.<ref>http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Jan/0226.html</ref> The term “article” is defined broadly in HTML to include any complete or self-contained composition. This includes: | |||
* forum posts | |||
* newspaper articles | |||
* magazine articles | |||
* books | |||
* blog posts | |||
* comment on a forum post | |||
* comment on a newspaper article | |||
* comment on a magazine article | |||
* comment on a blog post | |||
* an embeddable interactive widget | |||
* a post with a photograph on a social network | |||
* a comment on a photograph on a social network | |||
* a specification | |||
* an e-mail | |||
* a reply to an e-mail | |||
Comments are considered articles (in the HTML sense) because they are <em>complete</em> compositions unto themselves; that is, they are not part of the piece of writing that they are commenting on. They are obviously <em>related</em> to what they are commenting on (for example, by an “is in response to” relationship), and this relationship is expressed by nesting the comment article inside the article it’s responding to. | |||
==== Surely the comment “LOL” is not an article? ==== | |||
According to the HTML spec, it is. It’s true that many comments need a context to be appreciated or fully understood, but then again many (most?, all?) newspaper or magazine articles need some greater context to be fully understood as well. The point is that the definition of <code>article</code> does not require a piece of writing to be fully intelligible on its own. All that matters is that the comment is separate from the thing that it’s commenting on. It’s that separateness that makes it an article (in the HTML sense). | |||
==== Robots and plugins can extract comments from web pages more easily if they have their own element. Comments can then be more easily syndicated, displayed, hidden, styled, etc. ==== | |||
There’s no compelling argument that a dedicated <code>comment</code> element would make this meaningfully easier than nested <code>article</code> elements. | |||
==== Comments can sometimes appear in a different region of the page than the composition they are referencing. By defining comments as nested articles, the spec is artificially forcing comments to be contained within the markup of the composition they are referencing. ==== | |||
No evidence has been put forth to suggest that this is a significant authorship issue. | |||
==== Why should the spec suggest any one specific method for marking up comments? ==== | |||
If it is clear that an <code>article</code> within an <code>article</code> represents a comment, one can easily: | |||
* programmatically find comments in HTML | |||
* write interoperable style sheets for comments, using the selector <code>article > article</code> | |||
* use HTML fragments in a document store for content management (e.g., blog software with a git backend) | |||
Without having one interoperable way of expressing comments, all that becomes a lot harder.<ref>http://lists.w3.org/Archives/Public/public-whatwg-archive/2013Feb/0129.html</ref> | |||
Also see [[Rationale#Why is it important to stick to the semantics as defined in the spec?|Why is it important to stick to the semantics as defined in the spec?]] on this page. | |||
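As a rough illustration of the first point, the sketch below finds comments by looking for <code>article</code> elements nested inside another <code>article</code>. It is a minimal example built on Python's standard <code>html.parser</code> module, not anything the spec prescribes; the class name <code>CommentFinder</code>, the function <code>find_comments</code>, and the sample markup are all hypothetical. It also simplifies the <code>article > article</code> selector: it counts only how many <code>article</code> elements are open, so intermediate wrapper elements are ignored and replies to comments are folded into the comment they belong to.

```python
from html.parser import HTMLParser


class CommentFinder(HTMLParser):
    """Collects the text of every article nested inside another article."""

    def __init__(self):
        super().__init__()
        self.comments = []   # text of each comment, in document order
        self._depth = 0      # number of article elements currently open
        self._parts = []     # text fragments of the comment being read

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._depth += 1
            if self._depth == 2:
                # A second-level article starts: treat it as a new comment.
                self._parts = []

    def handle_data(self, data):
        if self._depth >= 2:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "article" and self._depth > 0:
            if self._depth == 2:
                # Normalise whitespace before storing the comment text.
                self.comments.append(" ".join("".join(self._parts).split()))
            self._depth -= 1


def find_comments(html):
    """Return the text of each nested article (assumed to be a comment)."""
    finder = CommentFinder()
    finder.feed(html)
    finder.close()
    return finder.comments


# Hypothetical blog post with two comments marked up as nested articles.
page = """
<article>
  <h1>Post title</h1>
  <p>Post body.</p>
  <article><p>First comment.</p></article>
  <article><p>LOL</p></article>
</article>
"""

print(find_comments(page))
```

Because the convention is shared, the same few lines would work on any site that follows it; a site-specific class name or a custom element would need site-specific code.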
=== Why isn’t there a dedicated element for advertisements? (e.g., <code><ad></code>, or <code><advert></code>, or <code><banner></code>, or whatever) === | |||
Because it would give users a relatively easy method for hiding or otherwise disabling ads, in which case the element would very likely end up not being used by content authors<ref>http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-February/013939.html</ref><ref>http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-September/033086.html</ref>: | |||
< | <pre>> How should advertisements be marked up? | ||
It's worth considering that an <advert> element (or <banner> or whatever | |||
you decide to call it) would just cause style rules like advert | |||
{display:none;} to become widespread (e.g. by integration into Adblock | |||
and equivalent). Therefore I can't see this type of markup being used by | |||
most advertisers.</pre> | |||
< | <pre>> I've joined this list to put forward the argument that there should be | ||
> elements for <comment> and <ad> included in the HTML5 spec. | |||
> | |||
> These are both extremely common features of many web pages; I would say | |||
> at least as common as "article". | |||
< | For <ad>, there's the obvious potential usage of setting | ||
ad { display: none !important } | |||
< | in a user style sheet. I don't think this possibility would make <ad> | ||
popular among authors.</pre> | |||
< | Ian Hickson recommends using the <code>aside</code> element instead<ref>http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Jan/0226.html</ref>: | ||
< | <pre>> I've joined this list to put forward the argument that there should be | ||
> elements for <comment> and <ad> included in the HTML5 spec. | |||
For advertisments, I do not think it makes sense to add an element. In | |||
practice, it would likely not end up being used, since doing so would make | |||
it too easy to hide advertisments. | |||
< | However, the <aside> element is a close fit for the semantic, so I would | ||
recommend using that.</pre> | |||
< | === Why isn’t there a grouping-type element for description lists to represent individual name-value groups (e.g., a “<code>dli</code>” element)? It would make styling as well as adding microdata to individual groups much easier.<ref>http://lists.whatwg.org/htdig.cgi/help-whatwg.org/2013-October/001245.html</ref> === | ||
There is; it is now allowed to use <code><div></code> in <code><dl></code>. See https://github.com/whatwg/html/issues/1937 | |||
There is | |||
=== <code>sandbox</code> attribute on the <code>html</code> element === | === Why isn’t there a <code>sandbox</code> attribute on the <code>html</code> element? === | ||
HTML is the wrong level for disabling scripts or other features. This is the kind of thing | HTML is the wrong level for disabling scripts or other features. This is the kind of thing that should be done at the HTTP layer.<ref>http://www.w3.org/Bugs/Public/show_bug.cgi?id=8849</ref><ref>https://wiki.mozilla.org/Security/CSP</ref> | ||
=== | === Feature queries === | ||
Various proposals have come up with the idea of being able to determine if a certain feature is available.<ref>http://lists.w3.org/Archives/Public/www-style/2009Dec/0130.html</ref> These fail for a variety of reasons: | Various proposals have come up with the idea of being able to determine if a certain feature is available.<ref>http://lists.w3.org/Archives/Public/www-style/2009Dec/0130.html</ref> These fail for a variety of reasons: | ||
Part of the problem is that browser vendors will be economical with the truth. Marketing people always have an over-optimistic view of the compliance of their product, and will always give themselves the benefit of the doubt in borderline cases. Also, changing the compliance statement, to remove false claims that are exposed, is likely to be a very low priority for the developers.<ref>http://lists.w3.org/Archives/Public/www-style/2010Jul/0097.html</ref> | Part of the problem is that browser vendors will be economical with the truth. Marketing people always have an over-optimistic view of the compliance of their product, and will always give themselves the benefit of the doubt in borderline cases. Also, changing the compliance statement, to remove false claims that are exposed, is likely to be a very low priority for the developers.<ref>http://lists.w3.org/Archives/Public/www-style/2010Jul/0097.html</ref> | ||
Line 181: | Line 325: | ||
Some other reasons can be found in the footnotes.<ref>http://lists.w3.org/Archives/Public/www-style/2003Oct/0074.html</ref><ref>http://lists.w3.org/Archives/Public/www-style/2004Mar/0282.html</ref> | Some other reasons can be found in the footnotes.<ref>http://lists.w3.org/Archives/Public/www-style/2003Oct/0074.html</ref><ref>http://lists.w3.org/Archives/Public/www-style/2004Mar/0282.html</ref> | ||
=== custom HTML elements === | === Why aren’t authors allowed to make custom HTML elements? === | ||
It is now allowed, see https://html.spec.whatwg.org/multipage/custom-elements.html#custom-elements | |||
Be aware, however, that using custom elements when standard elements could have been used makes it impossible for search engines, developers, and browsers to understand the semantics of a page.<ref>http://html5doctor.com/your-questions-13/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+html5doctor+%28HTML5doctor%29</ref> | |||
<!-- | <!-- |
Latest revision as of 19:03, 20 October 2019
This rationale document is supplemental to the WHATWG HTML Living Standard specification. It is a work-in-progress.
General Rationale
In overall terms, what determines what’s in the spec?
Already-implemented features
In the past, the contents of the spec tended to be determined by browsers’ de facto feature set. After all, one of the main purposes of the spec is to describe reality (regardless of what the spec editor, members, and contributors think). Historically, vendors competed with one another by implementing features without regard to any specification. It’s a relatively recent phenomenon that vendors compete by their conformance to the spec. That is to say, in the past, browsers tended to dictate what was in the spec, whereas today, it's the converse.
Thus, the spec must include all already-implemented features. A feature may be moved to the obsolete section if there is consensus that the feature should not be used in new content.
Member and contributor feedback
When adding features to the spec, the editor takes into account member and contributor feedback (via the WHATWG mailing list and IRC channel). If the editor believes inclusion of a feature is justified, he will add it to the spec. Ultimately, whether a feature remains in the spec depends on whether or not vendors (at least two) decide to implement the feature.
One vendor, one veto
Part of the goal of the WHATWG is to document how web browsers actually handle HTML. As such, browser vendors already have veto power—by not following the standard. The W3C and WHATWG do not have any enforcement power and can only write what browsers are willing to implement. Not removing features from the HTML standard that at least one browser vendor has stated they are unwilling to implement causes the HTML spec to not accurately document reality.[1][2] The veto isn’t a power that we grant browsers; it’s a right that they earn on their own by virtue of having users. The minimum market share for a veto is somewhere around 1%.[3]
Why is everything else around us developing so fast, but the web is so slow to adopt anything?[4]
Because to get something adopted in a browser, you need to do the following (not always in this order):
- Have someone design the feature.
- Have someone write a specification for it.
- Have some people write tests for it.
- Have one browser implement it.
- Have another browser implement it.
- Have another browser implement it.
- Have another browser implement it.
- Have people document it.
This is in contrast to “everything else,” which just needs:
- Someone to implement it.
Using elements where scripts “work”
In addition, arguments were made that JavaScript-based implementations of details suffer from problems and limitations. Scripting behavior may be inconsistent across browsers, or even unavailable in some contexts. Accessibility is "bolted on", allowing more opportunity for author error, even when using libraries. The data model is not exposed in a consistent way in the markup. And matching native appearance and behavior across a range of platforms may be impractical.[5]
It isn’t just about web browsers
Web browsers are not the only programs that use HTML. Sometimes elements and features are needed even when browsers won’t use them in any meaningful way. Document authoring tools, validators, search engines, screen readers, outliners, researchers, etc. all need and can use more information than a browser can. Furthermore if you provide more information than is currently used by browsers it opens up room for innovation.
Experimenting with features
New unknown and untested features are unlikely to get accepted into the WHATWG spec. Browsers and browser extensions (like Google Gears) are expected to first establish use cases and implementation possibilities before the spec is changed. [6]
Versioning the spec
Most authors don’t care about whether or not an implementation supports an entire, full specification; they just want to know “Can I use this feature in this browser?” So saying that all major implementations support much of CSS 2 to a high degree of correctness is useless for knowing if, say, the author can use display: run-in. In other words, the feature tables are what web authors would actually use in real life.[7]
Modifying existing semantics
Some elements have different semantics than what HTML users would expect. Semantic markup isn’t very useful if most pages use elements in a manner that conflicts with the defined semantics. For example, if a search engine treated dd as enclosing a term being defined, for the purposes of searching for definitions, it would not find many definitions, and it would misclassify things.[8]
What is the purpose of defining elements semantically?
Semantic definitions allow HTML processors, such as Web browsers or search engines, to present and use documents and applications in a wide variety of contexts that the author might not have considered.[9]
Consider a Web page written by an author who only considered desktop computer Web browsers. Because HTML conveys meaning, rather than presentation, the same page can also be used by a small browser on a mobile phone, without any change to the page. Instead of headings being in large letters as on the desktop, for example, the browser on the mobile phone might use the same size text for the whole page, but with the headings in bold.
The same page could equally be used by a blind user using a browser based around speech synthesis, which instead of displaying the page on a screen, reads the page to the user, e.g., using headphones. Instead of large text for the headings, the speech browser might use a different volume or a slower voice.
Since the browsers know which parts of the page are the headings, they can create a document outline that the user can use to quickly navigate around the document, using keys for “jump to next heading” or “jump to previous heading”. Such features are especially common with speech browsers, where users would otherwise find quickly navigating a page quite difficult.
Even beyond browsers, software can make use of this information. Search engines can use the headings to more effectively index a page, or to provide quick links to subsections of the page from their results. Tools can use the headings to create a table of contents (that is in fact how the table of contents of the WHATWG HTML specification is generated).
This example has focused on headings, but the same principle applies to all of the semantics in HTML.
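As a rough illustration of how a tool can rely on heading semantics, the following sketch builds a simple table of contents from a page's headings. The page content and the toc id are made up for the example, and this is not how the WHATWG specification's own table of contents is actually generated.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Heading outline sketch</title>
</head>
<body>
  <h1>Cats</h1>
  <h2>Diet</h2>
  <h2>Behaviour</h2>
  <nav id="toc"></nav>
  <script>
    // Collect every heading in document order and list it,
    // indented according to its level (h1 to h6).
    const list = document.createElement("ul");
    for (const heading of document.querySelectorAll("h1, h2, h3, h4, h5, h6")) {
      const level = Number(heading.tagName.charAt(1)); // "H2" gives 2
      const item = document.createElement("li");
      item.textContent = heading.textContent;
      item.style.marginLeft = `${(level - 1) * 1.5}em`;
      list.append(item);
    }
    document.getElementById("toc").append(list);
  </script>
</body>
</html>
```

The script never inspects font sizes or styling; it works only because the headings are marked up as headings.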
Why is it important to stick to the semantics as defined in the spec?
Not adhering to the spec’s semantics prevents software that assumes and relies on said semantics from correctly processing the document.
For example, the following document is non-conforming, despite being syntactically correct:
    <!DOCTYPE HTML>
    <html lang="en-GB">
     <head> <title> Demonstration </title> </head>
     <body>
      <table>
       <tr> <td> My favourite animal is the cat. </td> </tr>
       <tr>
        <td>
         <a href="http://example.org/~ernest/"><cite>Ernest</cite></a>,
         in an essay from 1992
        </td>
       </tr>
      </table>
     </body>
    </html>
…because the data placed in the cells is clearly not tabular data (and the cite element is misused). This would make software that relies on these semantics fail. For example, a speech browser that allowed a blind user to navigate tables in the document would report the quote above as a table, confusing the user; similarly, a tool that extracted titles of works from pages would extract “Ernest” as the title of a work, even though it’s actually a person’s name, not a title.
A corrected version of this document might be:
    <!DOCTYPE HTML>
    <html lang="en-GB">
     <head> <title> Demonstration </title> </head>
     <body>
      <blockquote>
       <p> My favourite animal is the cat. </p>
      </blockquote>
      <p>
       —<a href="http://example.org/~ernest/">Ernest</a>,
       in an essay from 1992
      </p>
     </body>
    </html>
Specific Elements
The DOCTYPE (Document Type Declaration)
Because HTML has moved to an unversioned model, the DOCTYPE does not have a version number. The inclusion of a document type declaration is necessary merely for legacy browsers, which will operate in quirks mode (a non-spec-compliant rendering mode) if a DOCTYPE is absent.
Document metadata
The charset attribute on the meta element in XML documents
The charset attribute on the meta element has no effect in XML documents; it is only allowed in order to facilitate migration to and from XHTML.[10]
Inclusion of the application-name metadata name
User agents may want to use the Web application name in UI in preference to the page’s title, as the title might include status messages and the like relevant to the status of the page at a particular moment in time instead of just being the name of the application.[11]
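A minimal sketch of the distinction, using a made-up application called "Example Mail": the title changes with the state of the page, while the application name stays stable.

```html
<head>
  <!-- The title may carry transient status text, such as an unread count -->
  <title>(3 unread) Inbox - Example Mail</title>
  <!-- The application name is just the name of the Web application -->
  <meta name="application-name" content="Example Mail">
</head>
```

A user agent that offers to pin or install the page could then label it "Example Mail" rather than "(3 unread) Inbox - Example Mail".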
On the continued inclusion of the keywords metadata name
Considering that the keywords value has historically been used unreliably and even misleadingly as a way to spam search engine results (i.e., to garner higher search engine rankings), why is this feature still included in the spec? Because a content management system, for example, can use the keyword information of pages within the system to populate the index of a site-specific search engine. In short, keywords have uses beyond the large-scale content aggregators (e.g., Google) that pervade the Web.
Sections
hgroup and other heading elements
The point of hgroup is to hide the subtitle from the outlining algorithm.
The primary purpose of these elements is merely to help the author write self-explanatory markup that is easy to maintain and style; they are not intended to impose specific structures on authors.[12]
Grouping content
The blockquote element
Why is it non-conforming to place attributions and inline citations inside the blockquote element?
Because the specification does not consider attributions and inline citations to be part of a block quote proper.[13] In other words, the blockquote element represents only the quote itself.
Text-level semantics
Embedded content
On the status of image
The image element is treated as an alternate (but invalid) name for img. This is because some sites (around 0.2%[14]) make this mistake. It is already treated as an image by most major browsers.
The img element and alternate (alt) text
On certain types of pages, adding alternate text (with the alt attribute) is impossible (e.g., sites where users can upload images but have no mechanism to supply a description). Because of this, the alt attribute is optional.[15][16][17]
A longdesc attribute is not needed.[18]
Forms
textarea
A text area defaults to soft wrapping of its text. The wrap attribute can have one of the following values: soft, hard, or off.[19] The value off is considered non-conforming because it appears to have no purpose other than a visual presentational effect.[20][21]
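The two conforming values can be sketched as follows. The form action and field names are hypothetical; note that wrap="hard" requires the cols attribute, because the line breaks inserted on submission are derived from that width.

```html
<form action="/feedback" method="post">
  <!-- soft (the default): text wraps on screen, but no line breaks
       are added to the submitted value -->
  <textarea name="summary" rows="3" wrap="soft"></textarea>

  <!-- hard: the submitted value gets line breaks inserted so that
       no line is longer than cols characters -->
  <textarea name="message" rows="6" cols="40" wrap="hard"></textarea>

  <button>Send</button>
</form>
```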
meter and progress (are not the same thing)
meter is not just a special case of progress. The meter element represents a scalar measurement within a known range, such as storage quota usage, a relative popularity rating or a relevance indicator. The control allows for the indication of high and low ranges, or minimum, maximum and optimal levels.
The progress element, on the other hand, represents the completion progress of a task. This could be a real-time indicator for a background processing task (e.g., one using Web Workers) or for a file upload. progress elements can also be in the indeterminate state, indicating that something is in progress, but its completion progress is unknown.[22]
See Re: <progress> draft for details.
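The difference can be illustrated with a short sketch; the ids, labels and numbers are invented for the example.

```html
<!-- meter: a scalar measurement within a known range (here, quota usage).
     low and high mark sub-ranges; optimum says which end is preferable. -->
<label for="quota">Storage used:</label>
<meter id="quota" min="0" max="100" low="50" high="80" optimum="0" value="72">72 GB of 100 GB</meter>

<!-- progress: how far a task has got; value and max make it determinate -->
<label for="upload">Upload:</label>
<progress id="upload" max="100" value="35">35%</progress>

<!-- progress without a value attribute is in the indeterminate state -->
<label for="sync">Synchronising:</label>
<progress id="sync">Working…</progress>
```

The text inside each element is fallback content for user agents that do not support the element.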
Interactive elements
details element
The details element is needed to provide an accessible way of reflecting a common application widget in HTML-based applications without requiring authors to use extensive scripting, ARIA, and platform-specific CSS to get the same effect.[23][24]
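For illustration, a disclosure widget needs no script at all when written with details; the content below is made up.

```html
<!-- Collapsed by default; the summary acts as the toggle -->
<details>
  <summary>Shipping options</summary>
  <p>Standard delivery takes 3 to 5 working days.</p>
</details>

<!-- The open attribute makes the contents visible initially -->
<details open>
  <summary>Returns</summary>
  <p>Items can be returned within 30 days.</p>
</details>
```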
HTML parsing
script element
Why the restrictions for contents of script elements? Why the complicated parsing rules for script elements?
See http://lists.w3.org/Archives/Public/public-html-comments/2010Mar/0017.html
@defer and @async
async tells the browser to fetch the script in parallel with parsing the content that follows it, and to run the script as soon as it is available (that is, asynchronously).
defer tells the browser to run the script later and to process the following content first (the browser will not run the script until the page has finished parsing).[25]
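A side-by-side sketch of the three cases for external classic scripts; the file names are placeholders.

```html
<!-- No attribute: parsing pauses while the script is fetched and run -->
<script src="blocking.js"></script>

<!-- async: fetched in parallel with parsing and run as soon as it is
     available; execution order relative to other scripts is not guaranteed -->
<script async src="analytics.js"></script>

<!-- defer: fetched in parallel with parsing and run, in document order,
     once the document has finished parsing -->
<script defer src="ui.js"></script>
```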
Quirks mode
The HTML parser has the following behavior difference in quirks mode:
- A start tag whose tag name is "table"
- If the Document is not set to quirks mode, and the stack of open elements has a p element in scope, then act as if an end tag with the tag name "p" had been seen.
Why? See http://hsivonen.iki.fi/last-html-quirk/
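The effect of this rule can be sketched by comparing the trees the parser builds for the same input in each mode. The serialisations below are written out by hand for illustration.

```html
<!-- Source markup -->
<p>Intro text
<table><tr><td>cell</td></tr></table>

<!-- Tree in no-quirks mode: the table start tag closes the open p,
     so the table becomes a sibling of the paragraph -->
<p>Intro text
</p><table><tbody><tr><td>cell</td></tr></tbody></table>

<!-- Tree in quirks mode: the p stays open, so the table ends up inside it -->
<p>Intro text
<table><tbody><tr><td>cell</td></tr></tbody></table></p>
```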
Ignored white space before head
White space before the <head> tag is ignored. The main reason is that, given the markup
    <!DOCTYPE html>
    <html>
      <head>
        <title>Sample page</title>
        ...
some people expect document.documentElement.firstChild to return the head element.[26]
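A small test page along these lines can be used to check the behaviour; it is only a sketch, and the logging is there purely for demonstration.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Sample page</title>
    <script>
      // The line break and indentation between <html> and <head> are
      // dropped by the parser, so no white space text node precedes head.
      const first = document.documentElement.firstChild;
      console.log(first === document.head); // true in a conforming HTML parser
    </script>
  </head>
  <body></body>
</html>
```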
Rejected proposals
A “<comment>” element for marking up user comments (i.e., user compositions in response to newspaper or magazine articles, blog entries, discussion topics, status updates, images, videos, etc.)
Why isn’t there an element for user comments? (e.g., <comment>)
There is: article.
But comments are not articles
Unfortunately, it is basically impossible for a single word or letter to stand for a careful description of an element’s semantics, and the element name “article” isn’t intended to carry the same meaning as its corresponding dictionary entry or any colloquial understanding of the term.[27] The term “article” is defined broadly in HTML to include any complete or self-contained composition. This includes:
- forum posts
- newspaper articles
- magazine articles
- books
- blog posts
- comment on a forum post
- comment on a newspaper article
- comment on a magazine article
- comment on a blog post
- an embeddable interactive widget
- a post with a photograph on a social network
- a comment on a photograph on a social network
- a specification
- an e-mail
- a reply to an e-mail
Comments are considered articles, in the HTML sense, because they are complete compositions unto themselves; that is, they are not part of the piece of writing that they are commenting on. They are obviously related to what they are commenting on (for example, by an “is in response to” relationship), and this relationship is expressed by nesting the comment article inside the article it’s responding to.
Surely the comment “LOL” is not an article?
According to the HTML spec, it is. It’s true that many comments need a context to be appreciated or fully understood, but then again many (most? all?) newspaper or magazine articles need some greater context to be fully understood as well. The point is that the definition of article does not require a piece of writing to be fully intelligible on its own. All that matters is that the comment is separate from the thing that it’s commenting on. It’s that separateness that makes it an article (in the HTML sense).
There’s no compelling argument that a dedicated comment element would make this meaningfully easier than nested article elements.
Comments can sometimes appear in a different region of the page than the composition they are referencing. By defining comments as nested articles, the spec is artificially forcing comments to be contained within the markup of the composition they are referencing.
No evidence has been put forth to suggest that this is a significant authorship issue.
Why should the spec suggest any one specific method for marking up comments?
If it is clear that an article within an article represents a comment, one can easily:
- programmatically find comments in HTML
- write interoperable style sheets for comments, using the selector article > article
- use HTML fragments in a document store for content management (e.g., blog software with a git backend)
Without having one interoperable way of expressing comments, all that becomes a lot harder.[28]
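The following sketch shows the nested-article pattern together with the selector mentioned above. The post, the commenter names and the styling are invented, and the selector only matches comments that are direct children of the article they respond to.

```html
<style>
  /* Comments are the articles nested directly inside another article */
  article > article {
    margin-left: 2em;
    padding-left: 1em;
    border-left: 2px solid silver;
  }
</style>

<article>
  <h1>Why cats sleep so much</h1>
  <p>Cats spend a large part of the day asleep.</p>

  <article>
    <footer>Posted by alice</footer>
    <p>LOL</p>
  </article>

  <article>
    <footer>Posted by bob</footer>
    <p>Mine sleeps on the keyboard.</p>
  </article>
</article>

<script>
  // The same selector finds comments programmatically.
  const comments = document.querySelectorAll("article > article");
  console.log(comments.length); // 2 for the markup above
</script>
```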
Also see Why is it important to stick to the semantics as defined in the spec? on this page.
Why isn’t there a dedicated element for advertisements? (e.g., <ad>, or <advert>, or <banner>, or whatever)
Because it would give users a relatively easy method for hiding or otherwise disabling ads, in which case the element would very likely end up not being used by content authors[29][30]:
> How should advertisements be marked up?
It's worth considering that an <advert> element (or <banner> or whatever you decide to call it) would just cause style rules like advert {display:none;} to become widespread (e.g. by integration into Adblock and equivalent). Therefore I can't see this type of markup being used by most advertisers.

> I've joined this list to put forward the argument that there should be elements for <comment> and <ad> included in the HTML5 spec.
>
> These are both extremely common features of many web pages; I would say at least as common as "article".
For <ad>, there's the obvious potential usage of setting ad { display: none !important } in a user style sheet. I don't think this possibility would make <ad> popular among authors.
Ian Hickson recommends using the aside element instead[31]:
> I've joined this list to put forward the argument that there should be elements for <comment> and <ad> included in the HTML5 spec.
For advertisments, I do not think it makes sense to add an element. In practice, it would likely not end up being used, since doing so would make it too easy to hide advertisments. However, the <aside> element is a close fit for the semantic, so I would recommend using that.
Why isn’t there a grouping-type element for description lists to represent individual name-value groups (e.g., a “dli” element)? It would make styling as well as adding microdata to individual groups much easier.[32]
There is: a div element may now be used inside dl to wrap each name-value group. See https://github.com/whatwg/html/issues/1937
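A brief sketch of the pattern, with made-up terms and a made-up style rule showing that each group can now be targeted as a unit:

```html
<style>
  /* Each name-value group can be styled as one box */
  dl > div {
    padding: 0.5em;
    border-bottom: 1px solid silver;
  }
</style>

<dl>
  <div>
    <dt>Author</dt>
    <dd>Ernest</dd>
  </div>
  <div>
    <dt>Published</dt>
    <dd>1992</dd>
  </div>
</dl>
```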
Why isn’t there a sandbox attribute on the html element?
HTML is the wrong level for disabling scripts or other features. This is the kind of thing that should be done at the HTTP layer.[33][34]
Feature queries
Various proposals have suggested a way to determine whether a certain feature is available.[35] These fail for a variety of reasons:
Part of the problem is that browser vendors will be economical with the truth. Marketing people always have an over-optimistic view of the compliance of their product, and will always give themselves the benefit of the doubt in borderline cases. Also, changing the compliance statement, to remove false claims that are exposed, is likely to be a very low priority for the developers.[36]
With regard to CSS feature compliance: Remember that CSS provides hints and implementations don't have to accept those hints, and hardware may sometimes prevent their being implemented.[37]
Some other reasons can be found in the footnotes.[38][39]
Why aren’t authors allowed to make custom HTML elements?
It is now allowed; see https://html.spec.whatwg.org/multipage/custom-elements.html#custom-elements
Be aware, however, that using custom elements where standard elements could have been used makes it impossible for search engines, developers, and browsers to understand the semantics of a page.[40]
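A minimal sketch of an autonomous custom element follows. The element name status-badge and its class are hypothetical; the role attribute is set because a custom element has no built-in semantics of its own.

```html
<status-badge>Online</status-badge>

<script>
  // Custom element names must contain a hyphen.
  class StatusBadge extends HTMLElement {
    connectedCallback() {
      // Software cannot infer any meaning from the tag name,
      // so the role has to be exposed explicitly.
      this.setAttribute("role", "status");
    }
  }
  customElements.define("status-badge", StatusBadge);
</script>
```

Where a standard element already fits (for example, button for a clickable control), using it avoids having to recreate its semantics and behaviour by hand.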
Other Pages
- HTML Design Principles
- Why no namespaces
- Why no script implements
- Why not reuse legend or another mini-header element.
- XHTML2 versus HTML5
- <meta http-equiv=content-language>
- earlier page started with the same purpose.
- rationale for some new HTML5 elements
- WHATWG FAQ
References
1. http://lists.w3.org/Archives/Public/public-html/2009Jul/0257.html -- Re: Codecs for <video> and <audio>
2. http://lists.w3.org/Archives/Public/www-archive/2009Jul/0075.html -- Formal Objection to One vendor, One Veto
3. http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-June/026897.html
4. http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2013-October/041037.html
5. http://lists.w3.org/Archives/Public/public-html/2010Jun/att-0659/issue-93-decision.html
6. http://www.mail-archive.com/[email protected]/msg22577.html
7. http://www.mail-archive.com/[email protected]/msg23306.html
8. http://lists.whatwg.org/htdig.cgi/help-whatwg.org/2010-October/000668.html
9. https://html.spec.whatwg.org//multipage/elements.html#elements
10. https://html.spec.whatwg.org//multipage/semantics.html#the-meta-element
11. https://html.spec.whatwg.org//multipage/semantics.html#the-meta-element
12. http://developers.whatwg.org/sections.html#the-footer-element
13. http://developers.whatwg.org/grouping-content.html#the-blockquote-element
14. Email from Ian Hickson; comment in spec source
15. http://www.paciellogroup.com/resources/articles/altinhtml5.html
16. http://juicystudio.com/article/requiring-alt-attribute-html5.php
17. http://lists.w3.org/Archives/Public/public-html/2007Jun/0393.html
18. http://juicystudio.com/article/html5-image-element-no-alt.php
19. https://html.spec.whatwg.org//#the-textarea-element-0
20. http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-August/022022.html
21. http://www.mail-archive.com/[email protected]/msg22660.html
22. http://html5doctor.com/your-questions-answered-11/
23. http://www.w3.org/Bugs/Public/show_bug.cgi?id=8379#c13
24. http://www.w3.org/html/wg/wiki/ChangeProposals/removedetails
25. http://www.mail-archive.com/[email protected]/msg22436.html
26. [whatwg] several messages about the tree construction stage of HTML parsing
27. http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Jan/0226.html
28. http://lists.w3.org/Archives/Public/public-whatwg-archive/2013Feb/0129.html
29. http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-February/013939.html
30. http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-September/033086.html
31. http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Jan/0226.html
32. http://lists.whatwg.org/htdig.cgi/help-whatwg.org/2013-October/001245.html
33. http://www.w3.org/Bugs/Public/show_bug.cgi?id=8849
34. https://wiki.mozilla.org/Security/CSP
35. http://lists.w3.org/Archives/Public/www-style/2009Dec/0130.html
36. http://lists.w3.org/Archives/Public/www-style/2010Jul/0097.html
37. http://lists.w3.org/Archives/Public/www-style/2003Nov/0000.html
38. http://lists.w3.org/Archives/Public/www-style/2003Oct/0074.html
39. http://lists.w3.org/Archives/Public/www-style/2004Mar/0282.html
40. http://html5doctor.com/your-questions-13/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+html5doctor+%28HTML5doctor%29