FAQ

From WHATWG Wiki
Revision as of 18:27, 16 January 2009 by Annevk (talk | contribs) (→‎What is an HTML Serialisation?: be gentle to XML)

The WHATWG

What is the WHATWG?

The WHATWG is a growing community of people interested in evolving the Web. It focuses primarily on the development of HTML and APIs needed for Web applications.

The WHATWG was founded by individuals from Apple, the Mozilla Foundation, and Opera Software in 2004, after a W3C workshop. Apple, Mozilla and Opera were becoming increasingly concerned about the W3C’s direction with XHTML, lack of interest in HTML and apparent disregard for the needs of real-world authors. So, in response, these organisations set out with a mission to address these concerns and the Web Hypertext Application Technology Working Group was born.

What does the acronym WHATWG stand for?

It stands for "Web Hypertext Application Technology Working Group".

What is the WHATWG working on?

The WHATWG is working on HTML 5 (see below). In the past it has worked on Web Forms 2.0 and Web Controls 1.0 as well. Web Forms 2.0 (see below) has reached a stable stage and we're awaiting implementation experience. Web Controls 1.0 has been abandoned for now, awaiting what XBL 2.0 will bring us.

How can I get involved?

There are lots of ways you can get involved; take a look at What you can do!

Is participation free?

Yes, everyone can contribute. There are no membership fees involved; it's an open process. You may easily subscribe to the WHATWG mailing lists. You may also join the W3C's new HTMLWG by going through the slightly longer application process.

The WHATWG Process

How does the WHATWG work?

People send e-mail to the mailing list. The editor then reads that feedback and, taking it into account along with research, studies, and feedback from many other sources (blogs, forums, IRC, etc.), makes language design decisions intended to address everyone's needs as well as possible while keeping the language consistent.

This continues, with people sending more feedback, until nobody is able to convince the editor to change the spec any more (e.g. because two people want opposite things, and the editor has considered all the information available and decided that one of the two proposals is the better one).

This is not a consensus-based approach -- there's no guarantee that everyone will be happy! There is also no voting.

There is a small oversight committee (known as the "WHATWG members", see the charter) who have the authority to override or replace the editor if he starts making bad decisions.

Currently the editor is Ian Hickson.

How should browser developers interact with the WHATWG?

Feedback on a feature should be sent to [email protected] (but you have to join the mailing list first), or [email protected]. All feedback will receive a reply in due course.

If you want feedback to be dealt with faster than "eventually", e.g. because you are about to work on that feature and need the spec to be updated to take into account all previous feedback, let the editor know by either e-mailing him ([email protected]), or contacting him on IRC (Hixie on Freenode). Requests for priority feedback handling are handled confidentially so other browser vendors won't know that you are working on that feature.

Questions and requests for clarifications should be asked either on the mailing list or on IRC, in the #whatwg channel on Freenode.

Note: Browser developers are also referred to as UA developers or implementors. UA stands for User Agent and is basically the technical term for "browser".

When will HTML 5 be finished?

"Finished" is a big deal... You'll be able to use HTML5 long before then. See When will we be able to start using these new features?

It is estimated by the editor that HTML5 will reach the W3C Candidate Recommendation stage during 2012. That doesn't mean you can't start using it yet, though. Different parts of the specification are at different maturity levels. Some sections are already relatively stable and there are implementations that are already quite close to completion, and those features can be used today (e.g. <canvas>). But other sections are still being actively worked on and changed regularly, or not even written yet.

You can see annotations in the margins showing the estimated stability of each section.

The possible states are:

  • Idea; yet to be specified -- the section is a placeholder.
  • First draft -- An early stage.
  • Working draft -- An early stage, but more mature than just "first draft".
  • Last call for comments -- The section is nearly done, but there may be feedback still to be processed. Send feedback sooner rather than later, or it might be too late.
  • Awaiting implementation feedback -- The section is basically done, but might change in response to feedback from implementors. Major changes are unlikely past this point unless it is found that the feature, as specified, really doesn't work well.
  • Implemented and widely deployed -- the feature is specified and complete. Once a section is interoperably implemented, it’s quite stable and unlikely to change significantly. Any changes to such a section would most likely only be editorial in nature, particularly if the feature is already in widespread use.

There are also two special states:

  • Being edited right now -- the section is in high flux and is actively being edited. Contact Hixie on IRC if you have immediate feedback.
  • Being considered for removal -- for one reason or another, the section is being considered for removal. Send feedback soon to help with the decision.

The point to all this is that you shouldn’t place too much weight on the status of the specification as a whole. You need to consider the stability and maturity level of each section individually.

It is estimated, again by the editor, that HTML5 will reach a W3C recommendation in the year 2022 or later. This will be approximately 18-20 years of development, since beginning in mid-2004. That's actually not that crazy, though. Work on HTML4 started in the mid 90s, and HTML4 still, more than ten years later, hasn't reached the level that we want to reach with HTML5. There is no real test suite, there are many parts of the spec that are lacking real implementations, there are big parts that aren't interoperable, and the spec has hundreds if not thousands of known errors that haven't been fixed. When HTML4 came out, REC meant something much less exciting than it does now.

For a spec to become a REC today, it requires two 100% complete and fully interoperable implementations, which is proven by each successfully passing literally thousands of test cases (20,000 tests for the whole spec would probably be a conservative estimate). When you consider how long it takes to write that many test cases and how long it takes to implement each feature, you’ll begin to understand why the time frame seems so long.

(In the interests of full disclosure, the W3C's official line is that the HTML5 spec will be complete, with interoperable implementations, in late 2010. However, that same timetable gave a date for First Public Working Draft that was eight months premature, and the W3C, as of the predicted date for the third milestone, Candidate Recommendation, had still not come anywhere near reaching the second milestone, Last Call. You can make your own judgements regarding the W3C timetable's credibility.)

Is there a process for removing bad ideas from the spec?

There are several processes by which we trim weeds from the specifications.

  • On a regular basis, especially around explicit call-for-comments, we go through every section and mark areas as being considered for removal. This happened early in 2008 with the data templates, repetition blocks, and DFN-element cross references, for example. If no feedback is received to give us strong reasons to keep such features, then they eventually are removed altogether.
  • Anyone can ask for a feature to be removed; such feedback is considered like all other feedback and is based on the merits of the arguments put forward.
  • If browsers don't widely implement a feature, or if authors don't use a feature, or if the uses of the feature are inconsequential or fundamentally wrong or damaging, then, after due consideration, features will be removed.

Removing features is a critical part of spec development.

Is there a process for adding new features to the spec?

The process is rather informal, but basically boils down to this:

  1. Research the use cases and requirements by discussing the issue with authors and implementors.
  2. Come up with a clear description of the problem that needs to be solved.
  3. Discuss your proposal with authors and implementors. Read the responses. Listen to the feedback. Consider whether your ideas are good solutions to the use cases and requirements put forward. Discussions here should be done in public, e.g. on an archived public mailing list or documented in blogs.
  4. Get implementors to commit to implementing the feature. If you can't get several implementors to implement the feature, then get at least one user agent to implement it experimentally. Experimental implementations should be publicly available.
  5. Bring the experimental implementations to the attention of the spec's editor. Document the experience found from any implementations, the use cases and requirements that were found in the first step, the data that the design was based on.
  6. Demonstrate the importance of the problem. Demonstrate that the solution is one that will be used correctly and widely enough for it to solve the stated problem.
  7. Participate in the subsequent design discussions, considering all the proposals carefully. Typically at this step the original design gets thrown out and a significantly better design is developed, informed by the previous research, new research, and implementation and author experience with experimental implementations. Sometimes, the idea is abandoned at this stage.

If the idea survives the above design process, the spec will eventually be updated to reflect the new design. Implementations will then be updated to reflect the new design (if they aren't, that indicates the new design is not good, and it will be reworked or removed). The spec will be updated to fix the many problems discovered by authors and implementors, over a period of several years, as more authors and implementors are exposed to the design. Eventually, a number of provably interoperable implementations are deployed. At this point development of the feature is somewhat frozen.

Writing a comprehensive test suite is also an important step, which should start a bit before implementations start being written to the spec. (Test suites usually find as many problems with implementations as they do with the spec; they aren't just for finding browser bugs.) We don't yet have a good story with respect to test suites, sadly. If you want to help us out, let the mailing list know! Be aware, though, it's a lot of work.

HTML5

What is HTML 5?

HTML 5 is the main focus of the WHATWG community and also that of the (new) W3C HTML Working Group. HTML 5 is a new version of HTML 4.01 and XHTML 1.0 addressing many of the issues of those specifications while at the same time enhancing (X)HTML to more adequately address Web applications. Besides defining a markup language that can be written in both HTML (HTML5) and XML (XHTML5) it also defines many APIs that form the basis of the Web architecture. These APIs are known to some as "DOM Level 0" and have never been documented. Yet they are extremely important for browser vendors to support existing Web content and for authors to be able to build Web applications.

What is Web Forms 2.0?

Web Forms 2.0 is an update to the forms chapters of HTML 4.01 and XHTML 1.0. The specification is informally declared feature complete and the WHATWG is awaiting implementation experience. This specification will in due course be folded into the HTML 5 specification when HTML 5 reaches a more stable state.

How can I keep track of changes to the spec?

The specification is available in the subversion repository. You may use any svn client to check out the latest version and use your client's diff tools in order to compare revisions and see what has been changed. You may also use the online (X)HTML5 Tracker Tool. The tool provides an online interface for selecting and comparing revisions of the spec.

When will we be able to start using these new features?

As soon as browsers begin to support them. You do not need to wait till HTML5 becomes a recommendation, because that can’t happen until after the implementations are completely finished.

For example, the <canvas> feature is already widely implemented.

The specification has annotations in the margins showing which browsers implement each section.

What about Microsoft and Internet Explorer?

Microsoft has already started implementing parts of HTML5 in IE8.

HTML 5 is being developed with compatibility with existing browsers (including IE) in mind, though. Support for many features can be simulated using JavaScript.

Syntax issues

Will (X)HTML 5 finally put an end to the XHTML as text/html debate?

Yes. Unlike HTML 4.01 and XHTML 1.0, the choice of HTML or XHTML is solely dependent upon the choice of MIME type, rather than the DOCTYPE. See HTML vs. XHTML

What will the DOCTYPE be?

In HTML: <!DOCTYPE html>. The only reason there is a DOCTYPE is to trigger standards mode in browsers. This DOCTYPE does that for all current, most legacy, and future browsers.

In XHTML: no DOCTYPE is required, but if present must be the same as in HTML. Note that case is significant in XHTML.
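
As a rough illustration of the case rule above, the following JavaScript sketch checks whether a source string begins with the DOCTYPE. The function name startsWithHtml5Doctype and its syntax parameter are made up for this example, and the regular expressions are a simplification (they ignore a leading XML declaration or comments, for instance); this is not how a browser or validator actually works.

```javascript
// Hypothetical helper: does the source start with the HTML5 DOCTYPE?
// syntax = 'html': the match is case-insensitive.
// syntax = 'xml':  the match is case-sensitive, as case is significant in XHTML.
function startsWithHtml5Doctype(source, syntax = 'html') {
  const pattern = syntax === 'xml'
    ? /^\s*<!DOCTYPE\s+html\s*>/
    : /^\s*<!DOCTYPE\s+html\s*>/i;
  return pattern.test(source);
}
```

With this sketch, a lowercase "<!doctype html>" is accepted for the 'html' syntax but rejected for 'xml'.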

How are pre-HTML5 documents parsed?

All documents with a text/html media type (that is, including those without or with an HTML 2.0, HTML 3.2, HTML 4.01, or XHTML 1.0 DOCTYPE) will be parsed using the same parser algorithm as defined by HTML5. This matches what Web browsers have done for HTML documents so far and keeps code complexity down. That in turn is good for security, maintainability, and in general keeping the number of bugs down. The HTML syntax of HTML5 therefore does not require a new parser and documents with an HTML 4.01 DOCTYPE for example will be parsed using the HTML5 parser.

Validators are allowed to have different code paths for previous levels of HTML.

If there is no DTD, how can I validate my page?

With an HTML 5 validator.

What is an HTML Serialisation?

The HTML serialisation refers to the syntax of an HTML document defined in HTML5. The syntax is inspired by the SGML syntax from earlier versions of HTML, bits of XML (e.g. allowing a trailing slash on void elements, xmlns attributes), and the reality of deployed content on the Web.

Any document whose MIME type is determined to be text/html is considered to be an HTML serialization and must be parsed using an HTML parser.

What is an XML (or XHTML) Serialisation?

The XML Serialization refers to the syntax defined by XML 1.0 and Namespaces in XML 1.0. A resource that has an XML MIME type, such as application/xhtml+xml or application/xml, is an XML document and if it uses elements in the HTML namespace, it contains XHTML. If the root element is “html” in the HTML namespace, the document is referred to as an XHTML document.

What MIME type does HTML5 use?

The HTML serialisation must be served using the text/html MIME type.

The XHTML serialisation must be served using an XML MIME type, such as application/xhtml+xml or application/xml. Unlike XHTML 1.0, XHTML 5 must not be served as text/html.

Using the incorrect MIME type (text/html) for XHTML will cause the document to be parsed according to parsing requirements for HTML. In other words, it will be treated as tag soup. Ensuring the use of an XML MIME type is the only way to ensure that browsers handle the document as XML.
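
The rule that the MIME type alone selects the parser can be sketched in JavaScript as follows. The function name parserForMimeType and its return values are invented for this illustration; treating text/xml and any type ending in "+xml" as XML MIME types is an assumption added here beyond the two types named above, and real browsers apply further rules that this sketch ignores.

```javascript
// Hypothetical helper: decide which parser a resource would be handed to,
// based only on its MIME type. Parameters such as charset are ignored, and
// so is any DOCTYPE inside the document.
function parserForMimeType(contentType) {
  const essence = contentType.split(';')[0].trim().toLowerCase();
  if (essence === 'text/html') {
    return 'html';
  }
  // Assumption: text/xml and "+xml" types count as XML MIME types too.
  if (essence === 'application/xml' || essence === 'text/xml' || essence.endsWith('+xml')) {
    return 'xml';
  }
  return 'other';
}
```

Note that the document's content never enters the decision: an XHTML 1.0 page sent as text/html still ends up with the HTML parser.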

Should I close empty elements with /> or >?

Void elements in HTML (e.g. the br, img and input elements) do not require a trailing slash. For example, instead of writing <br />, you only need to write <br>. This is the same as in HTML 4.01. However, due to the widespread attempts to use XHTML 1.0, there are a significant number of pages using the trailing slash. Because of this, the trailing slash syntax has been permitted in order to ease migration from XHTML 1.0 to HTML5.

It is important to realise that this syntax serves no purpose in HTML; it is simply ignored by browsers. Although it is based upon the XML syntax, that does not mean that HTML documents can be parsed with XML tools. HTML and XHTML are separate serialisations, and each must be processed using tools designed to handle that format.
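
To show that the two spellings are interchangeable in HTML, here is a small JavaScript sketch that rewrites the XHTML-style form into the plain one. The helper name dropTrailingSlashes and the VOID_ELEMENTS list (an illustrative subset, not the spec's full list) are assumptions of this example, and the regular expression only copes with simple markup; it is not a substitute for a real HTML parser.

```javascript
// Illustrative subset of void elements; the spec has the full list.
const VOID_ELEMENTS = ['br', 'hr', 'img', 'input', 'link', 'meta'];

// Hypothetical helper: rewrite "<br />" style tags to "<br>".
// Non-void elements and tags without a trailing slash are left untouched.
function dropTrailingSlashes(html) {
  const pattern = new RegExp(`<(${VOID_ELEMENTS.join('|')})\\b([^<>]*?)\\s*/>`, 'gi');
  return html.replace(pattern, '<$1$2>');
}
```

Because the slash carries no meaning on these elements in HTML, a browser builds the same DOM for the input and the output of this function.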

If I’m careful with the syntax I use in my HTML document, can I process it with an XML parser?

No, HTML and XML have many significant differences, particularly parsing requirements, and you cannot process one using tools designed for the other. However, since HTML5 is defined in terms of the DOM, in most cases there are both HTML and XHTML serialisations available that can represent the same document. There are, however, a few differences explained later that make it impossible to represent some HTML documents accurately as XHTML and vice versa.

If you wish to process an HTML document as XHTML, you must convert it into XHTML first; and vice versa for processing XHTML as HTML.

What is the namespace declaration?

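In XHTML, you are required to specify the namespace. A namespace identifies the vocabulary that an element or attribute name belongs to, so that names from different XML languages can be mixed in one document without clashing.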

<html xmlns="http://www.w3.org/1999/xhtml">

In HTML, the xmlns attribute is currently allowed on any HTML element, but only if it has the value “http://www.w3.org/1999/xhtml”. It doesn’t do anything at all; it is merely allowed to ease migration from XHTML 1.0. It is not actually a namespace declaration in HTML, because HTML doesn’t yet support namespaces. See the question “Will there be support for namespaces in HTML?”.

Will there be support for namespaces in HTML?

HTML5 is being defined in terms of the DOM and during parsing of a text/html document all HTML elements will be automatically put in the HTML namespace, http://www.w3.org/1999/xhtml. However, unlike the XHTML serialization, there is no real namespace syntax available in the HTML serialization (see previous question). In other words, you do not need to declare the namespace in your HTML markup, as you do in XHTML. However, you are permitted to put an xmlns attribute on each HTML element as long as the namespace is http://www.w3.org/1999/xhtml.

In addition, the HTML syntax provides for a way to embed elements from MathML. Elements placed inside the container element math will automatically be put in the MathML namespace by the parser. Namespace syntax is not required, but again an xmlns attribute is allowed if its value is the MathML namespace.

In conclusion, while HTML5 does not allow the XML namespace syntax, there is a way to embed MathML and the xmlns attribute can be used on any element under the given constraints, in a way that is reasonably compatible on the DOM level.
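
The automatic namespace assignment described above can be pictured with a small JavaScript sketch. It works on a made-up tree of plain { name, children } objects rather than on real parser output, the function name assignNamespaces is hypothetical, and it deliberately ignores every detail of the actual parsing algorithm other than "elements inside math end up in the MathML namespace".

```javascript
const HTML_NS = 'http://www.w3.org/1999/xhtml';
const MATHML_NS = 'http://www.w3.org/1998/Math/MathML';

// Hypothetical helper: annotate each node of a simple element tree with the
// namespace it would end up in. No xmlns syntax is consulted at any point.
function assignNamespaces(node, inherited = HTML_NS) {
  // A math element switches itself and its descendants to MathML.
  const namespace = node.name === 'math' ? MATHML_NS : inherited;
  return {
    name: node.name,
    namespace,
    children: (node.children || []).map((child) => assignNamespaces(child, namespace)),
  };
}
```

The point of the sketch is that the namespace is derived from where an element sits, not from anything the author declares.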

How do I specify the character encoding?

For HTML, it is strongly recommended that you specify the encoding using the HTTP Content-Type header. If you are unable to configure your server to send the correct headers, then you may use the meta element. The meta element used for this purpose must occur as the first element in the head (even before the title), and within the first 512 bytes of the file.

<meta charset="UTF-8">

Note that this meta element is different from HTML 4, though it is compatible with many browsers because of the way encoding detection has been implemented.
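
A quick way to sanity-check the 512-byte rule is sketched below in JavaScript. It assumes Node.js (for Buffer) and a document that is encoded as UTF-8; the function name metaCharsetWithinLimit is invented for this example, and the regular expression only recognises the simple <meta charset> form shown above.

```javascript
// Hypothetical check: does a <meta charset> declaration end within the
// first `limit` bytes of the document when encoded as UTF-8?
function metaCharsetWithinLimit(html, limit = 512) {
  const match = /<meta\s+charset\s*=\s*["']?[\w-]+["']?\s*\/?>/i.exec(html);
  if (!match) {
    return false; // no declaration found at all
  }
  const end = match.index + match[0].length;
  // Count bytes, not characters: non-ASCII text before the meta element
  // takes more than one byte per character in UTF-8.
  return Buffer.byteLength(html.slice(0, end), 'utf8') <= limit;
}
```

A long comment or script placed before the meta element is the usual way to push the declaration past the limit.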

In XHTML, XML rules for determining the character encoding apply. You may use either the HTTP Content-Type header or the XML declaration to specify the encoding.

<?xml version="1.0" encoding="UTF-8"?>

Otherwise, you must use the default of UTF-8 or UTF-16. It is recommended that you use UTF-8.

What are the differences between HTML and XHTML?

See the list of differences between HTML and XHTML in the wiki.

What are best practices to be compatible with HTML DOM and XHTML DOM?

Though the intent is that HTML and XHTML can both produce identical DOMs, there still are some differences between working with an HTML DOM and an XHTML one.

Case sensitivity:

  • Whenever possible, avoid testing Element.tagName, Node.nodeName, and Node.localName.

Namespaces:

  • Use the namespace-aware version of element creation, Document.createElementNS(ns, elementName)
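
The two recommendations above might look like this in script. This is only a sketch: the helper names hasTagName and createHtmlElement are made up, the doc parameter stands for whatever Document object the script is working with, and comparing names case-insensitively is a simplification for the cases where testing the name cannot be avoided. The DOM method used is createElementNS.

```javascript
const HTML_NS = 'http://www.w3.org/1999/xhtml';

// Hypothetical helper: compare element names without relying on the case
// in which the DOM happens to report them.
function hasTagName(element, name) {
  return element.tagName.toLowerCase() === name.toLowerCase();
}

// Hypothetical helper: create an element explicitly in the HTML namespace,
// so the result is the same in an HTML DOM and an XHTML DOM.
function createHtmlElement(doc, name) {
  return doc.createElementNS(HTML_NS, name);
}
```

In a browser one would call createHtmlElement(document, 'p'); the doc parameter just keeps the helper usable with any Document.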

Why does HTML5 legitimise tag soup?

Actually it doesn’t. This is a misconception that comes from the confusion between conformance requirements for documents, and the requirements for user agents.

Due to the fundamental design principle of supporting existing content, the spec must define how to handle all HTML, regardless of whether documents are conforming or not. Therefore, the spec defines (or will define) precisely how to handle and recover from erroneous markup, much of which would be considered tag soup.

For example, the spec defines algorithms for dealing with syntax errors such as incorrectly nested tags, which will ensure that a well structured DOM tree can be produced.

Defining that is essential for one day achieving interoperability between browsers and reducing the dependence upon reverse engineering each other.

However, the conformance requirements for authors are defined separately from the processing requirements. Just because browsers are required to handle erroneous content, it does not make such markup conforming.

For example, user agents will be required to support the marquee element, but authors must not use the marquee element in conforming documents.

It is important to make the distinction between the rules that apply to user agents and the rules that apply to authors for producing conforming documents. They are completely orthogonal.

Feature proposals

HTML5 should support href on any element!

The spec allows <a> to contain blocks. It doesn't support putting href="" on any element, though.

Supporting href on any element has several problems associated with it that make it difficult to support in HTML5. The main reason this isn't in HTML5 is that browser vendors have reported that implementing it would be extremely complex. Browser vendors get to decide what they implement, and there's no point us telling them to do something they aren't going to do. In addition:

  • It isn’t backwards compatible with existing browsers.
  • It adds no new functionality that can’t already be achieved using the a element and a little script.
  • It doesn’t make sense for all elements, such as interactive elements like input and button, where the use of href would interfere with their normal function.

The only advantage it seems to add is that it reduces typing for authors in some cases, but that is not a strong enough reason to support it in light of the other reasons.

Wrapping <a> elements around blocks solves most use cases. It doesn't handle making rows in tables into links, though; for those just do something like this instead:

 <tr onclick="location = this.getElementsByTagName('a')[0]"> ... </tr>

HTML5 should support a way for anyone to invent new elements!

There are actually quite a number of ways for people to invent their own extensions to HTML:

  • Authors can use the class attribute to extend elements, effectively creating their own elements, while using the most applicable existing "real" HTML element, so that browsers and other tools that don't know of the extension can still support it somewhat well. This is the tack used by Microformats, for example.
  • Authors can include data for scripts to process using the data-*="" attributes. These are guaranteed to never be touched by browsers, and allow scripts to include data on HTML elements that scripts can then look for and process.
  • Authors can use the <meta name="" content=""> mechanism to include page-wide metadata. Names should be registered on the wiki's MetaExtensions page.
  • Authors can use the rel="" mechanism to annotate links with specific meanings. This is also used by Microformats. Names should be registered on the wiki's RelExtensions page.
  • Authors can embed raw data using the <script type=""> mechanism with a custom type, for further handling by a script.
  • Authors can create plugins and invoke them using the <embed> element. This is how Flash works.
  • Authors can extend APIs using the JS prototyping mechanism. This is widely used by script libraries, for instance.
  • Authors can propose new elements and attributes to the working group and, if the wider community agrees that they are worth the effort, they are added to the language. (If an addition is urgent, please let us know when proposing it, and we will try to address it quickly.)
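
As an illustration of the data-*="" mechanism from the list above, the following JavaScript sketch gathers an element's data-* attributes for a script to process. The function name collectDataAttributes is hypothetical, and the code assumes only that the element exposes an array-like attributes collection of { name, value } pairs, as DOM elements do.

```javascript
// Hypothetical helper: gather every data-* attribute of an element into a
// plain object, keyed by the part of the name after "data-".
function collectDataAttributes(element) {
  const result = {};
  for (const attr of Array.from(element.attributes)) {
    if (attr.name.startsWith('data-')) {
      result[attr.name.slice('data-'.length)] = attr.value;
    }
  }
  return result;
}
```

Attributes that do not start with data- (class, id and so on) are skipped, so the script only ever sees the data the author put there for it.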

There is currently no mechanism for including non-visible proprietary metadata intended for use by user agents in HTML documents (i.e. for introducing new elements and attributes) without discussing the extension with user agent vendors and the wider Web community. This is intentional; we don't want user agents inventing their own proprietary elements and attributes like in the "bad old days" without working with interested parties to make sure their feature is well designed.

We request that people not invent new elements and attributes to add to HTML without first contacting the working group and getting a proposal discussed with interested parties.

HTML5 should group <dt>s and <dd>s together in <di>s!

This is a styling problem and should be fixed in CSS. There's no reason to add a grouping element to HTML, as the semantics are already unambiguous.

Why are some presentational elements like <b>, <i> and <small> still included?

The inclusion of these elements is a largely pragmatic decision based upon their widespread usage, and their usefulness for use cases which are not covered by more specific elements.

While there are a number of common use cases for italics which are covered by more specific elements, such as emphasis (em), citations (cite), definitions (dfn) and variables (var), there are many other use cases which are not covered well by these elements. For example, a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, or a ship name.

Similarly, although a number of common use cases for bold text are also covered by more specific elements such as strong emphasis (strong), headings (h1-h6) or table headers (th), there are others which are not, such as key words in a document abstract or product names in a review.

Some people argue that in such cases, the span element should be used with an appropriate class name and associated stylesheet. However, the b and i elements provide for a reasonable fallback styling in environments that don't support stylesheets or which do not render visually, such as screen readers, and they also provide some indication that the text is somehow distinct from its surrounding content.

In essence, they convey distinct, though non-specific, semantics, which are to be determined by the reader in the context of their use. In other words, although they don’t convey specific semantics by themselves, they indicate that the content is somehow distinct from its surroundings and leave the interpretation of the semantics up to the reader.

This is further explained in the article The <b> and <i> Elements

Similarly, the small element is defined for content that is commonly typographically rendered in small print, and which is often referred to as fine print. This could include copyright statements, disclaimers and other legal text commonly found at the end of a document.

But they are PRESENTATIONAL!

While <b>, <i> and <small> historically have been presentational, they are defined in a media-independent manner in HTML5. For example, <small> corresponds to the really quickly spoken part at the end of radio advertisements. The problem with elements like <font> isn't that they are presentational per se, it's that they are media-dependent (they apply to visual browsers but not to speech browsers).

WHATWG and the W3C HTML WG

Are there plans to merge the groups?

Not especially. There are people who for a number of reasons are unable to join the W3C group, and there are others who are unable to join the WHATWG group. The editor is in both groups and takes all input into account -- and there are far more places where input on HTML5 is sent than just these two mailing lists (e.g. blogs, [email protected], forums, direct mail, meetings, etc).

Which group has authority in the event of a dispute?

The editor takes feedback from everyone into account and, when weighing technical arguments, does not look at the source of those arguments.

What is the history of HTML?

Here are some documents that detail the history of HTML:

Mailing List

Should I top-post or reply inline?

Please reply inline or make the reply self-contained.

Basically, please remove anything after the last line you have written, so that people don't have to scroll down to find out what else you wrote, and make sure that your e-mail makes sense on its own, as it will probably be read out of context years later.

That is, you should reply like this:

Ian wrote:
> What do you want? 

I want cats!

> When do you want it?

Now!

You should definitely not reply like this (because this requires people to read your e-mail backwards):

No

Ian wrote:
> Is this a good example of how to post e-mails?

You should also not reply like this (because this leaves people to wonder if there is any text lower down that you have written):

This is a bad way to write e-mail.

Ian wrote:
> Is this a good way to write e-mail?
> Lorem ipsum foo bar baz.
> Unrelated other bits that aren't replied to.
> Yet more text

If you use Outlook or Outlook Express, you can use either Outlook-QuoteFix or OE-QuoteFix. These plugins fix several of Outlook's problems with sending properly formatted emails.