Talk:HTML vs. XHTML

From WHATWG Wiki
Revision as of 11:16, 4 December 2006 by Lachlan Hunt (talk | contribs) (Moved discussion from the main article to here)

An often-repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3, and that the proper way to tell the two apart is via MIME types.

There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.

Syntax

  • Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.
  • N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.
    • You need to explain how RDF/N3 is relevant! --Lachlan Hunt 04:43, 4 December 2006 (UTC)

MIME Types

  • People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via file:/// URIs directly from your hard disk.
    • file:/// URIs use an OS- or browser-specific mechanism to determine the MIME type. On Windows, for instance (for IE), the file extension is mapped to a MIME type via a key in the registry. --Lachlan Hunt 11:16, 4 December 2006 (UTC)
  • HTTP as specified indicates that the Content-Type header is authoritative - it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have an expectation as to what happens when they click on these links, and are unhappy when the browser lets them down.
    • For compatibility, those issues with several file formats do, unfortunately, have to be retained. However, breaking Content-Type in that way for text/html to somehow allow the content to be treated as XML instead is not an option. --Lachlan Hunt 11:16, 4 December 2006 (UTC)
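
As a rough illustration of the "Content-Type is authoritative" reading discussed above, here is a minimal sketch in Python. The function name `choose_parser` and the exact set of XML MIME types are assumptions made for this example, not taken from any spec text quoted on this page.

```python
# Hypothetical helper: pick a parser from the Content-Type header alone.
# The set of XML MIME types below is an illustrative assumption.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def choose_parser(content_type_header: str) -> str:
    """Return 'html', 'xml', or 'other' based only on the Content-Type header."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in XML_TYPES or media_type.endswith("+xml"):
        return "xml"
    return "other"
```

Note that the content itself is never inspected: markup that looks like XHTML but arrives as text/html still goes to the HTML parser, which is the behaviour Lachlan describes.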

Ideals

In an ideal world:

  • the syntax of XML and HTML would be either completely identical or completely different.
    • The syntaxes of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant (see above). --Lachlan Hunt 04:43, 4 December 2006 (UTC)
  • the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.
    • This is not possible without breaking backwards compatibility. These incompatibilities have existed between HTML and XHTML for a long time, and that hasn't stopped people serialising their XHTML as HTML up until now (for all practical purposes, serving XHTML as text/html is equivalent to reserialising). --Lachlan Hunt 11:16, 4 December 2006 (UTC)
  • Content-Type would either always be respected, or always be ignored.
    • Content-Type is always respected for HTML and XHTML MIME types. It's not for some others, but that's a different issue. --Lachlan Hunt 11:16, 4 December 2006 (UTC)
  • there would either be a foolproof way to "sniff" whether a given piece of content was HTML or XHTML, or there would be no difference between XHTML and HTML in terms of syntax, and the range of DOM trees that could validly be serialized would also be identical.
    • There is a foolproof way... the MIME type. :-) -Hixie

Analysis

Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing. And the constraint to be as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither being absolutely identical to the XML syntax nor being completely different is an option.

At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized into XHTML is larger than the set of DOM trees that can be serialized into HTML5.
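
The syntax half of that claim can be sketched with Python's standard library. This is only an approximation: `html.parser.HTMLParser` is a lenient tokenizer standing in for an HTML parser (it does not implement the HTML5 parsing rules), and the helper names `html_start_tags` and `is_well_formed_xml` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup: str) -> list:
    """Tokenize markup as HTML and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


xhtml_style = "<p>one<br/>two</p>"  # accepted by both parsers
html_only = "<p>one<br>two</p>"     # unclosed <br> is not well-formed XML
```

Here the HTML tokenizer reports the same two start tags for both strings, while the XML parser accepts only the first. The reverse direction (DOM trees that only XHTML can serialize) is not shown, since it needs a real HTML5 serializer.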

Having substantially similar syntaxes leads to confusion in some edge cases but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the greater set of DOM trees that can be represented in that syntax.

Potential Strategies

Note: these strategies are not necessarily mutually exclusive.
  • Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver. (We're doing this already. -Hixie)
  • The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than XHTML migration. But as a migration aid, identify some unlikely/invalid combination (example: use of the HTML5 DOCTYPE combined with an xmlns attribute on the html element combined with the use of a non-XML MIME type) and adjust some (as yet undefined) set of the HTML5 parsing rules.
  • Document these differences in the spec itself (as a non-normative appendix?) and/or have a conformance checker flag them. Variations:
    • Ensure that each of these differences triggers a parse error or equivalent in HTML5; this does not (necessarily) involve changing the recovery action or the way the document is ultimately parsed.
    • Instead of bothering people who may not care about these differences, identify some unlikely combination (such as the DOCTYPE/xmlns/MIME combination above) and have it trigger a pedantic mode which enables these additional checks.
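
To make the DOCTYPE/xmlns/MIME trigger concrete, here is a minimal Python sketch of how a conformance checker might detect that combination. The function name `triggers_pedantic_mode`, the regular expressions, and the set of XML MIME types are all assumptions for illustration; a real checker would inspect the parsed document rather than use regexes.

```python
import re

# Illustrative assumption: which MIME types count as "XML".
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def triggers_pedantic_mode(content_type_header: str, markup: str) -> bool:
    """True for: HTML5 DOCTYPE + xmlns on <html> + non-XML MIME type."""
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    is_xml_type = media_type in XML_TYPES or media_type.endswith("+xml")

    # The HTML5 DOCTYPE, matched case-insensitively at the start of the document.
    has_html5_doctype = (
        re.match(r"\s*<!DOCTYPE\s+html\s*>", markup, re.IGNORECASE) is not None
    )

    # Look for an xmlns attribute inside the opening <html ...> tag only.
    html_tag = re.search(r"<html\b[^>]*>", markup, re.IGNORECASE)
    has_xmlns = (
        html_tag is not None
        and re.search(r"\bxmlns\s*=", html_tag.group(0)) is not None
    )

    return has_html5_doctype and has_xmlns and not is_xml_type


doc = '<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"><head></head></html>'
plain = "<!DOCTYPE html><html><body></body></html>"
```

With these inputs, `triggers_pedantic_mode("text/html", doc)` is true, while the same document served as application/xhtml+xml, or a document without xmlns, is not flagged, so authors who never opted in are not bothered.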