Talk:HTML vs. XHTML: Difference between revisions

Revision as of 20:02, 4 December 2006

An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3. And that the proper way to tell the two apart is via MIME types.

There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.

Syntax

Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.
N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.
- You need to explain how RDF/N3 is relevant! --Lachlan Hunt 04:43, 4 December 2006 (UTC)
  - The top of this page starts with "An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3". Need I provide references? -Rubys 14:08, 4 December 2006 (UTC)

Mime Types

People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via file:/// URIs directly from your hard disk.
- file:/// URIs use an OS or browser specific mechanism to determine the MIME. On Windows, for instance (for IE), the file extension is mapped to a MIME type via a key in the registry. --Lachlan Hunt 11:16, 4 December 2006 (UTC)
  - and as such, can rarely be depended upon. In addition to file extensions, content sniffing is also a common strategy. -Rubys 14:12, 4 December 2006 (UTC)
HTTP as specified indicates that the the Content-Type header is authoritative - it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have an expectation as to what happens when they click on these links, and are unhappy when the browser lets them down.
- For compatibility, those issues with several file formats do, unfortunately, have to be retained. However, breaking Content-Type in that way for text/html to somehow allow the content to be treated as XML instead is not an option. --Lachlan Hunt 11:16, 4 December 2006 (UTC)

It isn't clear to me (Hixie), however, how the fact that authors can't set the MIME type properly is supposed to be something we can ever solve from the point of view of the syntax of HTML. The full XML syntax isn't compatible with HTML parsers, and the full HTML syntax isn't compatible with XML parsers. The common subset is a tiny language that doesn't support widely used features like <style> or scripting. We can't parse text/html files as anything but HTML. The parser used for content sent with XML MIME types is out of scope for the WHATWG specs (it would be up to the XML guys). It isn't that we WANT the MIME type to be the only way to distinguish the two. It's that the MIME type IS the only way. It's a statement of fact, not desire. Hixie 18:24, 4 December 2006 (UTC)

my planet is served as application/xhtml+xml to Firefox and text/html to IE. It seems to be capable of doing both scripting and style in both modes. -Rubys

Ideals

In an ideal word:

the syntax of XML and HTML would be either complete identical or completely different.
- The syntax of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant. (see above). --Lachlan Hunt 04:43, 4 December 2006 (UTC)
  - Completely? I'd say that they are as different as en-us and en-au. :-) -Rubys
the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.
- This is not possible without breaking backwards compatibility. These incompatibilities have existed between HTML and XHTML for a long time, and that hasn't stopped people serialising their XHTML as HTML up until now (for all practical purposes, serving XHTML as text/html is equivalent to reserialising). --Lachlan Hunt 11:16, 4 December 2006 (UTC)
Content-Type would either always be respected, or always be ignored.
- Content-Type is always respected for for HTML and XHTML MIME types. It's not for some others, but that's a different issue --Lachlan Hunt 11:16, 4 December 2006 (UTC)
  - Always? Try serving your feed as text/html to FireFox 2.0. -Rubys
  - Try serving your feed as text/html to *any browser* with feed support. Sayrer
there would either be a fool-proof way to "sniff" whether the a given content was HTML or XHTML; or there would be no difference between XHTML and HTML in terms of syntax and range of DOM trees that could validly be serialized would also be identical.
- There is a foolproof way... the MIME type. :-) -Hixie

Analysis

Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing. And the constraint to be as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither being absolutely identical with the XML syntax nor being completely different are options.

At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for the set of DOM trees that can be serialized into XHTML is larger than the set of DOM trees that can be serialized into HTML5.

Having the syntaxes being substantially similar leads to confusion in some edge cases (e.g.,

) but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the greater set of DOM trees that can be represented in that syntax.

Potential Strategies

Note: these strategies are not necessarily mutually-exclusive.

Develop better tools and actively work to integrate them into products like WordPress and DreamWeaver. (We're doing this already. -Hixie)
The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than XHTML migration. But as a migration aid, identify some unlikely/invalid combination (example: use of the HTML5 DOCTYPE combined with xmlns attribute on the html element combined with the use of a non-xml MIME type) and adjust some (as of yet undefined) set of the HTML5 parsing rules.
Document these differences, either in the spec itself (as a non-normative appendix?) and/or by having a conformance checker flag these differences. Variations:
- Ensure that each of these differences triggers a parse error or equivalent in HTML5; this does not (necessarily) involve changing the recovery action or the way the document is ultimately parsed.
- Instead of bothering people who may not care about these differences, identify some unlikely combination (such as the DOCTYPE/xmlns/MIME combination above) and have it trigger a pedantic mode which enables these additional checks.

@@ Line 19: / Line 19: @@
 It isn't clear to me (Hixie), however, how the fact that authors can't set the MIME type properly is supposed to be something we can ever solve from the point of view of the syntax of HTML. The full XML syntax isn't compatible with HTML parsers, and the full HTML syntax isn't compatible with XML parsers. The common subset is a tiny language that doesn't support widely used features like <style> or scripting. We can't parse text/html files as anything but HTML. The parser used for content sent with XML MIME types is out of scope for the WHATWG specs (it would be up to the XML guys). It isn't that we WANT the MIME type to be the only way to distinguish the two. It's that the MIME type IS the only way. It's a statement of fact, not desire. [[User:Hixie|Hixie]] 18:24, 4 December 2006 (UTC)
+* [http://planet.intertwingly.net/ my planet] is served as <code>application/xhtml+xml</code> to Firefox and <code>text/html</code> to IE.  It seems to be capable of doing both scripting and style in both modes.  -[[User:Rubys|Rubys]]
 === Ideals ===

Talk:HTML vs. XHTML: Difference between revisions

Revision as of 20:02, 4 December 2006

Contents

Syntax

Mime Types

Ideals

Analysis

Potential Strategies

Navigation menu

Talk:HTML vs. XHTML: Difference between revisions

Revision as of 20:02, 4 December 2006

Syntax

Mime Types

Ideals

Analysis

Potential Strategies

Navigation menu

Search