A user account is required in order to edit this wiki, but we've had to disable public user registrations due to spam.
To request an account, ask an autoconfirmed user on Chat (such as one of these permanent autoconfirmed members).
Talk:HTML vs. XHTML: Difference between revisions
(→Mime Types: so what can we do?) |
|||
Line 19: | Line 19: | ||
It isn't clear to me (Hixie), however, how the fact that authors can't set the MIME type properly is supposed to be something we can ever solve from the point of view of the syntax of HTML. The full XML syntax isn't compatible with HTML parsers, and the full HTML syntax isn't compatible with XML parsers. The common subset is a tiny language that doesn't support widely used features like <style> or scripting. We can't parse text/html files as anything but HTML. The parser used for content sent with XML MIME types is out of scope for the WHATWG specs (it would be up to the XML guys). It isn't that we WANT the MIME type to be the only way to distinguish the two. It's that the MIME type IS the only way. It's a statement of fact, not desire. [[User:Hixie|Hixie]] 18:24, 4 December 2006 (UTC) | It isn't clear to me (Hixie), however, how the fact that authors can't set the MIME type properly is supposed to be something we can ever solve from the point of view of the syntax of HTML. The full XML syntax isn't compatible with HTML parsers, and the full HTML syntax isn't compatible with XML parsers. The common subset is a tiny language that doesn't support widely used features like <style> or scripting. We can't parse text/html files as anything but HTML. The parser used for content sent with XML MIME types is out of scope for the WHATWG specs (it would be up to the XML guys). It isn't that we WANT the MIME type to be the only way to distinguish the two. It's that the MIME type IS the only way. It's a statement of fact, not desire. [[User:Hixie|Hixie]] 18:24, 4 December 2006 (UTC) | ||
* [http://planet.intertwingly.net/ my planet] is served as <code>application/xhtml+xml</code> to Firefox and <code>text/html</code> to IE. It seems to be capable of doing both scripting and style in both modes. -[[User:Rubys|Rubys]] | |||
=== Ideals === | === Ideals === |
Revision as of 20:02, 4 December 2006
An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3. And that the proper way to tell the two apart is via MIME types.
There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.
Syntax
- Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.
- N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.
- You need to explain how RDF/N3 is relevant! --Lachlan Hunt 04:43, 4 December 2006 (UTC)
- The top of this page starts with "An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3". Need I provide references? -Rubys 14:08, 4 December 2006 (UTC)
- You need to explain how RDF/N3 is relevant! --Lachlan Hunt 04:43, 4 December 2006 (UTC)
Mime Types
- People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via
file:///
URIs directly from your hard disk.file:///
URIs use an OS or browser specific mechanism to determine the MIME. On Windows, for instance (for IE), the file extension is mapped to a MIME type via a key in the registry. --Lachlan Hunt 11:16, 4 December 2006 (UTC)- and as such, can rarely be depended upon. In addition to file extensions, content sniffing is also a common strategy. -Rubys 14:12, 4 December 2006 (UTC)
- HTTP as specified indicates that the the
Content-Type
header is authoritative - it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have an expectation as to what happens when they click on these links, and are unhappy when the browser lets them down.- For compatibility, those issues with several file formats do, unfortunately, have to be retained. However, breaking Content-Type in that way for
text/html
to somehow allow the content to be treated as XML instead is not an option. --Lachlan Hunt 11:16, 4 December 2006 (UTC)
- For compatibility, those issues with several file formats do, unfortunately, have to be retained. However, breaking Content-Type in that way for
It isn't clear to me (Hixie), however, how the fact that authors can't set the MIME type properly is supposed to be something we can ever solve from the point of view of the syntax of HTML. The full XML syntax isn't compatible with HTML parsers, and the full HTML syntax isn't compatible with XML parsers. The common subset is a tiny language that doesn't support widely used features like <style> or scripting. We can't parse text/html files as anything but HTML. The parser used for content sent with XML MIME types is out of scope for the WHATWG specs (it would be up to the XML guys). It isn't that we WANT the MIME type to be the only way to distinguish the two. It's that the MIME type IS the only way. It's a statement of fact, not desire. Hixie 18:24, 4 December 2006 (UTC)
- my planet is served as
application/xhtml+xml
to Firefox andtext/html
to IE. It seems to be capable of doing both scripting and style in both modes. -Rubys
Ideals
In an ideal word:
- the syntax of XML and HTML would be either complete identical or completely different.
- The syntax of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant. (see above). --Lachlan Hunt 04:43, 4 December 2006 (UTC)
- Completely? I'd say that they are as different as en-us and en-au. :-) -Rubys
- The syntax of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant. (see above). --Lachlan Hunt 04:43, 4 December 2006 (UTC)
- the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.
- This is not possible without breaking backwards compatibility. These incompatibilities have existed between HTML and XHTML for a long time, and that hasn't stopped people serialising their XHTML as HTML up until now (for all practical purposes, serving XHTML as text/html is equivalent to reserialising). --Lachlan Hunt 11:16, 4 December 2006 (UTC)
Content-Type
would either always be respected, or always be ignored.Content-Type
is always respected for for HTML and XHTML MIME types. It's not for some others, but that's a different issue --Lachlan Hunt 11:16, 4 December 2006 (UTC)
- there would either be a fool-proof way to "sniff" whether the a given content was HTML or XHTML; or there would be no difference between XHTML and HTML in terms of syntax and range of DOM trees that could validly be serialized would also be identical.
- There is a foolproof way... the MIME type. :-) -Hixie
Analysis
Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing. And the constraint to be as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither being absolutely identical with the XML syntax nor being completely different are options.
At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for the set of DOM trees that can be serialized into XHTML is larger than the set of DOM trees that can be serialized into HTML5.
Having the syntaxes being substantially similar leads to confusion in some edge cases (e.g.,
) but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the greater set of DOM trees that can be represented in that syntax.
Potential Strategies
Note: these strategies are not necessarily mutually-exclusive.
- Develop better tools and actively work to integrate them into products like WordPress and DreamWeaver. (We're doing this already. -Hixie)
- The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than XHTML migration. But as a migration aid, identify some unlikely/invalid combination (example: use of the HTML5 DOCTYPE combined with
xmlns
attribute on thehtml
element combined with the use of a non-xml MIME type) and adjust some (as of yet undefined) set of the HTML5 parsing rules. - Document these differences, either in the spec itself (as a non-normative appendix?) and/or by having a conformance checker flag these differences. Variations:
- Ensure that each of these differences triggers a parse error or equivalent in HTML5; this does not (necessarily) involve changing the recovery action or the way the document is ultimately parsed.
- Instead of bothering people who may not care about these differences, identify some unlikely combination (such as the DOCTYPE/xmlns/MIME combination above) and have it trigger a pedantic mode which enables these additional checks.