Talk:HTML vs. XHTML

An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3, and that the proper way to tell the two apart is via MIME types.

There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.

Syntax

  • Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.
  • N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.
    • You need to explain how RDF/N3 is relevant! --Lachlan Hunt 04:43, 4 December 2006 (UTC)
      • The top of this page starts with "An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3". Need I provide references? -Rubys 14:08, 4 December 2006 (UTC)

MIME Types

  • People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via file:/// URIs directly from your hard disk.
    • file:/// URIs use an OS- or browser-specific mechanism to determine the MIME type. On Windows, for instance (for IE), the file extension is mapped to a MIME type via a key in the registry. --Lachlan Hunt 11:16, 4 December 2006 (UTC)
      • and as such, can rarely be depended upon. In addition to file extensions, content sniffing is also a common strategy. -Rubys 14:12, 4 December 2006 (UTC)
  • HTTP as specified indicates that the Content-Type header is authoritative: it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have an expectation as to what happens when they click on these links, and are unhappy when the browser lets them down.
    • For compatibility, those issues with several file formats do, unfortunately, have to be retained. However, breaking Content-Type in that way for text/html to somehow allow the content to be treated as XML instead is not an option. --Lachlan Hunt 11:16, 4 December 2006 (UTC)
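
The two local strategies mentioned in this thread, mapping a file extension to a MIME type and sniffing the content, can disagree with each other. The following is only a rough sketch of that disagreement: the extension table and the function names `guess_from_extension` and `sniff` are hypothetical, and no browser's actual algorithm is implied.

```python
import os

# Hypothetical extension table. Real systems consult the Windows registry,
# a mime.types file, or browser-internal tables instead.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".xml": "application/xml",
}


def guess_from_extension(path):
    """Return the MIME type mapped to the file extension, or None if unknown."""
    extension = os.path.splitext(path)[1].lower()
    return EXTENSION_TYPES.get(extension)


def sniff(content):
    """Very rough sniffing: inspect only the first bytes of the document."""
    head = content.lstrip()[:512].lower()
    if head.startswith(b"<?xml"):
        return "application/xml"
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return None


# A file saved as .html that begins with an XML declaration gets two answers.
document = b'<?xml version="1.0"?>\n<html xmlns="http://www.w3.org/1999/xhtml"/>'
by_extension = guess_from_extension("docs/index.html")
by_sniffing = sniff(document)
```

Here `by_extension` and `by_sniffing` differ for the same file, which is the kind of ambiguity being argued about above.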

It isn't clear to me (Hixie), however, how the fact that authors can't set the MIME type properly is supposed to be something we can ever solve from the point of view of the syntax of HTML. The full XML syntax isn't compatible with HTML parsers, and the full HTML syntax isn't compatible with XML parsers. The common subset is a tiny language that doesn't support widely used features like <style> or scripting. We can't parse text/html files as anything but HTML. The parser used for content sent with XML MIME types is out of scope for the WHATWG specs (it would be up to the XML guys). It isn't that we WANT the MIME type to be the only way to distinguish the two. It's that the MIME type IS the only way. It's a statement of fact, not desire. Hixie 18:24, 4 December 2006 (UTC)
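
One half of the incompatibility claim above is easy to check with a stock XML parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `parses_as_xml` and the two sample strings are made up for illustration. It only shows the XML side, since a faithful check of the HTML side would need a parser that implements HTML tree construction.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML syntax: unquoted attribute value, <br> without a slash, omitted </p>.
html_only = "<div class=note><p>First<br>Second</div>"

# The same content written so that an XML parser accepts it.
xml_friendly = '<div class="note"><p>First<br/>Second</p></div>'

html_only_ok = parses_as_xml(html_only)
xml_friendly_ok = parses_as_xml(xml_friendly)
```

The first string is rejected as not well-formed, while the second one parses.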

  • my planet is served as application/xhtml+xml to Firefox and text/html to IE. It seems to be capable of doing both scripting and style in both modes. -Rubys
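
Serving one document under two MIME types is usually done by looking at the request's Accept header. The sketch below is an assumption about how such a setup could work, not a description of how the planet mentioned above is actually configured; the function name `choose_content_type` and the sample header values are illustrative.

```python
def choose_content_type(accept_header):
    """Pick application/xhtml+xml only when the client lists it explicitly."""
    xhtml = "application/xhtml+xml"
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        if media_type == xhtml and quality > 0:
            return xhtml
    # Anything else, including a bare */*, gets the safe default.
    return "text/html"


# Illustrative header values, not captured from real browsers.
lists_xhtml = choose_content_type(
    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
)
wildcard_only = choose_content_type("image/gif, image/jpeg, */*")
```

A client that names `application/xhtml+xml` gets the XML MIME type; a client that sends only a wildcard gets `text/html`.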

Ideals

In an ideal world:

  • the syntax of XML and HTML would be either completely identical or completely different.
    • The syntax of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant. (see above). --Lachlan Hunt 04:43, 4 December 2006 (UTC)
      • Completely? I'd say that they are as different as en-us and en-au.  :-) -Rubys
  • the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.
    • This is not possible without breaking backwards compatibility. These incompatibilities have existed between HTML and XHTML for a long time, and that hasn't stopped people serialising their XHTML as HTML up until now (for all practical purposes, serving XHTML as text/html is equivalent to reserialising). --Lachlan Hunt 11:16, 4 December 2006 (UTC)
  • Content-Type would either always be respected, or always be ignored.
    • Content-Type is always respected for HTML and XHTML MIME types. It's not for some others, but that's a different issue --Lachlan Hunt 11:16, 4 December 2006 (UTC)
      • Always? Try serving your feed as text/html to Firefox 2.0. -Rubys
      • Try serving your feed as text/html to *any browser* with feed support. Sayrer
  • there would either be a foolproof way to "sniff" whether a given piece of content was HTML or XHTML, or there would be no difference between XHTML and HTML in syntax, and the range of DOM trees that could validly be serialized would also be identical.
    • There is a foolproof way... the MIME type. :-) -Hixie

Analysis

Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing. And the constraint to be as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither being absolutely identical to the XML syntax nor being completely different is an option.

At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized into XHTML is larger than the set of DOM trees that can be serialized into HTML5.

Having substantially similar syntaxes leads to confusion in some edge cases, but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the greater set of DOM trees that can be represented in that syntax.

Potential Strategies

Note: these strategies are not necessarily mutually exclusive.
  • Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver. (We're doing this already. -Hixie)
  • The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than XHTML migration. But as a migration aid, identify some unlikely/invalid combination (example: use of the HTML5 DOCTYPE combined with an xmlns attribute on the html element combined with the use of a non-XML MIME type) and adjust some (as yet undefined) set of the HTML5 parsing rules.
  • Document these differences in the spec itself (as a non-normative appendix?) and/or by having a conformance checker flag them. Variations:
    • Ensure that each of these differences triggers a parse error or equivalent in HTML5; this does not (necessarily) involve changing the recovery action or the way the document is ultimately parsed.
    • Instead of bothering people who may not care about these differences, identify some unlikely combination (such as the DOCTYPE/xmlns/MIME combination above) and have it trigger a pedantic mode which enables these additional checks.
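
The DOCTYPE/xmlns/MIME combination proposed above as a trigger could be detected roughly as follows. This is a sketch under stated assumptions: the function name `wants_pedantic_mode`, the list of XML MIME types, and the regular expressions are all hypothetical, and a real conformance checker would inspect the parsed document rather than pattern-match the source text.

```python
import re

# Assumed set of MIME types that count as "XML" for this check.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}
XHTML_NS = "http://www.w3.org/1999/xhtml"

DOCTYPE_RE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)
HTML_TAG_RE = re.compile(r"<html\b([^>]*)>", re.IGNORECASE)
XMLNS_RE = re.compile(r"""\bxmlns\s*=\s*["']([^"']*)["']""")


def wants_pedantic_mode(mime_type, markup):
    """True for: HTML5 DOCTYPE + xmlns on the html element + non-XML MIME type."""
    essence = mime_type.split(";")[0].strip().lower()
    if essence in XML_MIME_TYPES:
        return False
    if not DOCTYPE_RE.match(markup):
        return False
    html_tag = HTML_TAG_RE.search(markup)
    if not html_tag:
        return False
    namespace = XMLNS_RE.search(html_tag.group(1))
    return bool(namespace) and namespace.group(1) == XHTML_NS


sample = (
    "<!DOCTYPE html>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body></body></html>"
)
triggered = wants_pedantic_mode("text/html; charset=utf-8", sample)
```

Because the check only fires on a combination that ordinary HTML4-era content is unlikely to contain, authors who do not care about the differences would not be bothered by it.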

table inside p?

Do you really mean the following?

In XHTML, p elements may contain structured inline level elements including blockquote, dl, menu, ol, ul, pre and table

In what respect are blockquote, dl, menu, ol, ul, and table “inline,” and how are they allowed inside p? – Joeclark 05:07, 5 December 2006 (UTC)

Yes, in XHTML5, as opposed to HTML5, the content model for p elements has been modified to allow structured inline-level elements. However, it's not allowed in HTML5 because of backwards compatibility constraints. The problem is that the end tag for the p element will be implied by the presence of those elements, so it's technically impossible to do, except through DOM manipulation.

The term structured inline-level elements just refers to elements that are usually thought of as being block level, but may be used in inline-level contexts.

Because XHTML isn't constrained by the same compatibility constraints as HTML, this now allows structures like the following:

<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.

  <table>
    <tr>
      <th>Cras est neque</th>
      <th>Posuere id, lacinia eu</th>
    </tr>
    <tr>
      <td>Morbi eu neque.</td>
      <td>Vivamus malesuada arcu </td>
    </tr>
    <tr>
      <td>luctus et ultrices</td>
      <td>posuere cubilia</td>
    </tr>
  </table>     

  Nam id odio vitae enim tempor tincidunt. Sed orci. Nulla facilisi.</p>

All of those elements listed are defined to be allowed where structured inline-level content is allowed. This is a change from HTML 4.01 and XHTML 1.0, and is similar to the model proposed in XHTML 2.0. -- Lachlan Hunt 13:39, 5 December 2006 (UTC)

table cannot have tr child?

Do you really mean the following:

In XHTML, table elements may contain child tr elements. In the HTML serialisation, due to backwards compatibility constraints, this is not possible (though it may be done through DOM manipulation).

So, in HTML, is this not possible?


<table>
    <tr>
        <td></td>
        <td></td>
    </tr>
</table>

- Note that a tag is not the same thing as an element. The HTML 4.01 DTD specifies that the TABLE element can only contain CAPTION, COL, COLGROUP, THEAD, TFOOT and TBODY. However, both the start and end tags of TBODY are optional, so your TRs are actually not children of TABLE, but children of an "invisible" TBODY element:
<table>
    <tbody><!-- this line is optional in HTML -->
        <tr>
            <td></td>
            <td></td>
        </tr>
    </tbody><!-- this line is optional in HTML -->
</table>
You can verify this in Firebug or DOM Inspector. --ThomasR 04:34, 23 March 2009 (UTC)
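
For contrast, the XML side can be checked without a browser. The sketch below feeds the table from the question to Python's standard `xml.etree.ElementTree`, which is a plain XML parser and implies no elements. It does not reproduce the HTML behaviour described above; that would need a parser implementing HTML tree construction.

```python
import xml.etree.ElementTree as ET

markup = """
<table>
    <tr>
        <td></td>
        <td></td>
    </tr>
</table>
""".strip()

# An XML parser builds exactly the elements that are written in the source.
table = ET.fromstring(markup)
child_names = [child.tag for child in table]
has_tbody = table.find("tbody") is not None
```

Parsed as XML, the only child of `table` is the `tr` itself and no `tbody` is created, which is why the two serialisations can produce different DOM trees from the same markup.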

Harmony

Hi,

Was just wondering whether the page could be structured so that the differences are categorized, letting HTML authors know how to craft documents that are compatible with XHTML (e.g., nest elements properly), and XHTML authors know how to craft documents that are maximally compatible with HTML (e.g., don't use CDATA sections)? I would think this could be done non-redundantly, such as by grouping entries that belong to the same category together, or at least color-coding them. Maybe a special column could be added for such compatibility guidelines (but again, distinguishing the direction of the compatibility)?

Also, are the items following the table in the "Syntax and Parsing" section supposed to be integrated into the table, or what?

Brettz9 02:56, 18 March 2010 (UTC)