Common Subset

From WHATWG Wiki
Revision as of 20:38, 8 December 2006

The common subset of HTML5 and XHTML5 is the intersection of both syntaxes, meant for creating documents that work with either one. The common subset is only implicitly defined by the HTML and XHTML specifications, because the two syntaxes have many elements in common. A document is said to use the common subset when it parses correctly with both the XML parser and the HTML parser.

A document using the conforming common subset conforms to the specification whether it is interpreted as HTML or as XHTML. The conforming common subset excludes any element that is not conforming in either of the two DOM variants.

A document using the common subset can be served as HTML (with the text/html media type) or as XHTML (with an XML media type). The browser uses the media type to decide whether the document is parsed as HTML or as XHTML, and therefore which variant of the DOM is used.
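The dispatch described above can be sketched in Python. This is a minimal illustration, assuming a hypothetical helper `parser_for` that looks only at the media type; the set of types treated as XML (`application/xml`, `text/xml` and any `+xml` type such as `application/xhtml+xml`) is an assumption, and real browsers apply further rules (for example content sniffing) that are not modeled here.

```python
from email.message import Message


def parser_for(content_type_header: str) -> str:
    """Return "html" or "xml" for a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() lowercases the type and drops parameters such as charset.
    media_type = msg.get_content_type()
    if media_type == "text/html":
        return "html"
    # Generic XML types, plus application/xhtml+xml and any other +xml type.
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    raise ValueError(f"not an HTML or XML media type: {media_type}")


print(parser_for("text/html;charset=utf-8"))               # html
print(parser_for("application/xhtml+xml; charset=utf-8"))  # xml
```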


Common Syntax

[TBD]

Limitations from HTML

  • The doctype is required to be <!DOCTYPE html>
  • Tag and attribute names must be lowercase.
  • The XMLish trailing slash on start tags (as in <div/>) must not be used with non-void elements; the HTML parser ignores it, and it is invalid in HTML.
  • XMLish CDATA blocks will not work.
  • HTML does not allow mixing with other XML dialects.
  • HTML does not support p elements containing structured inline-level elements such as blockquote, dl, menu, ol, ul, pre and table; the HTML parser implicitly closes the open p element before any of them.
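Several of the limitations above can be checked mechanically. The following is a rough sketch built on Python's `html.parser`, not a conformance checker: the class `HtmlSideLinter`, the helper `lint` and the void-element list are assumptions made for illustration, and end tags are not checked.

```python
import re
from html.parser import HTMLParser

# Void elements as listed in the current HTML standard (an assumption; the
# 2006 draft this page was written against may list slightly different ones).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}

# One attribute name, followed by an optional quoted or unquoted value.
ATTR_RE = re.compile(r"""([^\s"'>/=]+)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?""")


class HtmlSideLinter(HTMLParser):
    """Hypothetical checker for some of the HTML-side limitations above."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if decl.lower().startswith("doctype") and decl != "DOCTYPE html":
            self.problems.append(f"doctype must be <!DOCTYPE html>, not <!{decl}>")

    def _check_names(self, tag):
        # Handlers receive lowercased names, so inspect the raw start-tag text.
        raw = self.get_starttag_text()
        if raw[1:1 + len(tag)] != tag:
            self.problems.append(f"tag name not lowercase: {raw}")
        for match in ATTR_RE.finditer(raw, 1 + len(tag)):
            name = match.group(1)
            if name != name.lower():
                self.problems.append(f"attribute name not lowercase: {name}")

    def handle_starttag(self, tag, attrs):
        self._check_names(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for "<tag ... />"; the slash is only harmless on void elements.
        self._check_names(tag)
        if tag not in VOID_ELEMENTS:
            self.problems.append(f"self-closing syntax on non-void element <{tag}/>")


def lint(markup: str) -> list:
    """Return the problems found (end tags are not checked in this sketch)."""
    linter = HtmlSideLinter()
    linter.feed(markup)
    linter.close()
    return linter.problems


print(lint('<!DOCTYPE html><p Class="x">a<br/></p><div/>'))
# ['attribute name not lowercase: Class', 'self-closing syntax on non-void element <div/>']
```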

Limitations from XML and XHTML

  • The doctype, optional in XHTML but mandatory in HTML, must match <!DOCTYPE html> exactly, including case, to be well-formed and valid in XHTML.
  • Well-formedness constraints; violating any of these causes a fatal error in XHTML:
    • Comments cannot contain double-hyphens (--).
    • Start tags and end tags must be balanced correctly, except for void elements.
    • Void elements must always be closed with a trailing slash ("/>").
    • All attribute values must be quoted. Attributes without a value (minimized attributes) are disallowed.
    • All < and & characters in text must be escaped, as must any > that follows ]] (where ]]> would otherwise be read as the CDATA section end marker).
    • Only certain characters are allowed in XML (U+0009, U+000A, U+000D, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF); all others are illegal.
    • Other constraints defined in the XML specification.
  • script and style elements may not contain < or & in unescaped form unless the content is wrapped in a CDATA section, which would make the document non-conforming on the HTML side (although it should still be possible to make it work). External scripts and stylesheets are unaffected.
  • noscript has no effect in XHTML.
  • document.write() does not work in XHTML.
  • Entity references cannot be used in XHTML, except for the 5 predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;.
  • The namespace declaration (xmlns attribute) is required in XHTML. The xmlns attribute is also allowed on the html element in HTML, provided it has the value "http://www.w3.org/1999/xhtml": <html xmlns="http://www.w3.org/1999/xhtml">
  • DOM APIs are case-sensitive in XHTML; scripts should always use lowercase element and attribute names to be compatible.
  • Style rules match case-sensitively in XHTML; stylesheets should always use lowercase tag, attribute and class names to be compatible.
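On the XML side, the parser itself enforces well-formedness, so a quick check can simply try to parse the document. A minimal sketch, assuming Python's `xml.etree.ElementTree` as the XML parser; `xml_side_problems` and `is_xml_char` are hypothetical helpers, the latter transcribing the allowed character ranges listed above, and only the namespace requirement is checked beyond well-formedness.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_xml_char(ch: str) -> bool:
    """True if ch falls in one of the XML 1.0 allowed character ranges."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def xml_side_problems(markup: str) -> list:
    """Hypothetical helper reporting a few of the XML-side limitations."""
    problems = [f"illegal character U+{ord(c):04X}" for c in markup if not is_xml_char(c)]
    if problems:
        return problems  # the XML parser would reject these anyway
    try:
        root = ET.fromstring(markup)  # raises ParseError on any fatal error
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    # ElementTree reports namespaced names as "{namespace}localname".
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append('root element must be <html xmlns="http://www.w3.org/1999/xhtml">')
    return problems


doc = ('<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">'
       "<head><title>t</title></head><body><p>a<br/></p></body></html>")
print(xml_side_problems(doc))  # []
```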

Markup Issues and Workarounds

Base URI

HTML:

<base href="uri">

XML / XHTML:

<html xml:base="uri">

Workaround: HTTP Content-Location header:

Content-Location: uri

HTML 4 Spec
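The base URI matters because every relative URI in the document is resolved against it. A small illustration with Python's `urllib.parse.urljoin`, assuming (as the workaround does) that the user agent takes the base from the Content-Location header; both URLs are made-up examples.

```python
from urllib.parse import urljoin

# Hypothetical URLs: where the document was fetched from, and the base
# announced with "Content-Location: http://example.com/docs/guide/".
request_url = "http://example.com/cgi/render?page=5"
content_location = "http://example.com/docs/guide/"

# The same relative reference resolves differently depending on the base.
print(urljoin(content_location, "images/logo.png"))  # http://example.com/docs/guide/images/logo.png
print(urljoin(request_url, "images/logo.png"))       # http://example.com/cgi/images/logo.png
```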


Character Set

HTML

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

XHTML / XML

<?xml version="1.0" encoding="utf-8"?>

Workaround: HTTP Content-Type header with encoding specified:

Content-Type: text/html;charset=utf-8
Content-Type: application/xhtml+xml;charset=utf-8
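Because the HTTP header carries the encoding, the same bytes decode identically whether the document is served as HTML or as XHTML. A minimal sketch using Python's `email.message.Message` to parse the header; `decode_body` is a hypothetical helper, and the UTF-8 fallback when no charset is given is an assumption.

```python
from email.message import Message


def decode_body(content_type_header: str, body: bytes) -> str:
    """Decode a response body using the charset parameter of Content-Type.

    Hypothetical helper; falls back to UTF-8 when no charset is present.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_charset() returns the charset parameter lowercased, or failobj.
    charset = msg.get_content_charset(failobj="utf-8")
    return body.decode(charset)


body = "<p>caf\u00e9</p>".encode("utf-8")
html_text = decode_body("text/html;charset=utf-8", body)
xhtml_text = decode_body("application/xhtml+xml;charset=utf-8", body)
print(html_text == xhtml_text == "<p>caf\u00e9</p>")  # True
```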


Language

HTML

<html lang="en">

XML / XHTML

<html xml:lang="en">

Workaround: HTTP Content-Language header:

Content-Language: en

HTML 5 Spec HTML 4 Spec

Note that there is no conforming workaround for switching the language in different parts of a document. One method does work, however: if you use HTML's lang attribute instead of the conforming xml:lang, browsers will correctly determine the language of the element. But this makes the document non-conforming when it is served with an XML media type and interpreted as XHTML.
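The language of an element is inherited from its nearest ancestor that specifies one. A minimal sketch of that lookup for a document parsed as XML, assuming Python's `xml.etree.ElementTree` and following the rule in the current HTML standard that xml:lang takes precedence over lang when both appear on the same element; `language_of` and its `default` parameter (standing in for Content-Language) are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_of(root, target, default=""):
    """Return the language of `target` (hypothetical helper).

    Walks from `target` up to `root`; on each element xml:lang takes
    precedence over lang. `default` stands in for a Content-Language value.
    """
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        for attr in (XML_LANG, "lang"):
            if attr in node.attrib:
                return node.attrib[attr]
        node = parents.get(node)
    return default


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><body>'
    '<p>Hello</p><p lang="fr">Bonjour</p><p xml:lang="de" lang="fr">Hallo</p>'
    "</body></html>"
)
hello, bonjour, hallo = doc.findall(".//h:p", {"h": "http://www.w3.org/1999/xhtml"})
print(language_of(doc, hello), language_of(doc, bonjour), language_of(doc, hallo))  # en fr de
```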