The common subset intersecting HTML5 and XHTML5 is a subset of both syntaxes meant for documents that can be processed as either HTML or XHTML. The common subset is not defined explicitly; it follows implicitly from the HTML and XHTML specifications, which have many syntax elements in common. A document is said to use the common subset when it parses correctly with both the XML parser and the HTML parser.
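As a rough illustration of this dual-parse definition, the Python sketch below parses a string once with an XML parser and once with an HTML tokenizer and compares the element order each one sees. The function name parses_both_ways is hypothetical, and the check is only a heuristic: Python's html.parser is a tokenizer rather than the full HTML5 tree builder, so it will not notice, for example, a p element being closed implicitly. It does reject documents that are not well-formed XML and tag names whose case differs between the two views.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


class _StartTagCollector(HTMLParser):
    """Collects start-tag names in document order (html.parser lowercases them)."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag also routes "<br/>" through here
        self.tags.append(tag)


def parses_both_ways(source: str) -> bool:
    """Heuristic: well-formed XML whose elements match what the HTML tokenizer sees."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError:
        return False  # an XML parser would stop here with a fatal error
    # ElementTree names elements "{namespace}local"; drop the XHTML namespace
    xml_tags = [el.tag.removeprefix(XHTML_NS) for el in root.iter()]
    collector = _StartTagCollector()
    collector.feed(source)
    collector.close()
    return xml_tags == collector.tags
```

An uppercase tag such as <BODY> is well-formed XML, but the HTML tokenizer reports it as body, so the comparison fails.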
A document using the conforming common subset is conforming to the specification whether it is interpreted as HTML or as XHTML. The conforming common subset excludes any markup that is non-conforming in either of the two DOM variants.
A document using the common subset can be served as HTML (text/html media type) or as XHTML (with an XML media type). The media type is what the browser uses to decide whether the document is parsed as HTML or XHTML, and therefore which variant of the DOM is used.
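Because the media type alone selects the parser, a server can send the same common-subset file either way. One common approach, sketched here as an assumption rather than something this text prescribes, is to send application/xhtml+xml only to clients whose Accept header lists it with a non-zero quality value, and text/html otherwise. The function name choose_media_type is hypothetical, and the parsing deliberately ignores how text/html is ranked relative to the XML type.

```python
def choose_media_type(accept_header: str) -> str:
    """Pick a media type for a common-subset document from an HTTP Accept header."""
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # quality defaults to 1 when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"  # parsed as XHTML with the XML DOM
    return "text/html"  # parsed as HTML with the HTML DOM
```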
Limitations from HTML
- The doctype is required and must be <!DOCTYPE html>.
- Tag and attribute names must be lowercase.
- The XML-style trailing slash ("/>") must not be used on non-void elements; the HTML parser ignores it (the element stays open), and it is invalid in HTML.
- XML CDATA sections do not work.
- HTML does not allow mixing with other XML dialects.
- HTML does not support
- p elements may not contain structured inline-level elements, including
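Two of the HTML-side rules above (lowercase names, and no trailing slash on non-void elements) can be checked mechanically. The sketch below is a hypothetical lint built on Python's html.parser; it reads the raw start-tag text through get_starttag_text() because the parser itself lowercases tag and attribute names. The void-element set is the current HTML list and is an assumption of this sketch; the other rules in the list are not checked.

```python
import re
from html.parser import HTMLParser

# Void elements in current HTML (assumed list; obsolete ones omitted)
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}
QUOTED_VALUE = re.compile(r'"[^"]*"|\'[^\']*\'')
# A name is the tag name right after "<", or a token following whitespace
NAME = re.compile(r'(?:^<|\s)([^\s=/>]+)')


class HtmlSideLint(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []

    def _check_names(self) -> None:
        raw = self.get_starttag_text()  # original spelling, before lowercasing
        # Remove quoted attribute values so their contents are not read as names
        for name in NAME.findall(QUOTED_VALUE.sub("", raw)):
            if name != name.lower():
                self.problems.append(f"name not lowercase: {name}")

    def handle_starttag(self, tag, attrs):
        self._check_names()

    def handle_startendtag(self, tag, attrs):
        self._check_names()
        if tag not in VOID_ELEMENTS:
            self.problems.append(f"trailing slash on non-void element: <{tag}/>")


def lint_html_side(source: str) -> list[str]:
    linter = HtmlSideLint()
    linter.feed(source)
    linter.close()
    return linter.problems
```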
Limitations from XML and XHTML
- The doctype, optional in XHTML but mandatory in HTML, must exactly match the case-sensitive string <!DOCTYPE html> for the document to be well-formed and valid in XHTML.
- Well-formedness constraints; violating any of these causes a fatal error in XHTML:
  - Comments cannot contain double hyphens (--).
  - Start and end tags must be correctly balanced, except for void elements.
  - Void elements must always be closed with a trailing slash ("/>").
  - All attribute values must be quoted; attributes without a value are not allowed (write disabled="disabled", not disabled).
  - Every < and & in text must be escaped, as must > wherever it follows ]] (where it would otherwise form the CDATA section end marker ]]>).
  - Only the following characters are allowed in XML: U+0009, U+000A, U+000D, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF; all others (such as most C0 control characters) are illegal.
  - Other constraints defined in the XML specification.
- script and style elements may not contain < or & in unescaped form unless it sits inside a CDATA section, which makes the document non-conforming on the HTML side (although it can still be made to work). External scripts and stylesheets are unaffected.
- noscript has no effect in XHTML.
- document.write() does not work in XHTML.
- Named entity references cannot be used in XHTML, except for the five predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;. Numeric character references such as &#160; work in both.
- The namespace declaration (the xmlns attribute) is required in XHTML. The xmlns attribute is also allowed on the html element in HTML, provided it has the value "http://www.w3.org/1999/xhtml": <html xmlns="http://www.w3.org/1999/xhtml">
- DOM APIs are case-sensitive in XHTML; scripts should always use lowercase names to stay compatible.
- Style rules match case-sensitively in XHTML; stylesheets should always use lowercase tag, attribute and class names to stay compatible.
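On the XML side, every rule in the well-formedness list is enforced by any conforming XML parser, so one way to test a candidate document is simply to parse it. The sketch below uses Python's xml.etree.ElementTree (which reports fatal errors as ParseError) and additionally checks that the root is an html element in the XHTML namespace. The function name xml_side_problems is hypothetical, and the case-sensitivity rules for scripts and stylesheets are beyond what it can detect.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xml_side_problems(source: str) -> list[str]:
    """Return XML-side problems for a candidate common-subset document."""
    try:
        # Unquoted attributes, "<br>", undefined entities such as &nbsp;
        # and illegal characters all raise ParseError here
        root = ET.fromstring(source)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    # Without xmlns, the root is a plain XML "html" element, not XHTML
    if root.tag != f"{{{XHTML_NS}}}html":
        return [f"root element is {root.tag!r}, expected html in the XHTML namespace"]
    return []
```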
Markup Issues and Workarounds
XML / XHTML:
Workaround: HTTP Content-Location header:
HTML:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
XHTML / XML:
<?xml version="1.0" encoding="utf-8"?>
Workaround: HTTP Content-Type header with the encoding specified:
Content-Type: text/html;charset=utf-8
Content-Type: application/xhtml+xml;charset=utf-8
XML / XHTML:
Workaround: HTTP Content-Language header:
Note that there is no conforming workaround for switching languages in different parts of a document. One method does work in practice, however: if you use HTML's lang attribute instead of the conforming xml:lang, browsers correctly deduce the element's language. But this makes the document non-conforming when it is served with an XML media type and interpreted as XHTML.
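The header workarounds carry the encoding and the document-wide language over HTTP instead of in the markup. A minimal sketch of building those headers in Python follows; polyglot_headers is a hypothetical helper, and the charset and language values are only examples.

```python
def polyglot_headers(media_type: str, lang: str, charset: str = "utf-8") -> list[tuple[str, str]]:
    """Return (name, value) pairs declaring encoding and language via HTTP."""
    if media_type not in ("text/html", "application/xhtml+xml"):
        raise ValueError(f"unexpected media type: {media_type}")
    return [
        # The charset parameter is honored by both the HTML and the XML parser
        ("Content-Type", f"{media_type};charset={charset}"),
        # Content-Language gives the language for the whole document only
        ("Content-Language", lang),
    ]
```

With Python's http.server, a BaseHTTPRequestHandler subclass could loop over polyglot_headers("application/xhtml+xml", "en") and call self.send_header(name, value) for each pair.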