Common Subset

From WHATWG Wiki
Revision as of 13:42, 8 December 2006 by Michel Fortin (talk | contribs)

The common subset of HTML5 and XHTML5 is a subset of both syntaxes meant for creating documents that can be processed as either. It is only implicitly defined by the HTML and XHTML specifications, because the two have many syntax elements in common. A document is said to use the common subset when it parses correctly with both the XML parser and the HTML parser.
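
As a rough illustration of the "parses with both parsers" criterion, the following Python sketch feeds one string to an XML parser and to an HTML tokenizer and compares the element names each one reports. It rests on several assumptions: Python's html.parser is only a tokenizer and does not implement the HTML5 parsing algorithm, so agreement here is a necessary hint rather than proof; the helper names (TagCollector, xml_tags, html_tags, same_structure) are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records element names in the order the HTML tokenizer sees start tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <br/>, because the
        # default handle_startendtag() delegates to handle_starttag().
        self.tags.append(tag)


def xml_tags(document):
    """Return element names in document order, or None if not well-formed XML."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return None
    # ElementTree writes namespaced names as "{namespace}name"; keep the name.
    return [element.tag.rsplit("}", 1)[-1] for element in root.iter()]


def html_tags(document):
    """Return element names in document order as seen by the HTML tokenizer."""
    collector = TagCollector()
    collector.feed(document)
    collector.close()
    return collector.tags


def same_structure(document):
    """True if the XML parser accepts the input and both see the same elements."""
    from_xml = xml_tags(document)
    return from_xml is not None and from_xml == html_tags(document)
```

A document that relies on HTML-only shortcuts, such as an unclosed p element, fails the XML half of the check, so same_structure returns False for it.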

A document using the conforming common subset conforms to the specification whether it is interpreted as HTML or XHTML. The conforming common subset rejects any element that is non-conforming in either of the two DOM variants.

A document using the common subset can be served as HTML (text/html media type) or as XHTML (with an XML media type). The media type is what the browser uses to decide whether the document is parsed as HTML or XHTML, and which variant of the DOM is used.
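
The media type decision described above can be sketched as a small function. This is a simplification under stated assumptions: the function name parser_for is hypothetical, only the Content-Type value is considered, and real browsers apply further rules (content sniffing, for instance) that are ignored here.

```python
def parser_for(content_type):
    """Pick "html" or "xml" from a Content-Type value, or None if neither."""
    # Drop parameters such as ";charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    # XML media types: the two generic ones, plus anything with a +xml suffix
    # (application/xhtml+xml among them).
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return None
```

The same bytes therefore end up in a different parser, and a different DOM variant, depending only on the header the server sends.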

Common Syntax


Limitations from HTML


Limitations from XHTML


Markup Issues and Workarounds

Base URI


<base href="uri">


<html xml:base="uri">

Workaround: HTTP Content-Location header:

Content-Location: uri
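
To show what the header changes, here is a minimal Python sketch of resolving a relative reference against a Content-Location value. It assumes the user agent honors the header as a base URI, which is not guaranteed for any given browser; the function name resolve and the example.org URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve(reference, content_location, request_url):
    """Resolve a relative reference, preferring Content-Location as the base."""
    if content_location:
        # The header value may itself be relative to the request URL.
        base = urljoin(request_url, content_location)
    else:
        base = request_url
    return urljoin(base, reference)


# With the header, "logo.png" resolves under /docs/ rather than /mirror/.
with_header = resolve(
    "logo.png",
    "http://example.org/docs/page",
    "http://example.org/mirror/page",
)
without_header = resolve("logo.png", None, "http://example.org/mirror/page")
```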

HTML 4 Spec

Character Set


<meta http-equiv="Content-Type" content="text/html;charset=utf-8">


<?xml version="1.0" encoding="utf-8"?>

Workaround: HTTP Content-Type header with encoding specified:

Content-Type: text/html;charset=utf-8
Content-Type: application/xhtml+xml;charset=utf-8

Language

<html lang="en">


<html xml:lang="en">

Workaround: HTTP Content-Language header:

Content-Language: en
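
Taken together, the three header workarounds on this page could be assembled on the server side roughly as follows. This is a sketch only: the function response_headers and its parameters are invented for illustration, and it builds a list of header pairs without being tied to any particular server framework.

```python
def response_headers(as_xhtml, location=None, language=None, charset="utf-8"):
    """Build the HTTP headers that replace syntax-specific markup."""
    # The media type selects the parser; the charset parameter replaces both
    # the meta element and the XML declaration.
    media_type = "application/xhtml+xml" if as_xhtml else "text/html"
    headers = [("Content-Type", f"{media_type};charset={charset}")]
    if location:
        # Stands in for <base> and xml:base.
        headers.append(("Content-Location", location))
    if language:
        # Stands in for lang and xml:lang on the root element.
        headers.append(("Content-Language", language))
    return headers
```

Because the information travels in the headers, the document body can stay byte-for-byte identical for both media types.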

HTML 5 Spec, HTML 4 Spec

Note that there is no conforming workaround for switching language in different parts of a document. One method does work in practice, however: if you use HTML's lang attribute instead of the conforming xml:lang, browsers will correctly deduce the language of the element. But this makes the document non-conforming when it is served with an XML media type and interpreted as XHTML.