HTML vs. XHTML

Differences Between HTML and XHTML

This information is based on the current spec for (X)HTML5. Some of these points technically do not apply to earlier versions of HTML.

Although HTML and XHTML appear to have similarities in their syntax, they are significantly different in many ways.

Note: As the current WHATWG document is a draft, this section will need to track a moving target.

MIME Types

Feature	HTML Requirement	XHTML Requirement	Notes
MIME Type	Must use `text/html`.	Must use an XML MIME type, such as `application/xml` or `application/xhtml+xml`.	It is the MIME type that determines what type of document you are using. Any document served as `text/html`, including one authored with the intention of being XHTML, is technically an HTML document.

Note that XHTML 1.0 previously allowed documents adhering to its compatibility guidelines to be served as text/html, but HTML 5 now defines such documents as HTML, not XHTML.
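
The rule that the MIME type, not the markup, selects the parser can be sketched as a small lookup. This is only an illustration: the function name `parser_for` and the exact set of XML MIME types checked are assumptions made for the example, not something defined by the spec text above.

```python
# Hypothetical helper: decide which parser a Content-Type header selects.
XML_MIME_TYPES = {"application/xml", "text/xml", "application/xhtml+xml"}


def parser_for(content_type):
    """Return "html", "xml", or None for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"  # HTML parser, whatever the author intended
    if mime in XML_MIME_TYPES or mime.endswith("+xml"):
        return "xml"   # XML parser; XHTML is only XHTML on this route
    return None        # neither parser applies


if __name__ == "__main__":
    for header in ("text/html; charset=utf-8", "application/xhtml+xml"):
        print(header, "->", parser_for(header))
```

Note that the markup itself is never inspected: a file written as XHTML still takes the "html" route when it is served as `text/html`.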

Parsing

XHTML uses XML parsing requirements. HTML uses its own parsing requirements, which are defined much more closely to the way browsers actually handle HTML today.

In XHTML, well-formedness errors are fatal. In HTML, error handling rules are much more graceful. XML well-formedness errors, some of which are also syntax errors in HTML, include the following:
- Unencoded ampersands (& instead of &amp;) and less-than signs (< instead of &lt;). This does not apply within CDATA sections. (Note: in HTML, an unencoded ampersand is allowed in some cases.)
- Comments containing extra pairs of hyphens or ending with a hyphen, e.g. <!-- a -- b --> or <!-- a --->.
- Mismatched end tags (does not apply to elements with optional tags)
- Unclosed tags.
- Unexpected characters occurring in or before attribute names.
- Unexpected occurrence of EOF.
- Unexpected characters before the DOCTYPE name.
- Missing DOCTYPE name.
- A PUBLIC identifier in a DOCTYPE without a SYSTEM identifier (Note: including either of these is a syntax error in HTML5, but in XML only the SYSTEM identifier is allowed to occur on its own).
- End tags with attributes.
- Unexpected end tags (in HTML, an unexpected </p> or </br> can cause the start tag to be implied before it).
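
The contrast between fatal and graceful error handling can be sketched with Python's standard library. This is a rough illustration only: `html.parser` is a lenient tokenizer, not a conforming HTML5 parser, so it stands in for the general idea rather than the exact HTML5 algorithm. The names `TagCollector`, `xml_accepts`, `html_events` and `BROKEN` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start tags, end tags and text in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def xml_accepts(markup):
    """True if an XML parser accepts the markup, False on a fatal error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_events(markup):
    """Tokenize leniently and return whatever could be recovered."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# An unencoded ampersand plus two unclosed p elements.
BROKEN = "<p>Fish & chips<p>Unclosed"

if __name__ == "__main__":
    print("XML accepts it:", xml_accepts(BROKEN))
    print("HTML tokens:", html_events(BROKEN))
```

The XML parser rejects the whole input at the first well-formedness error, while the lenient tokenizer still reports both paragraphs and their text.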
The internal subset is permitted in XML, but meaningless (and forbidden) in HTML.
- In some cases, an internal subset in HTML would end up being partly rendered inline.
The sequence of characters "]]>" in content when it does not mark the end of a CDATA section is a well-formedness error in XHTML, but valid in HTML.
In XHTML, <![CDATA[...]]> is a CDATA section. In HTML, it's a bogus comment.
In XHTML, <?foo ...?> is a processing instruction. In HTML, it's a bogus comment.
In HTML, the trailing slash used for the empty element syntax is a parse error for non-void elements (see below), but is ignored in all cases.
In HTML, the script and style elements are parsed as CDATA elements. (Note: the definition of CDATA differs from that in XML). In XML, they're parsed as normal elements (which means that things that look like comments are treated as real comments, and things that look like start tags actually are start tags).
In HTML, the title and textarea elements are parsed as RCDATA elements. (Note: The definition of RCDATA differs from that in SGML and there is no RCDATA in XML).
In HTML, if scripting is enabled, the noscript element is parsed as a CDATA element. If scripting is disabled, it's parsed as a normal element. In XHTML, the element is always parsed as a normal element, and can't really be used to stop content from being present when script is disabled.
In HTML, the iframe, noembed and noframes elements are parsed as CDATA elements. In XHTML, they are parsed as normal elements, and therefore do not stop content from being used.
White space characters in attribute values are normalized to spaces in XHTML.
In HTML, elements with optional tags are implied in certain conditions.
In HTML, tags for certain elements, which appear out of context, are ignored. This includes caption, col, colgroup, frame, frameset, head, option, optgroup, tbody, td, tfoot, th, thead, tr.
The plaintext element has a special parsing requirement in HTML. (It is, however, forbidden.)
In HTML, a line feed that immediately follows a pre, listing or textarea start tag is ignored.
HTML also has many other special handling rules for edge cases and error conditions, not all of which are listed here.
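
The different treatment of script content described above can be observed with Python's standard library, again with the caveat that `html.parser` only approximates HTML5 behaviour. In this sketch the names `TextAndTags`, `html_view`, `xml_view` and `SNIPPET` are hypothetical, chosen for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Well-formed as XML, so both parsers accept it; they just disagree on
# what the "<b>" inside the script element means.
SNIPPET = '<script>var s = "<b>hi</b>";</script>'


class TextAndTags(HTMLParser):
    """Collect start tag names and all character data."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def html_view(markup):
    """Start tags and text as seen by the lenient HTML tokenizer."""
    parser = TextAndTags()
    parser.feed(markup)
    parser.close()
    return parser.tags, "".join(parser.text)


def xml_view(markup):
    """Element names and text as seen by an XML parser."""
    root = ET.fromstring(markup)
    return [el.tag for el in root.iter()], "".join(root.itertext())


if __name__ == "__main__":
    print("HTML:", html_view(SNIPPET))
    print("XML: ", xml_view(SNIPPET))
```

In the HTML view the script element contains only text, including the literal characters `<b>hi</b>`. In the XML view a real `b` element appears inside the script and the markup characters vanish from the text.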

Syntax

In HTML, the doctype is required. In XHTML, it is optional.
In HTML, the DOCTYPE is case insensitive. (e.g. <!DOCTYPE HTML> or <!doctype html>, or any case variation of that is acceptable). In XHTML, the DOCTYPE, if used, is case sensitive and must be well-formed XML. i.e. <!DOCTYPE html>, with optional PUBLIC and/or SYSTEM identifiers.
In XHTML, tag names and attribute names are case sensitive. In HTML, they are case insensitive.
In XHTML, non-empty elements require both a start and an end tag. In HTML, certain elements allow the omission of either or both:
- html (both)
- head (both)
- body (both)
- li (end tag)
- dt (end tag)
- dd (end tag)
- p (end tag)
- colgroup (both)
- thead (end tag)
- tbody (both)
- tfoot (end tag)
- tr (end tag)
- td (end tag)
- th (end tag)
In XHTML, empty elements may use either the empty element syntax (<br/>) or have an end tag immediately follow the start tag (<br></br>). In HTML, the empty element syntax (trailing slash) is allowed on void elements, but forbidden on other elements. However, it serves no purpose whatsoever and can be omitted. End tags for void elements are forbidden.
- base, link, meta, hr, br, img, embed, param, area, col and input
HTML allows attribute minimisation (i.e. omitting the equals sign and the value); XHTML does not.
HTML allows the use of unquoted attribute values; XHTML does not.
XHTML allows the use of CDATA sections; HTML does not.
XHTML allows the use of processing instructions; HTML does not.
In HTML, all entity references are predefined and do not require a DTD. Because there is no DTD for XHTML5, entity references cannot be used in XHTML, except for the 5 predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;.
- You can provide your own DTD for use with your own validating parser, but be aware that browsers do not use validating parsers and will not read the DTD.
The set of valid Unicode characters in XML 1.0 is more restricted than that in HTML.
Namespace prefixes are permitted in XHTML. They are forbidden in HTML.
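
Several of the syntax rules above (case insensitivity, attribute minimisation, unquoted attribute values) can be checked with a short sketch. Python's `html.parser` is used as a loose stand-in for an HTML parser and `xml.etree.ElementTree` for an XML parser. The helper names `StartTagRecorder`, `html_start_tags` and `xml_well_formed` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Record each start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, dict(attrs)))


def html_start_tags(markup):
    """Start tags as reported by the lenient HTML tokenizer."""
    parser = StartTagRecorder()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


def xml_well_formed(markup):
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Uppercase names, an unquoted value and a minimised attribute.
    print(html_start_tags("<INPUT type=checkbox CHECKED>"))
    # The same ideas are fatal errors in XML.
    print(xml_well_formed("<input type=checkbox checked/>"))
    print(xml_well_formed("<P>text</p>"))
    print(xml_well_formed('<input type="checkbox" checked="checked"/>'))
```

The HTML tokenizer lowercases tag and attribute names and reports the minimised attribute with no value, whereas the XML parser only accepts the fully quoted, consistently cased form.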

Markup

The namespace declaration (xmlns attribute) is required in XHTML. The xmlns attribute is also allowed to appear on any element in HTML on the condition that it has the value "http://www.w3.org/1999/xhtml".
- <html xmlns="http://www.w3.org/1999/xhtml">
- In HTML, the xmlns attribute has absolutely no effect. It is basically a talisman. It is allowed merely to make migration to and from XHTML mildly easier. When parsed by an HTML parser, the attribute ends up in the null namespace.
- In XML (with an XML Namespaces-aware parser), an xmlns attribute is part of the namespace declaration mechanism, and an element cannot actually have an xmlns attribute in the null namespace. In DOM implementations, the attribute ends up in the "http://www.w3.org/2000/xmlns/" namespace.
XHTML allows non-XHTML elements and attributes (in different namespaces) to be used; HTML does not.
XHTML uses the xml:lang attribute; HTML uses lang instead.
XML ID introduces xml:id, which could be used in XHTML. In HTML it has no effect.
In HTML, the noscript element may be used. In XHTML, it is forbidden.
XHTML can use xml:base, HTML cannot.
In XHTML, table elements may contain child tr elements. In the HTML serialisation, due to backwards compatibility constraints, this is not possible (though it may be done through DOM manipulation).
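
How the xmlns and xml:lang attributes are interpreted by each kind of parser can be illustrated with Python's standard library. This is a sketch under assumptions: `html.parser` has no notion of namespaces at all, so it only loosely mirrors the "no effect" behaviour of an HTML parser, and the names `RootAttrs`, `xml_root`, `html_root_attrs` and `DOC` are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"
DOC = f'<html xmlns="{XHTML_NS}" xml:lang="en"><body></body></html>'


class RootAttrs(HTMLParser):
    """Remember the attributes of the first start tag only."""

    def __init__(self):
        super().__init__()
        self.root_attrs = None

    def handle_starttag(self, tag, attrs):
        if self.root_attrs is None:
            self.root_attrs = dict(attrs)


def xml_root(markup):
    """Qualified root name and attributes from a namespace-aware XML parser."""
    root = ET.fromstring(markup)
    return root.tag, dict(root.attrib)


def html_root_attrs(markup):
    """Root attributes as plain name/value pairs from the HTML tokenizer."""
    parser = RootAttrs()
    parser.feed(markup)
    parser.close()
    return parser.root_attrs


if __name__ == "__main__":
    print("XML: ", xml_root(DOC))
    print("HTML:", html_root_attrs(DOC))
```

In the XML view the declaration is consumed: it changes the element's qualified name and does not show up as an ordinary attribute, and xml:lang is resolved into the XML namespace. In the HTML view both are just attributes with those literal names.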

Character Encoding

In XHTML, the XML declaration may be used to specify the character encoding. In HTML, the XML declaration is forbidden.
In HTML, the meta element with a charset attribute may be used instead. It is forbidden in XHTML and is ignored if included.
The default character encoding for XHTML is, according to XML rules, UTF-8 or UTF-16. If the encoding is unspecified in HTML, it should be determined through implementation specific heuristics or fallback to a default value (Note: this section of the spec is not yet finished).
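
The XML side of these encoding rules can be sketched with Python's `xml.etree.ElementTree`: the XML declaration selects the encoding, and UTF-8 is assumed when nothing is declared. The helper name `xml_text` and the sample byte strings are assumptions made for this example. The HTML heuristics are not modelled here.

```python
import xml.etree.ElementTree as ET


def xml_text(raw):
    """Parse raw bytes as XML and return the root element's text."""
    return ET.fromstring(raw).text


# Latin-1 bytes, announced by the XML declaration.
DECLARED_LATIN1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
# UTF-8 bytes with no declaration: the XML default applies.
UNDECLARED_UTF8 = b"<p>caf\xc3\xa9</p>"
# Latin-1 bytes with no declaration: invalid as UTF-8, so a fatal error.
UNDECLARED_LATIN1 = b"<p>caf\xe9</p>"

if __name__ == "__main__":
    print(xml_text(DECLARED_LATIN1))
    print(xml_text(UNDECLARED_UTF8))
    try:
        xml_text(UNDECLARED_LATIN1)
    except ET.ParseError as err:
        print("fatal:", err)
```

The last case shows why the declaration matters in XML: without it the parser does not guess, and bytes that are not valid in the default encoding are a fatal error rather than something to be recovered heuristically.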

Scripts

document.write() and document.writeln() cannot be used in XHTML; they can in HTML.
In XHTML, the use of the innerHTML property requires that the string be a well-formed fragment of XML.
DOM APIs are case sensitive in XHTML, and some are case insensitive in HTML. (This does not apply to elements which are not in the HTML namespace.)
- Element.tagName and Node.nodeName return the value in uppercase.
- Document.createElement() is case insensitive (the canonical form is lowercase).
- Element.setAttributeNode() will change the attribute name to lowercase.
- Element.setAttribute() is case insensitive (the canonical form is lowercase).
- Document.getElementsByTagName() and Element.getElementsByTagName() are case insensitive.
- Document.renameNode(). If the new namespace is the HTML namespace, then the new qualified name will be lowercased before the rename takes place.
In HTML, Document.createElement() will create an element in the HTML namespace. In XML (including XHTML), the namespace is defined by both DOM2 and DOM3 to be null.
- In XHTML, browsers lack interoperability in this area. In Firefox and Safari, the namespace is dependent upon the MIME type. In Opera, it's dependent upon the root element.
XPath expressions targeted at pre-HTML5 browsers need to use the XHTML namespace for XHTML and null for HTML. (HTML5 browsers would use the XHTML namespace even in HTML.)
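
The XML side of the DOM behaviour above can be observed with Python's `xml.dom.minidom`, which implements a plain XML DOM. This sketch shows only the XML half of the comparison, since the Python standard library has no HTML DOM; the function name `xml_dom_facts` and the sample document are assumptions made for the example.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
SOURCE = f'<html xmlns="{XHTML_NS}"><body><p>Hi</p></body></html>'


def xml_dom_facts(source):
    """Collect a few case and namespace observations from an XML DOM."""
    doc = minidom.parseString(source)
    return {
        # Tag name lookups are case sensitive in an XML DOM.
        "lower_matches": len(doc.getElementsByTagName("p")),
        "upper_matches": len(doc.getElementsByTagName("P")),
        # tagName is returned as written, not uppercased.
        "root_tag_name": doc.documentElement.tagName,
        # createElement() gives a null namespace; createElementNS() is
        # needed to create an element in the XHTML namespace.
        "created_ns": doc.createElement("p").namespaceURI,
        "created_ns_explicit": doc.createElementNS(XHTML_NS, "p").namespaceURI,
    }


if __name__ == "__main__":
    for name, value in xml_dom_facts(SOURCE).items():
        print(f"{name}: {value!r}")
```

Code that must work against both serialisations therefore tends to use lowercase names throughout and the namespace-aware creation method, rather than relying on the HTML-only case folding.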

Stylesheets

Selectors, as used in CSS, match case sensitively in XHTML, but case insensitively in HTML.
CSS requires special handling of the body element in HTML for painting backgrounds on the canvas, which does not apply to XHTML.

Differences Between HTML 4.01 and HTML 5

See HTML 5 differences from HTML 4.

Differences Between DOM Level 2.0, 3.0 and the HTML 5 DOM APIs

This section might belong on a separate page.

TODO (need to talk about the changes to the DOM API that HTML5 is making, compared with DOM2 and DOM3)
