Common Subset

(Redirected page to HTML vs. XHTML)
 
The common subset of HTML5 and XHTML5 is a subset of both syntaxes, meant for creating documents that can be processed as either. The common subset is only implicitly defined by the HTML and XHTML specifications, which have many syntax elements in common. A document is said to use the common subset when it parses correctly with both the XML parser and the HTML parser.
 
A document using the '''conforming''' common subset conforms to the specification whether it is interpreted as HTML or XHTML. The conforming common subset rejects any element that is non-conforming in either of the two DOM variants.
 
A document using the common subset can be served as HTML (text/html media type) or XHTML (with an XML media type). The media type is what the browser uses to decide whether the document is parsed as HTML or XHTML, and which variant of the DOM is used.
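
The media type decision described above can be sketched as a small function. This is only an illustration under stated assumptions: <code>parserFor</code> is a hypothetical name, not a browser API, and the XML branch assumes the usual convention that <code>text/xml</code>, <code>application/xml</code> and any type ending in <code>+xml</code> count as XML media types.

```javascript
// Hypothetical helper (not a browser API): report which parser a
// Content-Type header value would select for a document.
function parserFor(contentType) {
  // Drop parameters such as ";charset=utf-8" and normalise case.
  const essence = contentType.split(";")[0].trim().toLowerCase();
  if (essence === "text/html") {
    return "html";
  }
  // Assumption: text/xml, application/xml and "+xml" types are XML media types.
  if (essence === "text/xml" || essence === "application/xml" || essence.endsWith("+xml")) {
    return "xml";
  }
  return "other";
}

console.log(parserFor("text/html;charset=utf-8"));
console.log(parserFor("application/xhtml+xml;charset=utf-8"));
```

The same bytes can therefore end up in either parser, which is why a common subset document has to satisfy both sets of limitations listed below.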
 
 
== Common Syntax ==
 
[TBD]
 
== Limitations from HTML ==
 
* The [http://blog.whatwg.org/faq/#doctype doctype is required] to be <code><!DOCTYPE html></code>
* Tag and attribute names must be lowercase.
* The XML-style trailing slash (self-closing syntax) must not be used on non-void elements; it is ignored by the HTML parser and is invalid in HTML.
* XMLish CDATA blocks will not work.
* HTML does not allow mixing with other XML dialects.
* <code>script</code> elements cannot contain the text <code></script</code> followed by <code>/</code>, <code>></code> or whitespace unless in an escaped text span (<code>&lt;!--...--></code>).
 
== Limitations from XML and XHTML ==
 
* The doctype, optional in XHTML but mandatory in HTML, must match <code><!DOCTYPE html></code> case-sensitively to be well-formed and valid in XHTML.
* Well-formedness constraints; violating any of these causes a fatal error in XHTML:
** Comments cannot contain double-hyphens (<code>--</code>).
** Start tags and end tags must be balanced correctly, except for void elements.
** Void elements must always be closed with a trailing slash (<code>/></code>).
** All attribute values must be quoted. Attributes without a value are disallowed.
** All <code><</code> and <code>&</code> in text must be escaped, as must <code>></code> when it is preceded by <code>]]</code> (where it would otherwise be read as a CDATA section end marker).
** Only the following characters are legal in XML: U+0009, U+000A, U+000D, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF; all others are illegal. See [http://www.w3.org/TR/REC-xml/#charsets XML charsets].
** Other constraints defined in the [http://www.w3.org/TR/REC-xml/ XML specification].
* <code>script</code> and <code>style</code> elements may not contain <code><</code> or <code>&</code> in their unescaped form. External scripts and stylesheets are unaffected.
** There is a trick to allow <code><</code> and <code>&</code>, which involves placing CDATA section markers inside JavaScript comments. See the workarounds section.
* <code>noscript</code> has no effect in XHTML.
* <code>document.write()</code> does not work in XHTML.
* Entity references cannot be used in XHTML (excluding the 5 predefined entities: <code>&amp;amp;</code>, <code>&amp;lt;</code>, <code>&amp;gt;</code>, <code>&amp;quot;</code> and <code>&amp;apos;</code>).
* The namespace declaration (<code>xmlns</code> attribute) is required in XHTML. The xmlns attribute is also allowed to appear in HTML on the condition that it has the value "http://www.w3.org/1999/xhtml". <code>&lt;html xmlns="http://www.w3.org/1999/xhtml"></code>
* DOM Core APIs are case-sensitive in XHTML; scripts should always use lowercase names to be compatible.
* Style rules match case-sensitively in XHTML; stylesheets should always use lowercase tag and attribute names to be compatible.
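
The escaping rules in the list above can be sketched in JavaScript. This is a minimal illustration, not a complete serializer: <code>escapeText</code> and <code>escapeAttribute</code> are hypothetical helper names, and the sketch assumes attribute values are always written in double quotes. It uses only the predefined entities, so the output is meant to stay valid under both parsers.

```javascript
// Hypothetical helper: escape character data for use as element text.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;")        // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/\]\]>/g, "]]&gt;");  // ">" only needs escaping when it follows "]]"
}

// Hypothetical helper: escape a value that will be placed inside
// a double-quoted attribute (assumption: double quotes are always used).
function escapeAttribute(value) {
  return escapeText(value).replace(/"/g, "&quot;");
}

console.log(escapeText("a < b && c ]]> d"));
console.log(escapeAttribute('say "hi" & go'));
```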
 
== Markup Issues and Workarounds ==
 
=== Base URI ===
 
HTML:
 
<base href="uri">
 
XML / XHTML:
 
<html xml:base="uri">
 
Workaround: HTTP Content-Location header:
 
Content-Location: uri
 
[http://www.w3.org/TR/html4/struct/links.html#h-12.4.1 HTML 4 Spec]
 
 
=== Character Set ===
 
HTML
 
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
 
XHTML / XML
 
<?xml version="1.0" encoding="utf-8"?>
 
Workaround: HTTP Content-Type header with encoding specified:
 
Content-Type: text/html;charset=utf-8
Content-Type: application/xhtml+xml;charset=utf-8
 
 
=== Language ===
 
HTML
 
<html lang="en">
 
XML / XHTML
 
<html xml:lang="en">
 
Workaround: HTTP Content-Language header:
 
Content-Language: en
 
[http://www.whatwg.org/specs/web-apps/current-work/#lang HTML 5 Spec]
[http://www.w3.org/TR/html4/struct/dirlang.html#h-8.1.2 HTML 4 Spec]
 
Note that there is no conforming workaround for switching language in different parts of a document. One method does work in practice, however: if you use HTML's lang attribute instead of the conformant xml:lang, browsers will correctly deduce the language of the element. This makes the document non-conforming when it is served with an XML media type and interpreted as XHTML.
 
 
=== Scripts & Style ===
 
HTML
 
<script type="text/javascript">
if (!(a > 0 && a < 10)) alert("A not in range (0 < a < 10).")
</script>
 
XML / XHTML
 
<script type="text/javascript">
if (!(a > 0 &amp;amp;&amp;amp; a &amp;lt; 10)) alert("A not in range (0 &amp;lt; a &amp;lt; 10).")
</script>
 
or
 
<script type="text/javascript">
<![CDATA[
if (!(a > 0 && a < 10)) alert("A not in range (0 < a < 10).")
]]>
</script>
 
Workaround: Commented CDATA block around the problematic part of the script, or the whole script:
 
<script type="text/javascript">
/* <![CDATA[ */
if (!(a > 0 && a < 10)) alert("A not in range (0 < a < 10).")
/* ]]> */
</script>
 
This works because the HTML parser treats the CDATA markers as ordinary script text, where they sit inside comments and have no effect, while the XML parser treats them as the boundaries of a CDATA section, which lets it accept unescaped character data. The same trick can be applied to the <code>style</code> element when it contains <code><</code> or <code>&</code>.
 
Note that the element must not contain the string <code>]]></code>, or the XML wouldn't be well-formed, and it may not contain <code></</code>, or it would be non-conformant HTML. In all cases, this is only needed where <code>script</code> or <code>style</code> contains <code>&</code> or <code><</code>.
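
The constraints in the note above can be checked mechanically before an inline script is emitted. The following is a hedged sketch only: <code>wrapInlineScript</code> is a hypothetical helper, not part of any specification or library, and it simply applies the two restrictions and the "only when needed" rule stated above.

```javascript
// Hypothetical helper: wrap inline script (or style) source in a
// commented CDATA block, following the restrictions stated above.
function wrapInlineScript(source) {
  if (source.includes("]]>")) {
    // "]]>" would end the CDATA section early in the XML parser.
    throw new Error('source contains "]]>"');
  }
  if (source.includes("</")) {
    // "</" is treated as unsafe inside the element when parsed as HTML.
    throw new Error('source contains "</"');
  }
  // The wrapper is only needed when "<" or "&" is present.
  if (!/[<&]/.test(source)) {
    return source;
  }
  return "/* <![CDATA[ */\n" + source + "\n/* ]]> */";
}

console.log(wrapInlineScript('if (!(a > 0 && a < 10)) alert("A not in range.")'));
```

A source that fails either check cannot use this workaround as-is; moving it to an external file avoids the problem, since external scripts and stylesheets are unaffected.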

Latest revision as of 14:26, 9 July 2013
