WHATWG Wiki - User contributions [en] (https://wiki.whatwg.org/api.php?action=feedcontributions&user=Rubys&feedformat=atom), updated 2024-03-29T10:52:49Z, MediaWiki 1.39.3<br />
PragmaExtensions (https://wiki.whatwg.org/index.php?title=PragmaExtensions&diff=4513), 2010-04-14T00:27:27Z<p>Rubys: PICS-Label</p>
<hr />
<div>This page lists the allowed extension values for the http-equiv="" attribute of the &lt;meta> element in HTML5. Such extensions are limited to previously registered HTTP headers that are defined in an RFC, have specific user-agent processing requirements, and do not affect the HTTP processing model. For more details, [http://www.whatwg.org/specs/web-apps/current-work/#concept-http-equiv-extensions see the specification].<br />
<br />
{| <br />
! Keyword<br />
! Brief description<br />
! Specification<br />
|- <br />
| PICS-Label<br />
| Labels that help content advisory software protect children from potentially harmful material [[http://www.fosi.org/icra/ more]]<br />
| [http://www.w3.org/TR/REC-PICS-labels/ PICS Label Distribution Label Syntax and Communication Protocols]<br />
|}</div>Rubys<br />
HTML vs. XHTML (https://wiki.whatwg.org/index.php?title=HTML_vs._XHTML&diff=4027), 2009-09-08T19:17:06Z<p>Rubys: /* Character Encoding */</p>
<hr />
<div>== Differences Between HTML and XHTML ==<br />
<br />
<p style="border: 1px dashed lightgray; background-color: #FFEEEE; padding: .5em 1em;"><strong>This page is currently being revised. Some information is incomplete or missing.</strong></p><br />
<br />
<p style="border: 1px dashed lightgray; background-color: #FFF8E4; padding: .5em 1em;">Please note that the information here is based on the current spec for (X)HTML5. Some of these issues do not technically apply to previous versions of HTML.</p><br />
<br />
Although HTML and XHTML look similar syntactically, they differ significantly in many ways.<br />
<br />
:'''Note''': As the current WHATWG document is a draft, this section will need to track a moving target.<br />
<br />
=== MIME Types ===<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Feature<br />
! HTML Requirement<br />
! XHTML Requirement<br />
! Notes<br />
|-<br />
| MIME Type<br />
| Must use <code>text/html</code>.<br />
| Must use an XML MIME type, such as <code>application/xml</code> or <code>application/xhtml+xml</code>.<br />
| It is the MIME type that determines what type of document you are using. Any document, including a document authored with the intention of being XHTML, served as <code>text/html</code> is technically an HTML document.<br />
|}<br />
<br />
Note that XHTML 1.0 previously allowed documents adhering to its compatibility guidelines to be served as <code>text/html</code>, but HTML 5 now defines such documents to be HTML, not XHTML.<br />
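Since the MIME type alone decides which parser is used, the dispatch can be sketched in a few lines of Python. This is a hypothetical helper (<code>parser_for</code> is not a real browser API), and treating <code>text/xml</code>, <code>application/xml</code> and any <code>+xml</code> subtype as XML is an assumption taken from the WHATWG MIME Sniffing definition of an XML MIME type, not from this page.

```python
def parser_for(content_type: str) -> str:
    """Say which parser a browser would use for a Content-Type (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8"; MIME types compare case-insensitively.
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence == "text/html":
        return "html"
    # Assumption: the WHATWG MIME Sniffing definition of an XML MIME type.
    if essence in ("text/xml", "application/xml") or essence.endswith("+xml"):
        return "xml"
    return "other"


# The same bytes are HTML or XHTML depending only on the header they are served with.
for header in ("text/html; charset=utf-8", "application/xhtml+xml", "application/xml"):
    print(f"{header} -> {parser_for(header)}")
```

Note that the document's DOCTYPE or an <code>xmlns</code> attribute plays no part in this decision.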
<br />
=== Syntax and Parsing ===<br />
<br />
XHTML uses XML parsing requirements. HTML uses its own parsing rules, which are defined to match much more closely the way browsers actually handle HTML today. The following table describes the differences between how each is parsed.<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Feature<br />
! HTML Requirement<br />
! XHTML Requirement<br />
! Notes<br />
|-<br />
!Parsing Modes<br />
|Three parsing modes are defined: ''no quirks mode'', ''quirks mode'' and ''limited quirks mode''. The mode is only ever changed from the default by the HTML parser, based on the presence, absence, or value of the DOCTYPE string. <br />
|XML parsing rules are used. There is only one mode.<br />
|The parsing modes in HTML also affect script and stylesheet processing. XHTML is considered to be in ''no quirks mode'' for these purposes.<br />
|-<br />
!Error Handling<br />
|HTML has no well-formedness constraint, so no errors are fatal. Graceful error handling and recovery procedures are thoroughly defined.<br />
|Well-formedness errors are fatal.<br />
| <br />
|-<br />
!Namespaces<br />
|Elements and attributes for known vocabularies (HTML, SVG and MathML) are implicitly assigned to appropriate namespaces, according to the rules specified in the parsing algorithm.<br />
|The rules defined in the [http://www.w3.org/TR/REC-xml-names/ Namespaces in XML] specification apply. Namespaces must be explicitly declared.<br />
|<br />
|-<br />
!Namespace attributes on HTML elements<br />
|Elements in the HTML namespace may have an <code>xmlns</code> attribute specified, if, and only if, it has the exact value <code>"http://www.w3.org/1999/xhtml"</code>. The attribute has absolutely no effect. It is basically a talisman. It is allowed merely to make migration to and from XHTML mildly easier. When parsed by an HTML parser, the attribute ends up in no namespace.<br />
<br />
Attributes of the form <code>xmlns:<var>prefix</var></code> may not be used on HTML elements.<br />
|The HTML namespace must be declared for HTML elements according to the rules defined by ''Namespaces in XML''. The <code>xmlns</code> and <code>xmlns:<var>prefix</var></code> attributes end up in the <code>"http://www.w3.org/2000/xmlns/"</code> namespace.<br />
|<br />
|-<br />
!Namespace attributes on foreign elements<br />
|<br />
Elements in the SVG namespace may have an <code>xmlns</code> attribute specified, if, and only if, it has the exact value <code>"http://www.w3.org/2000/svg"</code>. The attribute is optional because the namespace is implied during parsing.<br />
<br />
Elements in the MathML namespace may have an <code>xmlns</code> attribute specified, if, and only if, it has the exact value <code>"http://www.w3.org/1998/Math/MathML"</code>. The attribute is optional because the namespace is implied during parsing.<br />
<br />
Foreign elements may also have an <code>xmlns:xlink</code> attribute specified, if, and only if, it has the exact value <code>"http://www.w3.org/1999/xlink"</code>. This attribute is optional, even if XLink attributes are used, because the namespace for XLink attributes is implied during parsing.<br />
<br />
When parsed by an HTML parser, the <code>xmlns</code> and <code>xmlns:xlink</code> attributes end up in the <code>"http://www.w3.org/2000/xmlns/"</code> namespace.<br />
|The SVG and MathML namespaces must be declared for SVG and MathML elements, respectively, according to the rules defined by ''Namespaces in XML''. The <code>xmlns</code> and <code>xmlns:<var>prefix</var></code> attributes end up in the <code>"http://www.w3.org/2000/xmlns/"</code> namespace.<br />
|<br />
|-<br />
!XLink attributes<br />
|Foreign elements may use the attributes <code>xlink:actuate</code>, <code>xlink:arcrole</code>, <code>xlink:href</code>, <code>xlink:role</code>, <code>xlink:show</code>, <code>xlink:title</code> and <code>xlink:type</code>. These attributes are placed in the <code>"http://www.w3.org/1999/xlink"</code> namespace. The prefix used must be "<code>xlink</code>".<br />
|XLink attributes may be specified on foreign elements using any prefix, subject to the conformance rules defined by ''Namespaces in XML''. The XLink namespace must be declared according to the conformance rules defined by ''Namespaces in XML'' if XLink attributes are used within the document.<br />
|<br />
|-<br />
!XML attributes<br />
|<br />
Foreign elements may use the attributes <code>xml:lang</code>, <code>xml:base</code> and <code>xml:space</code>. These attributes are placed in the <code>"http://www.w3.org/XML/1998/namespace"</code> namespace. The prefix used must be "<code>xml</code>".<br />
<br />
HTML elements may use the <code>xml:lang</code> attribute. The attribute in no namespace with no prefix and with the literal localname "<code>xml:lang</code>" has no effect on language processing. HTML elements must not use the <code>xml:base</code> or <code>xml:space</code> attributes.<br />
| Any element, including HTML elements, may use the attributes <code>xml:lang</code>, <code>xml:base</code> and <code>xml:space</code>. These attributes are placed in the <code>"http://www.w3.org/XML/1998/namespace"</code> namespace. The prefix used must be "<code>xml</code>".<br />
|<br />
|-<br />
!Space characters<br />
|The space characters are defined as:<br />
* U+0009 CHARACTER TABULATION<br />
* U+000A LINE FEED<br />
* U+000C FORM FEED<br />
* U+000D CARRIAGE RETURN<br />
* U+0020 SPACE<br />
|The space characters are defined as:<br />
* U+0009 CHARACTER TABULATION<br />
* U+000A LINE FEED<br />
* U+000D CARRIAGE RETURN<br />
* U+0020 SPACE<br />
|The only difference is that HTML also treats U+000C FORM FEED as a space character.<br />
|-<br />
! The DOCTYPE<br />
|<br />
A DOCTYPE is a mostly useless, but required, header. The DOCTYPE is used during parsing to determine the parsing mode. The keywords "<code>DOCTYPE</code>", "<code>PUBLIC</code>" and "<code>SYSTEM</code>", and the name "<code>html</code>" are treated case insensitively. The system identifier <code>"about:legacy-compat"</code> (and the public and system identifiers for previous versions of HTML) are case sensitive.<br />
<br />
Conforming HTML documents are required to use <code>&lt;!DOCTYPE html&gt;</code> (case insensitively) or the legacy-compat version <code>&lt;!DOCTYPE html SYSTEM "about:legacy-compat"&gt;</code>.<br />
<br />
When using the obsolete but conforming DOCTYPEs based on the HTML 4.0 and 4.01 Strict DTDs, the system identifier is optional. The obsolete but conforming DOCTYPEs based on XHTML 1.0 Strict and XHTML 1.1 may also be specified.<br />
<br />
Use of an internal subset is forbidden. The system identifier is never de-referenced by HTML implementations.<br />
|<br />
The DOCTYPE is optional. XML rules for case sensitivity apply (everything is case sensitive).<br />
<br />
Either of the DOCTYPEs defined in HTML5 may be used, or any other custom DOCTYPE. If the public identifier is specified, the system identifier must also be specified. The obsolete status of the ''obsolete permitted DOCTYPEs'' defined for HTML does not apply to XHTML. Any DOCTYPE may be used, subject to the conformance rules defined by XML.<br />
<br />
Use of an internal subset is permitted according to the requirements of XML. Some validating XML processors may dereference the system identifier, if used, but most browsers use non-validating processors.<br />
|<br />
|-<br />
! Void Elements<br />
| Void elements only have a start tag; end tags must not be specified for void elements, and it is impossible for them to contain any content. A trailing slash may optionally be inserted at the end of the element's tag, immediately before the closing greater-than sign.<br />
| Void elements may use either the empty-element tag syntax (''EmptyElemTag'') or use a start tag immediately followed by an end tag, with no content in between. While it is possible for the element to contain content, this is non-conforming.<br />
|<br />
|-<br />
! Raw text elements<br />
|<br />
|<br />
|<br />
|-<br />
! RCDATA elements<br />
|<br />
|<br />
|<br />
|-<br />
! Foreign elements<br />
|<br />
|<br />
|<br />
|-<br />
! Normal elements<br />
|<br />
|<br />
|<br />
|-<br />
! Optional tags<br />
|<br />
For [[#HTML_Elements_with_Optional_Tags|some elements]], the start and/or end tags are optional and are implied by certain specified conditions. For example, the end tag for the <code>p</code> element is implied by a subsequent <code>p</code> element.<br />
<br />
Omitting the end tag for other elements is a parse error and various error recovery procedures are applied appropriately.<br />
| End tags must be explicitly included for all elements, except empty elements using the ''EmptyElemTag'' syntax.<br />
| <br />
|-<br />
! Unescaped Special Characters <br />
|<br />
Unescaped ampersands (U+0026 AMPERSAND - <code>&amp;</code>, instead of <code>&amp;amp;</code>) are permitted within the content of ''normal elements'', ''RCDATA elements'', ''foreign elements'' and ''attribute values'' where they are not considered to be ''ambiguous ampersands'', and within ''Raw text elements''.<br />
<br />
Unescaped less than signs (U+003C LESS-THAN SIGN - <code>&lt;</code>, instead of <code>&amp;lt;</code>) are permitted in ''Raw text elements'', ''RCDATA elements'' and ''attribute values'', excluding the ''unquoted attribute value syntax''.<br />
| Unescaped ampersands and less-than signs may not appear within ''CharData'' or ''AttValue'' (basically, the normal text content of elements and attribute values.) Violation of this constraint is a well-formedness error.<br />
| <br />
|-<br />
! Comment syntax<br />
| Comments must start with the four character sequence "<code>&lt;!--</code>" and must be ended by the three character sequence "<code>--></code>". The content of comments must not start with a single U+003E GREATER-THAN SIGN ('>') character, nor start with a U+002D HYPHEN-MINUS (-) character followed by a U+003E GREATER-THAN SIGN ('>') character, nor contain two consecutive U+002D HYPHEN-MINUS (-) characters, nor end with a U+002D HYPHEN-MINUS (-) character. Violating these constraints is a parse error and various error recovery procedures are applied appropriately.<br />
| The content of comments must not contain two consecutive U+002D HYPHEN-MINUS (-) characters, nor end with a hyphen. Violating this is a well-formedness error.<br />
| <br />
|-<br />
!CDATA sections<br />
|<br />
|<br />
|<br />
|-<br />
!Processing Instructions<br />
|<br />
|<br />
|<br />
|-<br />
!Character References<br />
|<br />
|<br />
|<br />
|-<br />
!Entity References<br />
|<br />
|<br />
|<br />
|}<br />
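The XHTML column of the table can be checked with any conforming XML parser. The sketch below uses Python's standard <code>xml.etree.ElementTree</code> (backed by expat) as a stand-in for a browser's XML parser; the helper <code>is_well_formed</code> and the sample snippets are illustrative assumptions. Each snippet would be accepted, with error recovery, by an HTML parser, but an XML parser stops with a fatal error. In XML 1.0, U+000C FORM FEED is not a permitted character at all. Python's standard library has no HTML5-conformant parser, so the HTML side is not shown.

```python
import xml.etree.ElementTree as ET

# Snippets an HTML parser accepts (with error recovery) but that are not
# well-formed XML. The labels refer to rows of the table above.
NOT_WELL_FORMED = {
    "Unescaped Special Characters (&)": "<p>Fish & chips</p>",
    "Unescaped Special Characters (<)": "<p>1 < 2</p>",
    "Comment syntax (-- inside a comment)": "<div><!-- a -- b --></div>",
    "Optional tags (omitted </p>)": "<div><p>text</div>",
    "Space characters (U+000C FORM FEED)": "<p>a\x0cb</p>",
}


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup without a fatal error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


for row, markup in NOT_WELL_FORMED.items():
    print(f"{row}: well-formed = {is_well_formed(markup)}")

# The escaped, explicitly closed equivalent is well-formed.
print(is_well_formed("<div><p>Fish &amp; chips</p><p>1 &lt; 2</p></div>"))  # True
```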
<br />
'''THIS PAGE IS IN THE PROCESS OF BEING REVISED'''<br />
<br />
* HTML Parse Errors with special handling:<br />
** End tags with attributes. <br />
** Unexpected end tags (in HTML, an unexpected <code>&lt;/br></code> or <code>&lt;/p></code> can cause the start tag to be implied before it).<br />
<br />
* The sequence of characters &quot;<code>]]&gt;</code>&quot; in content when it does not mark the end of a <code>CDATA</code> section is a well-formedness error in XHTML, but valid in HTML.<br />
* In XHTML: <code>&lt;![CDATA[...]]&gt;</code> is a <code>CDATA</code> section. In HTML, it's a bogus comment.<br />
* In XHTML, <code>&lt;?foo ...?&gt;</code> is a processing instruction. In HTML, it's a bogus comment.<br />
* In HTML, the trailing slash used for the empty element syntax is a parse error on non-void HTML elements (see below). On HTML elements it is always ignored; only on foreign elements (SVG and MathML) does it mark the element as self-closing.<br />
* In HTML, the <code>script</code> and <code>style</code> elements are parsed as <code>CDATA</code> elements. (Note: the definition of <code>CDATA</code> differs from that in XML). In XML, they're parsed as normal elements (which means that things that look like comments are treated as <em>real</em> comments, and things that look like start tags actually are start tags).<br />
* In HTML, the <code>title</code> and <code>textarea</code> elements are parsed as <code>RCDATA</code> elements. (Note: The definition of <code>RCDATA</code> differs from that in SGML and there is no <code>RCDATA</code> in XML).<br />
* In HTML, if scripting is enabled, the <code>noscript</code> element is parsed as a <code>CDATA</code> element. If scripting is disabled, it's parsed as a normal element. In XHTML, the element is always parsed as a normal element, and can't really be used to stop content from being present when script is disabled.<br />
* In HTML, the <code>iframe</code>, <code>noembed</code> and <code>noframes</code> elements are parsed as <code>CDATA</code> elements. In XHTML, they are parsed as normal elements, and therefore do not stop content from being used.<br />
* White space characters in attribute values are [http://www.w3.org/TR/REC-xml/#AVNormalize normalized] to spaces in XHTML.<br />
* In HTML, elements with optional tags are implied in certain conditions.<br />
* In HTML, tags for certain elements, which appear out of context, are ignored. This includes <code>caption</code>, <code>col</code>, <code>colgroup</code>, <code>frame</code>, <code>frameset</code>, <code>head</code>, <code>option</code>, <code>optgroup</code>, <code>tbody</code>, <code>td</code>, <code>tfoot</code>, <code>th</code>, <code>thead</code>, <code>tr</code>.<br />
* The <code>plaintext</code> element has a special parsing requirement in HTML. (It is, however, forbidden.)<br />
* In HTML, a line feed that immediately follows a <code>pre</code>, <code>listing</code> or <code>textarea</code> start tag is ignored.<br />
* <em>Many other edge cases and error conditions receive special handling in HTML; not all of them are listed here.</em><br />
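Three of the XHTML behaviours listed above (attribute-value whitespace normalization, <code>CDATA</code> sections and processing instructions) can be observed with Python's standard XML modules. This is a sketch that uses <code>xml.etree.ElementTree</code> and <code>xml.dom.minidom</code> as stand-ins for a browser's XML parser and DOM; the sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# 1. Attribute-value normalization: literal tab and line feed characters in an
#    attribute value are replaced by spaces.
print(repr(ET.fromstring('<p title="a\tb\nc"/>').get("title")))  # 'a b c'

# A character reference is not normalized, so &#10; remains a line feed.
print(repr(ET.fromstring('<p title="a&#10;b"/>').get("title")))  # 'a\nb'

# 2. A CDATA section is real markup in XML: its content becomes plain text.
print(ET.fromstring("<p><![CDATA[x < y && y > z]]></p>").text)  # x < y && y > z

# 3. A processing instruction becomes a real DOM node rather than a comment.
doc = minidom.parseString("<p><?foo bar?></p>")
pi = doc.documentElement.firstChild
print(pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE, pi.target, pi.data)  # True foo bar
```

An HTML parser would keep the tab and line feed in the attribute value and turn both the <code>CDATA</code> section and the processing instruction into bogus comments.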
<br />
<br />
* In HTML, [http://wiki.whatwg.org/wiki/FAQ#What_will_the_DOCTYPE_be.3F the <code>doctype</code> is required]. In XHTML, it is optional.<br />
* In HTML, the DOCTYPE is case insensitive (e.g. <code>&lt;!DOCTYPE HTML&gt;</code>, <code>&lt;!doctype html&gt;</code>, or any other case variation is acceptable). In XHTML, the DOCTYPE, if used, is case sensitive and must be well-formed XML, e.g. <code>&lt;!DOCTYPE html&gt;</code>, with optional PUBLIC and/or SYSTEM identifiers.<br />
* In XHTML, tag names and attribute names are case sensitive. In HTML, they are case insensitive.<br />
* In XHTML, non-empty elements require both a start and an end tag. In HTML, certain elements allow the omission of either or both; see [[#HTML_Elements_with_Optional_Tags|HTML Elements with Optional Tags]].<br />
<br />
* In XHTML, empty elements may use either the empty element syntax (<code>&lt;br/&gt;</code>) or have an end tag immediately follow the start tag (<code>&lt;br&gt;&lt;/br&gt;</code>). In HTML, the empty element syntax (trailing slash) is allowed on void elements, but forbidden on other HTML elements. However, it serves no purpose whatsoever and can be omitted. End tags for void elements are forbidden. The void elements are:<br />
** <code>base</code>, <code>link</code>, <code>meta</code>, <code>hr</code>, <code>br</code>, <code>img</code>, <code>embed</code>, <code>param</code>, <code>area</code>, <code>col</code> and <code>input</code><br />
* HTML allows attribute minimisation (i.e. omitting the equals sign and the value), XHTML does not.<br />
* HTML allows the use of unquoted attribute values, XHTML does not.<br />
* XHTML allows the use of <code>CDATA</code> sections, HTML does not.<br />
* XHTML allows the use of processing instructions, HTML does not.<br />
* In HTML, all named entity references are predefined and do not require a DTD. Because there is no DTD for XHTML5, entity references other than the five predefined ones (<code>&amp;amp;</code>, <code>&amp;lt;</code>, <code>&amp;gt;</code>, <code>&amp;quot;</code> and <code>&amp;apos;</code>) cannot be used in XHTML.<br />
** You can provide your own DTD for use with your own validating parser, but be aware that browsers do not use validating parsers and will not read the DTD.<br />
* XML 1.0 restricts the set of permitted Unicode characters more than HTML does.<br />
* Namespace prefixes are permitted in XHTML. They are forbidden in HTML.<br />
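Several items in this list can be demonstrated by feeding small snippets to an XML parser. The sketch assumes Python's <code>xml.etree.ElementTree</code> stands in for the XHTML parser; <code>parses</code> is a hypothetical helper that reports whether a snippet is well-formed. An HTML parser would accept every one of these snippets.

```python
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """Hypothetical helper: True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses("<!doctype html><html/>"))  # False: DOCTYPE keyword is case sensitive
print(parses("<!DOCTYPE html><html/>"))  # True
print(parses("<p>x</P>"))                # False: tag names are case sensitive
print(parses("<input disabled/>"))       # False: no attribute minimisation
print(parses("<p class=note>x</p>"))     # False: attribute values must be quoted
print(parses("<p>&nbsp;</p>"))           # False: undeclared entity (no DTD is read)
print(parses("<p>&#160;</p>"))           # True: numeric character references work
print(parses("<p>&amp;&lt;&gt;&quot;&apos;</p>"))  # True: the five predefined entities

# Both empty-element forms produce the same empty element.
short, long_form = ET.fromstring("<br/>"), ET.fromstring("<br></br>")
print(short.tag == long_form.tag == "br" and len(short) == len(long_form) == 0
      and short.text is None and long_form.text is None)  # True
```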
<br />
==== HTML Elements with Optional Tags ====<br />
<br />
{| class="wikitable" border="1"<br />
|-<br />
! Element<br />
! Start Tag<br />
! End Tag<br />
|-<br />
!html<br />
|optional<br />
|optional<br />
|-<br />
!head<br />
|optional<br />
|optional<br />
|-<br />
!body<br />
|optional<br />
|optional<br />
|-<br />
!li<br />
|required<br />
|optional<br />
|-<br />
!dt<br />
|required<br />
|optional<br />
|-<br />
!dd<br />
|required<br />
|optional<br />
|-<br />
!p<br />
|required<br />
|optional<br />
|-<br />
!colgroup<br />
|optional<br />
|optional<br />
|-<br />
!thead<br />
|required<br />
|optional<br />
|-<br />
!tbody<br />
|optional<br />
|optional<br />
|-<br />
!tfoot<br />
|required<br />
|optional<br />
|-<br />
!tr<br />
|required<br />
|optional<br />
|-<br />
!th<br />
|required<br />
|optional<br />
|-<br />
!td<br />
|required<br />
|optional<br />
|-<br />
!rt<br />
|required<br />
|optional<br />
|-<br />
!rp<br />
|required<br />
|optional<br />
|-<br />
!optgroup<br />
|required<br />
|optional<br />
|-<br />
!option<br />
|required<br />
|optional<br />
|}<br />
<br />
=== Markup ===<br />
<br />
* The [http://wiki.whatwg.org/wiki/FAQ#What_is_the_namespace_declaration.3F namespace declaration] (<code>xmlns</code> attribute) is required in XHTML. The xmlns attribute is also allowed to appear on any element in HTML on the condition that it has the value <code><nowiki>"http://www.w3.org/1999/xhtml"</nowiki></code>.<br />
** <code><nowiki>&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;</nowiki></code><br />
** In HTML, the xmlns attribute has absolutely no effect. It is basically a talisman. It is allowed merely to make migration to and from XHTML mildly easier. When parsed by an HTML parser, the attribute ends up in the null namespace.<br />
** In XML (with an [http://www.w3.org/TR/xml-names/ XML Namespaces]-aware parser), an xmlns attribute is part of the namespace declaration mechanism, and an element cannot actually have an xmlns attribute in the null namespace. In DOM implementations, the attribute ends up in the "<code><nowiki>http://www.w3.org/2000/xmlns/</nowiki></code>" namespace.<br />
* XHTML allows elements and attributes from any other namespace to be used; HTML allows only the SVG and MathML vocabularies, plus the specific XLink and XML attributes listed in the table above.<br />
* XHTML uses the <code>xml:lang</code> attribute; HTML uses <code>lang</code> instead.<br />
* XML ID introduces <code>xml:id</code>, which could be used in XHTML. In HTML it has no effect.<br />
* In HTML, the <code>noscript</code> element may be used. In XHTML, it is forbidden.<br />
* XHTML can use <code>xml:base</code>, HTML cannot. <br />
* In XHTML, <code>table</code> elements may contain child <code>tr</code> elements. In the HTML serialisation this is not possible, because for backwards compatibility the HTML parser implies a <code>tbody</code> element around the rows (though such a tree can still be created through DOM manipulation).<br />
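The namespace rules above can be observed in an XML DOM. This sketch uses Python's <code>xml.etree.ElementTree</code> (which writes namespaced names as <code>{uri}local</code>) and <code>xml.dom.minidom</code>, a namespace-aware DOM, as stand-ins for a browser; the sample documents are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

XHTML = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# No namespace declaration: <html> stays in no namespace, so an XML user agent
# would not treat it as an XHTML element.
print(ET.fromstring("<html><body/></html>").tag)  # html

# With the declaration, the element is in the XHTML namespace, and xml:lang is an
# attribute in the XML namespace.
root = ET.fromstring(f'<html xmlns="{XHTML}" xml:lang="en"><body/></html>')
print(root.tag)                       # {http://www.w3.org/1999/xhtml}html
print(root.get(f"{{{XML_NS}}}lang"))  # en

# In a namespace-aware DOM, the xmlns attribute itself lives in the XMLNS
# namespace rather than in the null namespace.
doc = minidom.parseString(f'<html xmlns="{XHTML}"/>')
xmlns_attr = doc.documentElement.getAttributeNode("xmlns")
print(doc.documentElement.namespaceURI, xmlns_attr.namespaceURI)
```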
<br />
=== Character Encoding ===<br />
<br />
* In XHTML, the XML declaration may be used to [http://wiki.whatwg.org/wiki/FAQ#How_do_I_specify_the_character_encoding.3F specify the character encoding]. In HTML, the XML declaration is forbidden.<br />
* In HTML, the <code>meta</code> element with a <code>charset</code> attribute may be used instead. In XHTML it is permitted only if it specifies 'UTF-8' (case insensitively), and even then it is ignored.<br />
* The default character encoding for XHTML is, according to XML rules, <code>UTF-8</code> or <code>UTF-16</code>. If the encoding is unspecified in HTML, it should be determined through implementation-specific heuristics or fall back to a default value (Note: this section of the spec is not yet finished).<br />
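The XML side of these encoding rules can be sketched with Python's <code>xml.etree.ElementTree</code>. With no XML declaration and no byte order mark, an XML parser assumes UTF-8; the declaration can name another encoding. The sample bytes and the helper <code>decode_text</code> are illustrative assumptions. The last case is rejected outright, because bytes that are not valid UTF-8 are a fatal error for an XML parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def decode_text(data: bytes) -> Optional[str]:
    """Parse a one-element XML document and return its text, or None on a fatal error."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# No XML declaration and no byte order mark: the parser assumes UTF-8.
print(decode_text("<p>café</p>".encode("utf-8")))  # café

# The XML declaration names another encoding, so byte 0xE9 is read as "é".
print(decode_text(b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'))  # café

# The same ISO-8859-1 bytes without a declaration are not valid UTF-8: fatal error.
print(decode_text(b"<p>caf\xe9</p>"))  # None
```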
<br />
=== Scripts ===<br />
<br />
* <code>document.write()</code> and <code>document.writeln()</code> can be used in HTML, but not in XHTML.<br />
* In XHTML, the use of the <code>innerHTML</code> property requires that the string be a well-formed fragment of XML. <br />
* DOM APIs are case sensitive in XHTML, whereas some are case insensitive in HTML. (This does not apply to elements which are not in the HTML namespace.)<br />
** Element.tagName and Node.nodeName return the name in uppercase in HTML.<br />
** Document.createElement() is case insensitive (the canonical form is lowercase).<br />
** Element.setAttributeNode() will change the attribute name to lowercase.<br />
** Element.setAttribute() is case insensitive (the canonical form is lowercase).<br />
** Document.getElementsByTagName() and Element.getElementsByTagName() are case insensitive.<br />
** Document.renameNode(). If the new namespace is the HTML namespace, then the new qualified name will be lowercased before the rename takes place.<br />
* In HTML, Document.createElement() will create an element in the HTML namespace. In XML (including XHTML), the namespace is defined by both DOM2 and DOM3 to be null.<br />
** In XHTML, browsers lack interoperability in this area. In Firefox and Safari, the namespace is dependent upon the MIME type. In Opera, it's dependent upon the root element.<br />
* XPath expressions targeted at pre-HTML5 browsers need to use the XHTML namespace for XHTML and null for HTML. (HTML5 browsers would use the XHTML namespace even in HTML.)<br />
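Python's <code>xml.dom.minidom</code> is an XML DOM that follows the DOM Level 2 behaviour described above, so it can illustrate the XHTML side of these differences. It is not a browser DOM, and the sample markup is illustrative.

```python
from xml.dom import minidom

doc = minidom.parseString("<div><P/><p/></div>")

# Names are returned exactly as written (an HTML DOM would report "DIV").
print(doc.documentElement.tagName)  # div

# getElementsByTagName() is case sensitive: only the lowercase <p> matches.
print(len(doc.getElementsByTagName("p")))  # 1

# createElement() in an XML DOM creates an element in no namespace (None),
# not in the XHTML namespace.
print(minidom.Document().createElement("p").namespaceURI)  # None
```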
<br />
=== Stylesheets ===<br />
<br />
* Selectors, as used in CSS, match case sensitively in XHTML, but case insensitively in HTML.<br />
* CSS requires special handling of the body element in HTML for painting backgrounds on the canvas; this handling does not apply to XHTML.<br />
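The case-sensitivity rule for type selectors can be sketched as a small matching function. <code>type_selector_matches</code> is hypothetical, and limiting case-insensitive matching to HTML-namespace elements in HTML documents is an assumption consistent with the DOM note above (elements outside the HTML namespace keep case-sensitive names).

```python
import string
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"

# ASCII-only lowercasing, as used for HTML case-insensitive matching.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def type_selector_matches(selector: str, local_name: str,
                          namespace: Optional[str], is_html_document: bool) -> bool:
    """Does a CSS type selector such as 'P' match an element? (Hypothetical helper.)"""
    if is_html_document and namespace == XHTML_NS:
        # HTML document: compare ASCII case-insensitively.
        return selector.translate(_ASCII_LOWER) == local_name.translate(_ASCII_LOWER)
    # XHTML document (or a non-HTML element): exact, case-sensitive comparison.
    return selector == local_name


print(type_selector_matches("P", "p", XHTML_NS, is_html_document=True))   # True
print(type_selector_matches("P", "p", XHTML_NS, is_html_document=False))  # False
```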
<br />
== Differences Between HTML 4.01 and HTML 5 ==<br />
<br />
See [http://dev.w3.org/html5/html4-differences/ HTML 5 differences from HTML 4].<br />
<br />
== Differences Between DOM Level 2.0, 3.0 and the HTML 5 DOM APIs ==<br />
<br />
'''This section might belong on a separate page.'''<br />
<br />
* TODO (need to talk about the changes to the DOM API that HTML5 is making, compared with DOM2 and DOM3)<br />
<br />
== Translations ==<br />
<br />
* [http://meiert.com/de/publications/translations/whatwg.org/html-vs-xhtml/ German translation: "HTML 5 und XHTML 5 im Vergleich (WHATWG)"]<br />
* [http://dancewithnet.com/2007/10/28/differences-between-html-and-xhtml/ Chinese translation: "HTML和XHTML的不同"]</div>Rubys<br />
RelExtensions (https://wiki.whatwg.org/index.php?title=RelExtensions&diff=3608), 2009-04-15T00:07:59Z<p>Rubys: solid lines (and further demonstrate the use case for additional attributes that are (a) interoperable, (b) useful, and (c) non-conforming HTML5)</p>
<hr />
<div>This page lists the allowed extension values for the rel="" attribute in HTML5. You may add your own values to this list, which makes them legal HTML5 rel values. We ask that you try to avoid redundancy; if someone has already defined a value that does roughly what you want, please reuse it.<br />
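HTML5 treats the value of rel="" as a set of space-separated tokens whose keywords are compared ASCII case-insensitively (a rule from the HTML specification, not stated on this page). A minimal Python sketch of tokenizing a value and picking out keywords registered here might look like this; <code>rel_keywords</code>, <code>registered_extensions</code> and the abbreviated keyword set are hypothetical.

```python
import string
from typing import List

# A hypothetical, abbreviated subset of the keywords in the table below.
REGISTERED = {"acquaintance", "author", "canonical", "license", "me", "shortlink"}

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def rel_keywords(value: str) -> List[str]:
    """Split a rel value on ASCII whitespace and lowercase each token (ASCII only)."""
    for ch in "\t\n\x0c\r":  # tab, line feed, form feed, carriage return
        value = value.replace(ch, " ")
    return [token.translate(_ASCII_LOWER) for token in value.split(" ") if token]


def registered_extensions(value: str) -> List[str]:
    """Return the tokens of a rel value that appear in the REGISTERED subset."""
    return [kw for kw in rel_keywords(value) if kw in REGISTERED]


print(rel_keywords("Me\tLicense  nofollow"))           # ['me', 'license', 'nofollow']
print(registered_extensions("Me\tLicense  nofollow"))  # ['me', 'license']
```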
<br />
{|border=1 cellpadding=4 cellspacing=0<br />
! rowspan=2 | Keyword<br />
! colspan=2 | Effect on...<br />
! rowspan=2 | Brief description<br />
! rowspan=2 | Link to more details<br />
! rowspan=2 | Synonyms<br />
! rowspan=2 | Status<br />
|- <br />
! link<br />
! a and area<br />
|- <br />
| acquaintance<br />
| hyperlink<br />
| hyperlink<br />
| the person represented by the current document considers the person represented by the referenced document to be an acquaintance<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| author<br />
| hyperlink<br />
| hyperlink<br />
| The linked document is the page or email address of an agent (person, firm, etc.) responsible for the content.<br />
| <br />
| <br />
| Proposal<br />
|-<br />
| canonical<br />
| hyperlink<br />
| not allowed<br />
| Robots (e.g., search engines) should treat the document containing the tag as a minor variation of the linked document, which may result in the removal of the former from a web index and in the consolidation of its quality signals in the latter.<br />
| [http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html]<br />
|<br />
| Proposal<br />
|-<br />
| chapter<br />
| hyperlink<br />
| hyperlink<br />
| Target document is a subdocument of the current document.<br />
| [http://www.w3.org/TR/html4/types.html#h-6.12 HTML4]<br />
| section, subsection, appendix<br />
| Proposal<br />
|-<br />
| child<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person is a child of the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| co-resident<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person lives in the same residence as the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| co-worker<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person is a co-worker of the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| colleague<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person is a colleague of the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| contact<br />
| hyperlink<br />
| hyperlink<br />
| the person represented by the current document considers the person represented by the referenced document to be a contact<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| contributor<br />
| hyperlink<br />
| hyperlink<br />
| The linked document is the page or email address of an agent (person, firm, etc.) involved in the production of the content, but not its main author(s).<br />
| <br />
| <br />
| Proposal<br />
|-<br />
| crush<br />
| hyperlink<br />
| hyperlink<br />
| this person considers the referenced person to be a crush (i.e. has a crush on the referenced person)<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| date<br />
| hyperlink<br />
| hyperlink<br />
| this person considers the referenced person to be a date (i.e. is dating the referenced person)<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| edit<br />
| hyperlink<br />
| hyperlink<br />
| Target document is an editable version of the current document.<br />
| [http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-11.html#new-link-relation Atom Protocol]<br />
| <br />
| Proposal<br />
|-<br />
| edituri<br />
| hyperlink<br />
| not allowed<br />
| a link to an RSD file describing how to edit the given page.<br />
| [http://cyber.law.harvard.edu/blogs/gems/tech/rsd.htm rsd]<br />
|<br />
| Proposal<br />
|-<br />
| friend<br />
| hyperlink<br />
| hyperlink<br />
| the person represented by the current document considers the person represented by the referenced document to be a friend<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| glossary<br />
| hyperlink<br />
| hyperlink<br />
| Target document provides definitions for words in current document.<br />
| [http://www.w3.org/TR/html4/types.html#h-6.12 HTML4]<br />
| <br />
| Proposal<br />
|-<br />
| kin<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person is part of the extended family of the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| license<br />
| hyperlink<br />
| hyperlink<br />
| The linked document is a license for the current document<br />
| [http://microformats.org/wiki/rel-license rel-license]<br />
| <br />
| Proposal<br />
|-<br />
| me<br />
| hyperlink<br />
| hyperlink<br />
| the referenced document represents the same person as does the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| met<br />
| hyperlink<br />
| hyperlink<br />
| this person has met the referenced person<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| muse<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person inspires the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| neighbor<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person lives nearby the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| openid.delegate<br />
| external resource<br />
| not allowed<br />
| OpenID 1.1 authentication delegation<br />
| [http://openid.net/specs/openid-authentication-1_1.html#delegating_authentication OpenID specification]<br />
| <br />
| Proposal<br />
|-<br />
| openid.server<br />
| external resource<br />
| not allowed<br />
| OpenID 1.1 server (identity provider) endpoint<br />
| [http://openid.net/specs/openid-authentication-1_1.html#delegating_authentication OpenID specification]<br />
| <br />
| Proposal<br />
|-<br />
| openid2.local_id<br />
| external resource<br />
| not allowed<br />
| OpenID 2.0 authentication delegation<br />
| [http://openid.net/specs/openid-authentication-2_0.html#html_disco OpenID Auth 2.0 section 7.3.3]<br />
| <br />
| Proposal<br />
|-<br />
| openid2.provider<br />
| external resource<br />
| not allowed<br />
| OpenID 2.0 authentication endpoint<br />
| [http://openid.net/specs/openid-authentication-2_0.html#html_disco OpenID Auth 2.0 section 7.3.3]<br />
| <br />
| Proposal<br />
|-<br />
| parent<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person is a parent of the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| pgpkey<br />
| hyperlink<br />
| not allowed<br />
| The linked document is the PGP public key file (which may contain multiple keys) of the author(s) of the page.<br />
| [http://purl.org/net/pgpkey/], [http://golem.ph.utexas.edu/~distler/blog/archives/000320.html]<br />
|<br />
| Proposal<br />
|-<br />
| profile<br />
| hyperlink<br />
| not allowed<br />
| this referenced link is a metadata profile for the current document<br />
| [http://www.w3.org/TR/html401/struct/global.html#profiles HTML Meta data profiles]<br />
|<br />
| Proposal<br />
|-<br />
| related<br />
| hyperlink<br />
| hyperlink<br />
| this referenced link identifies a resource related to the current document<br />
| [http://tools.ietf.org/html/rfc4287#section-4.2.7 Atom Syndication Format]<br />
|<br />
| Proposal<br />
|-<br />
| reviewer<br />
| hyperlink<br />
| not allowed<br />
| The linked document is the page or email address of an agent (a person, firm, or other party) responsible for reviewing the content.<br />
| Of interest: used by the CSS WG for the CSS 2.1 Test Suite.<br />
|-<br />
| script<br />
| not allowed<br />
| not allowed<br />
| Was proposed to replace &lt;script>. Use &lt;script> instead.<br />
| none<br />
| <br />
| Rejected<br />
|-<br />
| service<br />
| external resource<br />
| not allowed<br />
| Points to a resource describing a service API<br />
| [[ServiceRelExtension]]<br />
| <br />
| Proposal<br />
|-<br />
| shortlink<br />
| hyperlink<br />
| hyperlink<br />
| Identifies a shorter form of the URL for the current document, provided by the document owner.<br />
| [http://code.google.com/p/shortlink/wiki/Specification shortlink Specification]<br />
|<br />
| Proposal<br />
|-<br />
| sibling<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person is a sibling of the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| spouse<br />
| hyperlink<br />
| hyperlink<br />
| the referenced person is a spouse of the person represented by the current document<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| sweetheart<br />
| hyperlink<br />
| hyperlink<br />
| this person considers the referenced person to be their sweetheart<br />
| [http://gmpg.org/xfn/11 XFN]<br />
|<br />
| Proposal<br />
|-<br />
| tag<br />
| hyperlink<br />
| hyperlink<br />
| The linked document is an author-designated "tag" (or keyword/subject) for the current page.<br />
| [http://microformats.org/wiki/rel-tag rel-tag]<br />
| <br />
| Proposal<br />
|-<br />
| technicalauthor<br />
| hyperlink<br />
| hyperlink<br />
| The linked document is the page or email address of an agent (a person, firm, or other party) responsible for the technical construction of the page (i.e. the HTML/CSS/PHP code), not for the content.<br />
| <br />
| <br />
| Proposal<br />
|-<br />
| timesheet<br />
| external resource<br />
| not allowed<br />
| SMIL Timesheet<br />
| [http://www.w3.org/TR/timesheets/#smilTimesheetsNS-Elements-Timesheet SMIL Timesheets 1.0]<br />
| <br />
| Proposal<br />
|-<br />
| translator<br />
| hyperlink<br />
| hyperlink<br />
| The linked document is the page or email address of an agent (a person, firm, or other party) responsible for the translation of the page.<br />
| <br />
| <br />
| Proposal<br />
|-<br />
| webmaster<br />
| hyperlink<br />
| hyperlink<br />
| The linked document is the page or email address of an agent (a person, firm, or other party) available for requests about the content of the page.<br />
| <br />
| maintainer<br />
| Proposal<br />
|-<br />
| widget<br />
| hyperlink<br />
| hyperlink<br />
| Points to a widget.<br />
| [http://dev.w3.org/2006/waf/widgets/Overview.html#autodiscovery Widgets 1.0 Editor's draft]<br />
| <br />
| Proposal<br />
|-<br />
| wlwmanifest<br />
| hyperlink<br />
| not allowed<br />
| A link to a manifest for Windows Live Writer.<br />
| [http://msdn.microsoft.com/en-us/library/bb463263.aspx msdn]<br />
|<br />
| Proposal<br />
|}<br />
<br />
The "Effect on... link" column must say "not allowed" if the rel value is not allowed on &lt;link> elements, "hyperlink" if the rel value creates a hyperlink, or "external resource" if the rel value creates a link to an external resource.<br />
<br />
The "Effect on... a and area" column must either say "not allowed" or "hyperlink".<br />
<br />
For the "Status" column to be changed to "Accepted", the proposed keyword must either have gone through the [http://microformats.org/wiki/process Microformats process] and been approved by the Microformats community, or be defined by a W3C specification in the Candidate Recommendation or Recommendation state. If it fails to go through this process, it is "Rejected".<br />
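The column rules above are mechanical enough to check automatically. A minimal Python sketch, assuming each table row has already been parsed into a dict with the hypothetical keys <code>keyword</code>, <code>link</code>, <code>a_area</code> and <code>status</code> (this page defines no such format); it would, for example, flag that the reviewer row above has no Status cell.

```python
# Allowed values per column, taken from the rules stated above.
LINK_EFFECTS = {"not allowed", "hyperlink", "external resource"}
A_AREA_EFFECTS = {"not allowed", "hyperlink"}
STATUSES = {"Proposal", "Accepted", "Rejected"}

def row_problems(row: dict) -> list:
    """Return human-readable problems for one registry row.

    `row` uses hypothetical keys: keyword, link, a_area, status.
    """
    name = row.get("keyword")
    problems = []
    if row.get("link") not in LINK_EFFECTS:
        problems.append(f"{name}: bad 'Effect on link' value {row.get('link')!r}")
    if row.get("a_area") not in A_AREA_EFFECTS:
        problems.append(f"{name}: bad 'Effect on a and area' value {row.get('a_area')!r}")
    if row.get("status") not in STATUSES:
        problems.append(f"{name}: bad or missing status {row.get('status')!r}")
    return problems
```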
<br />
For more details, see [http://whatwg.org/specs/web-apps/current-work/#linkTypes the HTML5 specification]. See also [http://microformats.org/wiki/existing-rel-values the Microformats wiki page on this matter].</div>Rubyshttps://wiki.whatwg.org/index.php?title=Extensions&diff=3022Extensions2008-04-02T13:46:00Z<p>Rubys: Proposal 3: XML5</p>
<hr />
<div>= Ways to arbitrarily extend text/html for new vocabularies =<br />
<br />
Please put ideas for what it should look like here, each in their own section.<br />
<br />
Each proposal should explain in detail (ideally with examples) how to handle:<br />
* Syntax errors at the tokeniser level, the tree construction level, and the schema level.<br />
* Existing content that happens to use elements or syntax that you are proposing have special processing rules.<br />
* Pages that contain any special syntax after that syntax was copied and pasted by an ignorant Web author from a valid page written by a competent Web author aware of the new syntax.<br />
<br />
See also SVG-specific proposals in [[Diagrams in HTML]].<br />
<br />
== Proposal 1: xmlns strawman ==<br />
<br />
When you hit an element with an xmlns="" attribute, switch to an XML parser until that parser has parsed the matching end tag.<br />
<br />
<pre><br />
bla bla text/html bla bla <foo xmlns="http://example.com/foo"><this><must/><br />
be<valid>XML! </valid></this> must be.</foo> bla bla text/html<br />
</pre><br />
<br />
Errors cause the entire page to stop parsing.<br />
<br />
Existing pages are not handled.<br />
<br />
Pages that copy-and-paste this syntax then use it incorrectly are not handled.<br />
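A minimal Python sketch of the strawman, for illustration only. Instead of switching a streaming parser mid-document, it tries the standard library's <code>xml.etree.ElementTree</code> on each candidate span ending in a matching end tag and keeps the first well-formed one; if none is well-formed it raises, mirroring "errors cause the entire page to stop parsing". The start-tag regular expression and the function name are assumptions, and trial parsing is a simplification of what a real tokenizer would do.

```python
import re
import xml.etree.ElementTree as ET

# A start tag carrying a default namespace declaration (xmlns="..."), not xmlns:prefix.
START = re.compile(r'<([A-Za-z][\w.-]*)\b[^>]*\sxmlns\s*=')

def extract_xml_island(html: str):
    """Return (start, end, element) for the first xmlns="" island, or None."""
    m = START.search(html)
    if m is None:
        return None
    start, name = m.start(), m.group(1)
    end_tag = re.compile(r'</' + re.escape(name) + r'\s*>')
    # Try each matching end tag in turn; nested same-name elements make
    # the earlier candidates fail, so the first well-formed span wins.
    for close in end_tag.finditer(html, m.end()):
        try:
            element = ET.fromstring(html[start:close.end()])
        except ET.ParseError:
            continue
        return start, close.end(), element
    # No well-formed span: per the strawman, the whole page stops parsing.
    raise ValueError("xmlns island is not well-formed XML")
```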
<br />
=== Reasons why we can't do this ===<br />
* There are pages that already specify xmlns="" attributes that would break if the content were processed as XML. For example, [http://www.live.com/ http://www.live.com/].<br />
<br />
:* An xmlns="" attribute used for HTML5 extensibility purposes should probably be clearly marked as such, to distinguish it from legacy uses. For example, the opt-in could be declared explicitly at the root of the document:<br />
<pre><br />
<html xmlns:xmlns="urn:html5:xmlns:for-example"><br />
...<br />
<foo xmlns="http://example.com/foo"><br />
<!-- the region of the "foo" extension --><br />
</foo><br />
...<br />
</html><br />
</pre><br />
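A minimal Python sketch of the two points above, assuming the opt-in marker is matched as a literal string on the <code>&lt;html></code> start tag (the URN is the placeholder from the example, not a registered value). It also shows why legacy pages break: an XHTML-style page served as text/html often contains HTML-only syntax such as an unclosed <code>&lt;br></code>, which an XML parser rejects. One caution: Namespaces in XML forbids declaring the <code>xmlns</code> prefix itself, so a namespace-aware XML parser would reject the marker attribute; the text/html parser would have to recognize it before any XML processing starts.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder opt-in declaration, copied from the example above.
OPT_IN = 'xmlns:xmlns="urn:html5:xmlns:for-example"'

def islands_enabled(html: str) -> bool:
    """True only if the <html> start tag carries the opt-in declaration."""
    m = re.search(r'<html\b[^>]*>', html, re.IGNORECASE)
    return m is not None and OPT_IN in m.group(0)

def parses_as_xml(markup: str) -> bool:
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# Legacy page: has xmlns="" but is not well-formed XML (<br> is never closed).
legacy = '<html xmlns="http://www.w3.org/1999/xhtml"><body>one<br>two</body></html>'
print(parses_as_xml(legacy), islands_enabled(legacy))  # False False
```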
<br />
== Proposal 2: Extensibility Element (<ext>) ==<br />
<br />
This is a possible generic extensibility point, for SVG and possibly MathML or other XML content. Naturally, any content placed in an <ext> element would have to be understood by the UA in order to render correctly, and more complex rules may need to be developed for specific kinds of interaction between the root document and the inline content, such as with script, CSS, etc. For some discussion, see the [http://krijnhoetmer.nl/irc-logs/whatwg/20080401#l-441 IRC logs].<br />
<br />
(The <ext> element would have a name that doesn't clash with existing content. Inside you can use XML or another format.)<br />
<br />
<pre><br />
<p>Hello world. <br />
<ext><br />
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"><br />
<circle cx="5" cy="5" r="5" stroke="green"/><br />
</svg><br />
</ext> <br />
</p><br />
</pre><br />
<br />
We should define a content model for where the <ext> element can occur, and if there are implications for different locations (such as inside a table, a paragraph, the head, etc). The simplest thing, at least for SVG (and probably MathML), would be that it would have the same restrictions as an <img> element. Also, there should be a default block model for <ext> in CSS.<br />
<br />
'''What's the exact processing model?'''<br />
<br />
Notes:<br />
<br />
* This is similar to IE's "XML islands" with the <xml> element. It's believed that there are some conflicts with the <xml> element itself, since it creates a separate document that is tied to the <xml> element in the DOM, but more research is needed.<br />
<br />
* The <ext> element could potentially be an implicit element, generated by the HTML5 parser on encountering a start tag of e.g. "<svg " or "<math ". That would save authors from having to type this extra element, but has the drawback that it doesn't provide fallback content for legacy UAs. -Ed<br />
<br />
* We could specify exactly what flavors of markup must be supported by a UA, and which may be supported by a UA. This would be rather restrictive, but could improve interoperability of UA features, and would ensure that the proper DOM interfaces are available. For example, SVG and MathML must be supported, and FooML may be supported (or something).<br />
<br />
=== Error Handling ===<br />
<br />
'''Pick one! Or separate the proposal into several proposals, for each different proposal, so that they can be evaluated. The proposals below are just brief notes, not detailed enough for me to know what you mean. -Hixie'''<br />
<br />
The main options seem to be:<br />
# strict XML parsing (not favored by many)<br />
# very permissive error handling (as in HTML5); this idea is controversial and has many open issues, which should be detailed below <br />
# moderate error handling, as detailed in SVG Tiny 1.2 <br />
# other ideas?<br />
<br />
The chief risk with permissive error handling is that it would create content that is not compatible across different UAs, including mobile devices and authoring tools.<br />
<br />
Tentative proposals:<br />
* Tokenizer recovers from errors by ignoring them and moving on; for SVG, any element with errors is not rendered.<br />
* Tree construction recovers from errors by closing the <svg> element, and not rendering any content after the error. <br />
* Case folding is not supported within the main body of the <ext> element, though it would be within the <fallback> element.<br />
* The tree builder would assign the appropriate namespace URI to the element nodes it creates.<br />
* If the "/>" is not found at the end of an element, all subsequent elements will be placed as child elements of that element (and thus not rendered) until a matching closing tag is found, until a matching root tag is found, or until the "</ext>" end tag is found.<br />
* Unknown content is ignored<br />
* Unquoted attribute values will be ignored (should the element also not be rendered?) <br />
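The namespace-assignment point above, together with the implicit-<ext> note earlier, can be sketched with Python's <code>html.parser.HTMLParser</code>: an <code>&lt;svg></code> or <code>&lt;math></code> start tag switches the current namespace, and its matching end tag switches back. This is an illustration under stated assumptions, not a processing model: the class name is hypothetical, mismatched end tags are simply ignored, and <code>HTMLParser</code> lowercases tag names, so camelCase SVG names such as <code>linearGradient</code> would need a separate case-fixing table (which conflicts with the "no case folding" idea above).

```python
from html.parser import HTMLParser

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Start tags that switch the parser into a foreign vocabulary.
FOREIGN_ROOTS = {"svg": SVG_NS, "math": MATHML_NS}

class NamespaceAssigner(HTMLParser):
    """Records (namespace, tag) for each element in document order."""

    def __init__(self):
        super().__init__()
        self.open_foreign = []  # stack of (tag, namespace) for open foreign elements
        self.elements = []

    def _namespace_for(self, tag):
        if self.open_foreign:
            return self.open_foreign[-1][1]  # inherit from the enclosing element
        return FOREIGN_ROOTS.get(tag, HTML_NS)

    def handle_starttag(self, tag, attrs):
        ns = self._namespace_for(tag)
        self.elements.append((ns, tag))
        if ns != HTML_NS:
            self.open_foreign.append((tag, ns))

    def handle_startendtag(self, tag, attrs):
        # "<circle .../>" has no end tag, so it is recorded but never pushed.
        self.elements.append((self._namespace_for(tag), tag))

    def handle_endtag(self, tag):
        # Simplification: only an end tag matching the innermost open
        # foreign element pops it; anything else is ignored.
        if self.open_foreign and self.open_foreign[-1][0] == tag:
            self.open_foreign.pop()
```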
<br />
See an [http://lists.w3.org/Archives/Public/public-html/2007Oct/0158.html email by Henri Sivonen] for comparison and contrast.<br />
<br />
=== Embedded HTML ===<br />
<br />
The case of content inside a <foreignObject> element could be subject to the parsing model of the root document. (Note that this is only a partial solution, and more thought and details are needed.)<br />
<br />
For content outside <foreignObject>, it should follow the XML processing rules.<br />
<br />
=== Fallback Behavior ===<br />
<br />
This is an opportunity to get nice fallback behavior, as well. <br />
<br />
Here's a possible suggestion, where the raster image would show in UAs that didn't support the <ext> syntax, and the SVG would show in those that did (and which support SVG). In UAs which support <ext> and not SVG, the fallback would also be the raster. The fallback content should be inside a wrapper element (<fallback>), so that you can have rich fallback options, such as an image map, a table, <canvas> and an accompanying <script> element, or whatever; in this case, I also include fallback CSS to hide textual content in title, desc, and text elements, but it may be desirable to leave this content as alternate text to the image, even including styling.<br />
<br />
For MathML content, a conditional CSS override could allow for CSS styling of MathML elements for those that don't render MathML natively.<br />
<br />
''Note: as stated before, the names of the <ext> and <fallback> elements are subject to change based on existing element names in the wild.''<br />
<br />
<pre><br />
<html lang="en"><br />
<head><br />
<title>HTML Extensibility Test</title><br />
</head><br />
<body><br />
<h1 id="test_of_extensibility">Test of Extensibility</h1><br />
<p>This is a test of an extensibility point in text/html, with a fallback mechanism.</p><br />
<ext><br />
<fallback><br />
<img src="anIsland.png" alt="..."/><br />
<style type='text/css'><br />
svg > * { display: none; }<br />
</style><br />
</fallback><br />
<svg xmlns="http://www.w3.org/2000/svg"<br />
xmlns:xlink="http://www.w3.org/1999/xlink"<br />
width="100%" height="100%"<br />
version="1.1"><br />
<title>My Title</title><br />
<desc>schepers, 01-04-2008</desc><br />
<circle id="circle_1" cx="75" cy="25" r="20" fill="lime" /><br />
<text id='text_1' x='10' y='25' font-size='18' fill='crimson'>This is some text.</text><br />
</svg> <br />
</ext><br />
</body><br />
</html><br />
</pre><br />
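The selection rule described above (SVG where supported, otherwise the <fallback> content) could be sketched in Python as follows, assuming for simplicity that the whole <ext> fragment is well-formed XML and that the UA knows the set of namespace URIs it can render. The function name and that capability set are hypothetical; note that, as the objections below point out, legacy UAs would not run any such logic at all.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def choose_rendering(ext_markup: str, supported_namespaces: set) -> ET.Element:
    """Pick the child of <ext> to render: the first namespaced child whose
    namespace the UA supports, otherwise the <fallback> element."""
    ext = ET.fromstring(ext_markup)
    for child in ext:
        if child.tag.startswith("{"):
            namespace = child.tag[1:].partition("}")[0]
            if namespace in supported_namespaces:
                return child
    fallback = ext.find("fallback")
    if fallback is None:
        raise ValueError("nothing renderable and no <fallback> element")
    return fallback
```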
<br />
=== Reasons why we can't do this ===<br />
<br />
It's not clear what the processing model being proposed actually is. However, there is already one problem:<br />
<br />
* The idea relies on not conflicting with legacy content. Unfortunately, whatever syntax we end up using, people will copy and paste it from documents that were written by competent authors that tested it against the new UAs, into documents written by authors who don't know about this, and who don't have the new UA, thus creating new "legacy documents" that use whatever syntax we come up with. Saying the risk is minimal doesn't mitigate this problem. It's a real problem, and we have to deal with it. <font color="red">''I think this risk is minimal, since it clearly wouldn't work in the legacy UAs, and so the mistake will have less reason to propagate. -Shepazu''</font><br />
<br />
Note also that the fallback idea doesn't work. Elements like <script>, <style>, <title>, <input>, <textarea> etc, get treated as HTML elements in legacy UAs. Relying on CSS for hiding the text content doesn't work either, because CSS is optional and might not be enabled (or supported). (It doesn't much matter, though, because fallback isn't one of [[New Vocabularies|the things we're trying to address]] with this.)<br />
<br />
== Proposal 3: XML5 ==<br />
<br />
Microsoft has published a [http://code.msdn.microsoft.com/Release/ProjectReleases.aspx?ProjectName=ie8whitepapers&ReleaseId=573 whitepaper] on the subject of ''Improved Namespace Support''. Salient features:<br />
<br />
* Windows Internet Explorer 8 Beta 1 for Developers offers Web developers the opportunity to write standards-compliant HTML-based Web pages that support features (such as SVG, XUL, and MathML) in namespaces, ''provided that the client has installed appropriate handlers for those namespaces via binary behaviors''. (A binary behavior is a type of ActiveX control.)<br />
* Internet Explorer 8 does not support the XHTML namespace definition. Thus, default namespace declarations of XHTML are ignored (xmlns="http://www.w3.org/1999/xhtml").<br />
* Internet Explorer 8 does not support default namespace declarations on any known elements such as HTML, SCRIPT, DIV, or STYLE. If default namespace declarations are encountered on these elements, the declaration is ignored (for purposes of existing Web page compatibility).<br />
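The two "ignored" rules quoted above amount to a small decision function. A Python sketch with a deliberately abbreviated set of known elements (the whitepaper names HTML, SCRIPT, DIV and STYLE as examples; a real implementation would use the full list) and a hypothetical function name. Whether a namespaced element then actually renders still depends on an installed binary behavior, which this sketch does not model.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Abbreviated, illustrative set: the whitepaper cites HTML, SCRIPT, DIV and STYLE.
KNOWN_ELEMENTS = {"html", "head", "body", "script", "div", "style", "span", "p"}

def default_ns_honored(tag: str, xmlns: str) -> bool:
    """Would IE8 (as summarized above) honor xmlns="..." on this element?"""
    if xmlns == XHTML_NS:
        return False  # XHTML default namespace declarations are ignored
    if tag.lower() in KNOWN_ELEMENTS:
        return False  # ignored on known elements, for compatibility
    return True       # e.g. <svg xmlns="http://www.w3.org/2000/svg">
```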
<br />
A few notes:<br />
<br />
* While Microsoft's IE8 implementation as described by this whitepaper does not satisfy all of the requirements, the above list focuses on the parts that do.<br />
* While Microsoft's implementation is based on ActiveX, the situation could very well end up being similar to [http://www.w3.org/TR/XMLHttpRequest/ XMLHttpRequest] whereby the functionality was first exposed via ActiveX, other browser vendors adopted an alternate object model interface to this same functionality, and that interface was later adopted and standardized.<br />
* While the white paper does not explicitly state this requirement, the approach works best if the local name of the element that carries the default namespace declaration (an element unknown to HTML5, for which a binary behavior has been installed) does not also occur within that element's subtree. Both SVG and MathML have unique elements (<code>svg</code> and <code>math</code>, respectively) that serve this purpose. This gives proposal 3 some of the desirable characteristics of proposal 2 described above.<br />
* In order to meet the ''Resistance to errors (e.g. not brittle in the face of syntax errors)'' [[New Vocabularies|requirement]], something akin to [http://annevankesteren.nl/2007/10/xml5 Anne van Kesteren's XML5] would be required, an implementation of which can be seen on [http://code.google.com/p/xml5/ Google Code].</div>Rubyshttps://wiki.whatwg.org/index.php?title=Sanitization_rules&diff=2659Sanitization rules2007-11-13T20:56:04Z<p>Rubys: /* Acceptable Elements */</p>
<hr />
<div>This page was initially seeded with the sanitization lists and rules implemented by the [http://code.google.com/p/html5lib/ html5lib] sanitizer, which in turn was based on [http://golem.ph.utexas.edu/instiki/show/HomePage Jacques Distler's branch of Instiki], which in turn was based on the sanitization logic in the [http://www.feedparser.org/ Universal Feed Parser].<br />
<br />
It is hoped that others will add, update, and extend this list based on their experiences in their own products, and furthermore that some will update their products based on these lists. One such product is [http://htmlpurifier.org/ HTMLPurifier] ([http://intertwingly.net/stories/2007/08/11/diffs diffs]). Another product is [http://www.bloglines.com/help/css-support bloglines].<br />
<br />
As a suggestion but not as a requirement: people who do update their products to reflect information from this list are encouraged to add a link to this page as a comment in the hopes that it will encourage subsequent maintainers to keep this page up to date.<br />
<br />
As a convenience, [http://intertwingly.net/stories/2007/08/13/sanitize_lists.cgi this script] ([http://intertwingly.net/stories/2007/08/13/sanitize_lists.rb source]) converts these lists into a syntax shared by a number of common programming languages.<br />
<br />
=== Acceptable Elements ===<br />
<br />
* a<br />
* abbr<br />
* acronym<br />
* address<br />
* area<br />
* b<br />
* bdo<br />
* big<br />
* blockquote<br />
* br<br />
* button<br />
* caption<br />
* center<br />
* cite<br />
* code<br />
* col<br />
* colgroup<br />
* dd<br />
* del<br />
* dfn<br />
* dir<br />
* div<br />
* dl<br />
* dt<br />
* em<br />
* fieldset<br />
* font<br />
* form<br />
* h1<br />
* h2<br />
* h3<br />
* h4<br />
* h5<br />
* h6<br />
* hr<br />
* i<br />
* img<br />
* input<br />
* ins<br />
* kbd<br />
* label<br />
* legend<br />
* li<br />
* map<br />
* menu<br />
* ol<br />
* optgroup<br />
* option<br />
* p<br />
* pre<br />
* q<br />
* s<br />
* samp<br />
* select<br />
* small<br />
* span<br />
* strike<br />
* strong<br />
* sub<br />
* sup<br />
* table<br />
* tbody<br />
* td<br />
* textarea<br />
* tfoot<br />
* th<br />
* thead<br />
* tr<br />
* tt<br />
* u<br />
* ul<br />
* var<br />
* wbr<br />
<br />
==== mathml Elements ====<br />
<br />
* maction<br />
* math<br />
* merror<br />
* mfrac<br />
* mi<br />
* mmultiscripts<br />
* mn<br />
* mo<br />
* mover<br />
* mpadded<br />
* mphantom<br />
* mprescripts<br />
* mroot<br />
* mrow<br />
* mspace<br />
* msqrt<br />
* mstyle<br />
* msub<br />
* msubsup<br />
* msup<br />
* mtable<br />
* mtd<br />
* mtext<br />
* mtr<br />
* munder<br />
* munderover<br />
* none<br />
<br />
==== svg Elements ====<br />
<br />
* a<br />
* animate<br />
* animateColor<br />
* animateMotion<br />
* animateTransform<br />
* circle<br />
* defs<br />
* desc<br />
* ellipse<br />
* font-face<br />
* font-face-name<br />
* font-face-src<br />
* g<br />
* glyph<br />
* hkern<br />
* image<br />
* linearGradient<br />
* line<br />
* marker<br />
* metadata<br />
* missing-glyph<br />
* mpath<br />
* path<br />
* polygon<br />
* polyline<br />
* radialGradient<br />
* rect<br />
* set<br />
* stop<br />
* svg<br />
* switch<br />
* text<br />
* title<br />
* tspan<br />
* use<br />
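A minimal Python sketch of applying an element allowlist like the lists above, using only the standard library. Assumptions: the set is abbreviated (a real sanitizer would include every entry on this page), the class and function names are hypothetical, attributes are dropped entirely to keep it short, comments are discarded, and disallowed tags are escaped as text rather than removed, which is one possible policy rather than one this page prescribes.

```python
from html import escape
from html.parser import HTMLParser

# Abbreviated allowlist; extend with the full lists on this page.
ALLOWED = {"a", "b", "em", "p", "strong", "ul", "li", "br"}

class ElementFilter(HTMLParser):
    """Re-serializes input, escaping any tag that is not allowlisted."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            self.out.append(f"<{tag}>")  # attributes dropped in this sketch
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # "<br/>" style tags: emit once, without a synthetic end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>" if tag in ALLOWED else escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data))

def filter_elements(markup: str) -> str:
    parser = ElementFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```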
<br />
=== Acceptable Attributes ===<br />
<br />
* abbr<br />
* accept<br />
* accept-charset<br />
* accesskey<br />
* action<br />
* align<br />
* alt<br />
* axis<br />
* border<br />
* cellpadding<br />
* cellspacing<br />
* char<br />
* charoff<br />
* charset<br />
* checked<br />
* cite<br />
* class<br />
* clear<br />
* cols<br />
* colspan<br />
* color<br />
* compact<br />
* coords<br />
* datetime<br />
* dir<br />
* disabled<br />
* enctype<br />
* for<br />
* frame<br />
* headers<br />
* height<br />
* href<br />
* hreflang<br />
* hspace<br />
* id<br />
* ismap<br />
* label<br />
* lang<br />
* longdesc<br />
* maxlength<br />
* media<br />
* method<br />
* multiple<br />
* name<br />
* nohref<br />
* noshade<br />
* nowrap<br />
* prompt<br />
* readonly<br />
* rel<br />
* rev<br />
* rows<br />
* rowspan<br />
* rules<br />
* scope<br />
* selected<br />
* shape<br />
* size<br />
* span<br />
* src<br />
* start<br />
* style<br />
* summary<br />
* tabindex<br />
* target<br />
* title<br />
* type<br />
* usemap<br />
* valign<br />
* value<br />
* vspace<br />
* width<br />
* xml:lang<br />
<br />
==== mathml Attributes ====<br />
<br />
* actiontype<br />
* align<br />
* columnalign<br />
* columnlines<br />
* columnspacing<br />
* columnspan<br />
* depth<br />
* display<br />
* displaystyle<br />
* equalcolumns<br />
* equalrows<br />
* fence<br />
* fontstyle<br />
* fontweight<br />
* frame<br />
* height<br />
* linethickness<br />
* lspace<br />
* mathbackground<br />
* mathcolor<br />
* mathvariant<br />
* maxsize<br />
* minsize<br />
* other<br />
* rowalign<br />
* rowlines<br />
* rowspacing<br />
* rowspan<br />
* rspace<br />
* scriptlevel<br />
* selection<br />
* separator<br />
* stretchy<br />
* width<br />
* xlink:href<br />
* xlink:show<br />
* xlink:type<br />
* xmlns<br />
* xmlns:xlink<br />
<br />
==== svg Attributes ====<br />
<br />
* accent-height<br />
* accumulate<br />
* additive<br />
* alphabetic<br />
* arabic-form<br />
* ascent<br />
* attributeName<br />
* attributeType<br />
* baseProfile<br />
* bbox<br />
* begin<br />
* by<br />
* calcMode<br />
* cap-height<br />
* class<br />
* color<br />
* color-rendering<br />
* content<br />
* cx<br />
* cy<br />
* d<br />
* dx<br />
* dy<br />
* descent<br />
* display<br />
* dur<br />
* end<br />
* fill<br />
* fill-rule<br />
* font-family<br />
* font-size<br />
* font-stretch<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* from<br />
* fx<br />
* fy<br />
* g1<br />
* g2<br />
* glyph-name<br />
* gradientUnits<br />
* hanging<br />
* height<br />
* horiz-adv-x<br />
* horiz-origin-x<br />
* id<br />
* ideographic<br />
* k<br />
* keyPoints<br />
* keySplines<br />
* keyTimes<br />
* lang<br />
* marker-end<br />
* marker-mid<br />
* marker-start<br />
* markerHeight<br />
* markerUnits<br />
* markerWidth<br />
* mathematical<br />
* max<br />
* min<br />
* name<br />
* offset<br />
* opacity<br />
* orient<br />
* origin<br />
* overline-position<br />
* overline-thickness<br />
* panose-1<br />
* path<br />
* pathLength<br />
* points<br />
* preserveAspectRatio<br />
* r<br />
* refX<br />
* refY<br />
* repeatCount<br />
* repeatDur<br />
* requiredExtensions<br />
* requiredFeatures<br />
* restart<br />
* rotate<br />
* rx<br />
* ry<br />
* slope<br />
* stemh<br />
* stemv<br />
* stop-color<br />
* stop-opacity<br />
* strikethrough-position<br />
* strikethrough-thickness<br />
* stroke<br />
* stroke-dasharray<br />
* stroke-dashoffset<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-miterlimit<br />
* stroke-opacity<br />
* stroke-width<br />
* systemLanguage<br />
* target<br />
* text-anchor<br />
* to<br />
* transform<br />
* type<br />
* u1<br />
* u2<br />
* underline-position<br />
* underline-thickness<br />
* unicode<br />
* unicode-range<br />
* units-per-em<br />
* values<br />
* version<br />
* viewBox<br />
* visibility<br />
* width<br />
* widths<br />
* x<br />
* x-height<br />
* x1<br />
* x2<br />
* xlink:actuate<br />
* xlink:arcrole<br />
* xlink:href<br />
* xlink:role<br />
* xlink:show<br />
* xlink:title<br />
* xlink:type<br />
* xml:base<br />
* xml:lang<br />
* xml:space<br />
* xmlns<br />
* xmlns:xlink<br />
* y<br />
* y1<br />
* y2<br />
* zoomAndPan<br />
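A Python sketch of applying the attribute allowlists above, with abbreviated, illustrative sets and a hypothetical function name. The SVG names on this page are case-sensitive (<code>viewBox</code>, <code>gradientUnits</code>), so an implementation that lowercases attribute names, as Python's <code>html.parser</code> does, has to map them back before comparing. Allowlisting a name does not make its value safe: values of URI-bearing attributes still need the scheme check described under URIs.

```python
# Abbreviated allowlists, one per vocabulary; extend with the full lists above.
HTML_ATTRS = {"href", "title", "class", "alt", "src", "width", "height"}
SVG_ATTRS = {"viewBox", "cx", "cy", "r", "fill", "stroke", "xlink:href"}

def filter_attributes(attrs, vocabulary="html"):
    """Keep only allowlisted (name, value) pairs.

    HTML names are matched case-insensitively (and returned lowercased);
    SVG names are matched exactly, since the SVG list is mixed-case.
    """
    if vocabulary == "svg":
        return [(n, v) for n, v in attrs if n in SVG_ATTRS]
    return [(n.lower(), v) for n, v in attrs if n.lower() in HTML_ATTRS]
```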
<br />
=== CSS Rules ===<br />
<br />
First, CSS <code>url(...)</code> values matching the following regular expression are removed:<br />
<pre>url\s*\(\s*[^\s)]+?\s*\)\s*</pre><br />
<br />
Style strings that do not match both of the following regular expressions are deemed obfuscated and are ignored entirely:<br />
<pre>^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$</pre><br />
<pre>^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$</pre><br />
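The two steps above can be sketched in Python using the regular expressions exactly as listed. One assumption: each removed <code>url(...)</code> is replaced with a single space rather than deleted outright, so that neighboring tokens do not run together; the function name is hypothetical.

```python
import re

# Regular expressions copied from this section.
URL_RE = re.compile(r"url\s*\(\s*[^\s)]+?\s*\)\s*")
GAUNTLET_CHARS = re.compile(r"""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$""")
GAUNTLET_DECLS = re.compile(r"^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$")

def css_passes_gauntlet(style: str):
    """Return the style string with url() values removed, or None if it
    looks obfuscated (fails either regular expression)."""
    style = URL_RE.sub(" ", style)  # replace, not delete: keeps tokens apart
    if not GAUNTLET_CHARS.match(style) or not GAUNTLET_DECLS.match(style):
        return None
    return style
```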
<br />
==== style Properties ====<br />
<br />
* azimuth<br />
* background, background-*<br />
* border, border-*<br />
* clear<br />
* color<br />
* cursor<br />
* direction<br />
* display<br />
* elevation<br />
* float<br />
* font<br />
* font-family<br />
* font-size<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* height<br />
* letter-spacing<br />
* line-height<br />
* margin, margin-*<br />
* overflow<br />
* padding, padding-*<br />
* pause<br />
* pause-after<br />
* pause-before<br />
* pitch<br />
* pitch-range<br />
* richness<br />
* speak<br />
* speak-header<br />
* speak-numeral<br />
* speak-punctuation<br />
* speech-rate<br />
* stress<br />
* text-align<br />
* text-decoration<br />
* text-indent<br />
* unicode-bidi<br />
* vertical-align<br />
* voice-family<br />
* volume<br />
* white-space<br />
* width<br />
<br />
==== style Property Values ====<br />
<br />
* auto<br />
* aqua<br />
* black<br />
* block<br />
* blue<br />
* bold<br />
* both<br />
* bottom<br />
* brown<br />
* center<br />
* collapse<br />
* dashed<br />
* dotted<br />
* fuchsia<br />
* gray<br />
* green<br />
* !important<br />
* italic<br />
* left<br />
* lime<br />
* maroon<br />
* medium<br />
* none<br />
* navy<br />
* normal<br />
* nowrap<br />
* olive<br />
* pointer<br />
* purple<br />
* red<br />
* right<br />
* solid<br />
* silver<br />
* teal<br />
* top<br />
* transparent<br />
* underline<br />
* white<br />
* yellow<br />
<br />
In addition, values that match the following regular expression are valid:<br />
<br />
<code>^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$</code><br />
<br />
==== svg style Properties ====<br />
<br />
* fill<br />
* fill-opacity<br />
* fill-rule<br />
* stroke<br />
* stroke-width<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-opacity<br />
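Once a style string has passed the checks under CSS Rules, the property and value lists above can be applied per declaration. A Python sketch with abbreviated sets and a hypothetical function name. It assumes, since the lists do not say, that the keyword and value-regex rules apply to the wildcard families (background, border, margin, padding), while other listed CSS and SVG properties are kept as-is; the value regex is copied from above and is lowercase-only for hex colors.

```python
import re

# Abbreviated lists from this section; extend with the full lists above.
PROPERTIES = {"color", "display", "float", "font-weight", "text-align", "width"}
WILDCARD_FAMILIES = {"background", "border", "margin", "padding"}
SVG_PROPERTIES = {"fill", "fill-opacity", "stroke", "stroke-width"}
KEYWORDS = {"auto", "black", "bold", "none", "red", "solid", "transparent", "!important"}
VALUE_RE = re.compile(r"^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$")
DECL_RE = re.compile(r"([-\w]+)\s*:\s*([^:;]*)")

def clean_declarations(style: str) -> str:
    """Keep allowlisted declarations from a style string."""
    kept = []
    for prop, value in DECL_RE.findall(style):
        prop, value = prop.lower(), value.strip()
        if not value:
            continue
        if prop in PROPERTIES or prop in SVG_PROPERTIES:
            kept.append(f"{prop}: {value};")
        elif prop.split("-")[0] in WILDCARD_FAMILIES:
            # Every space-separated token must be a keyword or match VALUE_RE.
            if all(tok in KEYWORDS or VALUE_RE.match(tok) for tok in value.split()):
                kept.append(f"{prop}: {value};")
    return " ".join(kept)
```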
<br />
=== URIs ===<br />
==== Attributes whose value is a URI ====<br />
<br />
* href<br />
* src<br />
* cite<br />
* action<br />
* longdesc<br />
* xlink:href<br />
* xml:base<br />
<br />
==== URI protocols ====<br />
<br />
* afs<br />
* aim<br />
* callto<br />
* data (see [[#Safe data URL content types]])<br />
* ed2k<br />
* feed<br />
* ftp<br />
* gopher<br />
* http<br />
* https<br />
* irc<br />
* mailto<br />
* news<br />
* nntp<br />
* rsync<br />
* rtsp<br />
* sftp<br />
* ssh<br />
* tag<br />
* tel<br />
* telnet<br />
* urn<br />
* webcal<br />
* wtai<br />
* xmpp<br />
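A Python sketch of checking the attributes whose value is a URI against the protocol list above, including the data URL content types listed under Safe data URL content types. Assumptions: the scheme set is abbreviated, character references are decoded and whitespace/control characters stripped before the check (so <code>java&amp;#9;script:</code> or <code>&amp;#106;avascript:</code> cannot slip through), values without a scheme are treated as relative and kept, and <code>data:</code> URLs without an explicit allowed type are rejected. Function names are hypothetical.

```python
import re
from html import unescape

URI_ATTRIBUTES = {"href", "src", "cite", "action", "longdesc", "xlink:href", "xml:base"}
# Abbreviated scheme list; extend with the full list above.
ALLOWED_SCHEMES = {"ftp", "http", "https", "mailto", "data"}
# The page lists image/jpg; the registered MIME type is image/jpeg, so accept both.
SAFE_DATA_TYPES = {"text/plain", "image/gif", "image/jpg", "image/jpeg", "image/png"}

SCHEME_RE = re.compile(r"^([a-z0-9][-+.a-z0-9]*):")
DATA_TYPE_RE = re.compile(r"^data:([-+.a-z0-9]+/[-+.a-z0-9]+)[;,]")

def uri_is_safe(value: str) -> bool:
    """Decide whether a URI attribute value may be kept."""
    # Decode character references, then drop whitespace and control
    # characters that browsers may ignore inside a scheme.
    normalized = re.sub(r"[\x00-\x20\x7f-\xa0\s]+", "", unescape(value)).lower()
    match = SCHEME_RE.match(normalized)
    if match is None:
        return True  # no scheme: a relative reference
    scheme = match.group(1)
    if scheme not in ALLOWED_SCHEMES:
        return False
    if scheme == "data":
        data_type = DATA_TYPE_RE.match(normalized)
        return data_type is not None and data_type.group(1) in SAFE_DATA_TYPES
    return True

def filter_uri_attributes(attrs):
    """Drop (name, value) pairs whose URI value is not safe."""
    return [(n, v) for n, v in attrs if n not in URI_ATTRIBUTES or uri_is_safe(v)]
```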
<br />
==== Safe data URL content types ====<br />
Note: This section is being [http://wiki.whatwg.org/wiki/Talk:Sanitization_rules discussed].<br />
* text/plain<br />
* image/gif<br />
* image/jpg<br />
* image/png</div>Rubyshttps://wiki.whatwg.org/index.php?title=Sanitization_rules&diff=2609Sanitization rules2007-09-18T23:53:21Z<p>Rubys: add bloglines link</p>
<hr />
<div>This page was initially seeded with the sanitization lists and rules implemented by the [http://code.google.com/p/html5lib/ html5lib] sanitizer, which in turn was based on [http://golem.ph.utexas.edu/instiki/show/HomePage Jacques Distler's branch of Instiki], which in turn was based on the sanitization logic in the [http://www.feedparser.org/ Universal Feed Parser].<br />
<br />
It is hoped that others will add, update, and extend this list based on their experiences in their own products, and furthermore that some will update their products based on these lists. One such product is [http://htmlpurifier.org/ HTMLPurifier] ([http://intertwingly.net/stories/2007/08/11/diffs diffs]). Another product is [http://www.bloglines.com/help/css-support bloglines].<br />
<br />
As a suggestion but not as a requirement: people who do update their products to reflect information from this list are encouraged to add a link to this page as a comment in the hopes that it will encourage subsequent maintainers to keep this page up to date.<br />
<br />
As a convenience, [http://intertwingly.net/stories/2007/08/13/sanitize_lists.cgi this script] ([http://intertwingly.net/stories/2007/08/13/sanitize_lists.rb source]) converts these lists into a syntax shared by a number of common programming languages.<br />
<br />
=== Acceptable Elements ===<br />
<br />
* a<br />
* abbr<br />
* acronym<br />
* address<br />
* area<br />
* b<br />
* bdo<br />
* big<br />
* blockquote<br />
* br<br />
* button<br />
* caption<br />
* center<br />
* cite<br />
* code<br />
* col<br />
* colgroup<br />
* dd<br />
* del<br />
* dfn<br />
* dir<br />
* div<br />
* dl<br />
* dt<br />
* em<br />
* fieldset<br />
* font<br />
* form<br />
* h1<br />
* h2<br />
* h3<br />
* h4<br />
* h5<br />
* h6<br />
* hr<br />
* i<br />
* img<br />
* input<br />
* ins<br />
* kbd<br />
* label<br />
* legend<br />
* li<br />
* map<br />
* menu<br />
* ol<br />
* optgroup<br />
* option<br />
* p<br />
* pre<br />
* q<br />
* s<br />
* samp<br />
* select<br />
* small<br />
* span<br />
* strike<br />
* strong<br />
* sub<br />
* sup<br />
* table<br />
* tbody<br />
* td<br />
* textarea<br />
* tfoot<br />
* th<br />
* thead<br />
* tr<br />
* tt<br />
* u<br />
* ul<br />
* var<br />
<br />
==== mathml Elements ====<br />
<br />
* maction<br />
* math<br />
* merror<br />
* mfrac<br />
* mi<br />
* mmultiscripts<br />
* mn<br />
* mo<br />
* mover<br />
* mpadded<br />
* mphantom<br />
* mprescripts<br />
* mroot<br />
* mrow<br />
* mspace<br />
* msqrt<br />
* mstyle<br />
* msub<br />
* msubsup<br />
* msup<br />
* mtable<br />
* mtd<br />
* mtext<br />
* mtr<br />
* munder<br />
* munderover<br />
* none<br />
<br />
==== svg Elements ====<br />
<br />
* a<br />
* animate<br />
* animateColor<br />
* animateMotion<br />
* animateTransform<br />
* circle<br />
* defs<br />
* desc<br />
* ellipse<br />
* font-face<br />
* font-face-name<br />
* font-face-src<br />
* g<br />
* glyph<br />
* hkern<br />
* image<br />
* linearGradient<br />
* line<br />
* marker<br />
* metadata<br />
* missing-glyph<br />
* mpath<br />
* path<br />
* polygon<br />
* polyline<br />
* radialGradient<br />
* rect<br />
* set<br />
* stop<br />
* svg<br />
* switch<br />
* text<br />
* title<br />
* tspan<br />
* use<br />
<br />
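As a minimal sketch, assuming a parser that reports each element's namespace URI and local name, the three element lists above could be applied as per-namespace whitelists. Only a few entries from each list are copied here; `ALLOWED_ELEMENTS` and `is_allowed_element` are hypothetical names, and what to do with a rejected element (dropping it or escaping it as text) is left open.

```python
# Namespace URIs for HTML, MathML and SVG.
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical, abbreviated copies of the element lists above;
# a real sanitizer would include every entry.
ALLOWED_ELEMENTS = {
    HTML_NS: frozenset({"a", "abbr", "b", "div", "em", "img", "p", "span", "strong"}),
    MATHML_NS: frozenset({"math", "mi", "mn", "mo", "mrow", "msup"}),
    SVG_NS: frozenset({"a", "circle", "g", "linearGradient", "path", "rect", "svg", "text"}),
}


def is_allowed_element(namespace: str, local_name: str) -> bool:
    """Return True if the element is whitelisted for its namespace.

    Names are compared exactly because SVG names such as linearGradient
    are case-sensitive; HTML names are assumed to arrive lowercased.
    Unknown namespaces allow nothing.
    """
    return local_name in ALLOWED_ELEMENTS.get(namespace, frozenset())
```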
=== Acceptable Attributes ===<br />
<br />
* abbr<br />
* accept<br />
* accept-charset<br />
* accesskey<br />
* action<br />
* align<br />
* alt<br />
* axis<br />
* border<br />
* cellpadding<br />
* cellspacing<br />
* char<br />
* charoff<br />
* charset<br />
* checked<br />
* cite<br />
* class<br />
* clear<br />
* cols<br />
* colspan<br />
* color<br />
* compact<br />
* coords<br />
* datetime<br />
* dir<br />
* disabled<br />
* enctype<br />
* for<br />
* frame<br />
* headers<br />
* height<br />
* href<br />
* hreflang<br />
* hspace<br />
* id<br />
* ismap<br />
* label<br />
* lang<br />
* longdesc<br />
* maxlength<br />
* media<br />
* method<br />
* multiple<br />
* name<br />
* nohref<br />
* noshade<br />
* nowrap<br />
* prompt<br />
* readonly<br />
* rel<br />
* rev<br />
* rows<br />
* rowspan<br />
* rules<br />
* scope<br />
* selected<br />
* shape<br />
* size<br />
* span<br />
* src<br />
* start<br />
* style<br />
* summary<br />
* tabindex<br />
* target<br />
* title<br />
* type<br />
* usemap<br />
* valign<br />
* value<br />
* vspace<br />
* width<br />
* xml:lang<br />
<br />
==== mathml Attributes ====<br />
<br />
* actiontype<br />
* align<br />
* columnalign<br />
* columnlines<br />
* columnspacing<br />
* columnspan<br />
* depth<br />
* display<br />
* displaystyle<br />
* equalcolumns<br />
* equalrows<br />
* fence<br />
* fontstyle<br />
* fontweight<br />
* frame<br />
* height<br />
* linethickness<br />
* lspace<br />
* mathbackground<br />
* mathcolor<br />
* mathvariant<br />
* maxsize<br />
* minsize<br />
* other<br />
* rowalign<br />
* rowlines<br />
* rowspacing<br />
* rowspan<br />
* rspace<br />
* scriptlevel<br />
* selection<br />
* separator<br />
* stretchy<br />
* width<br />
* xlink:href<br />
* xlink:show<br />
* xlink:type<br />
* xmlns<br />
* xmlns:xlink<br />
<br />
==== svg Attributes ====<br />
<br />
* accent-height<br />
* accumulate<br />
* additive<br />
* alphabetic<br />
* arabic-form<br />
* ascent<br />
* attributeName<br />
* attributeType<br />
* baseProfile<br />
* bbox<br />
* begin<br />
* by<br />
* calcMode<br />
* cap-height<br />
* class<br />
* color<br />
* color-rendering<br />
* content<br />
* cx<br />
* cy<br />
* d<br />
* dx<br />
* dy<br />
* descent<br />
* display<br />
* dur<br />
* end<br />
* fill<br />
* fill-rule<br />
* font-family<br />
* font-size<br />
* font-stretch<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* from<br />
* fx<br />
* fy<br />
* g1<br />
* g2<br />
* glyph-name<br />
* gradientUnits<br />
* hanging<br />
* height<br />
* horiz-adv-x<br />
* horiz-origin-x<br />
* id<br />
* ideographic<br />
* k<br />
* keyPoints<br />
* keySplines<br />
* keyTimes<br />
* lang<br />
* marker-end<br />
* marker-mid<br />
* marker-start<br />
* markerHeight<br />
* markerUnits<br />
* markerWidth<br />
* mathematical<br />
* max<br />
* min<br />
* name<br />
* offset<br />
* opacity<br />
* orient<br />
* origin<br />
* overline-position<br />
* overline-thickness<br />
* panose-1<br />
* path<br />
* pathLength<br />
* points<br />
* preserveAspectRatio<br />
* r<br />
* refX<br />
* refY<br />
* repeatCount<br />
* repeatDur<br />
* requiredExtensions<br />
* requiredFeatures<br />
* restart<br />
* rotate<br />
* rx<br />
* ry<br />
* slope<br />
* stemh<br />
* stemv<br />
* stop-color<br />
* stop-opacity<br />
* strikethrough-position<br />
* strikethrough-thickness<br />
* stroke<br />
* stroke-dasharray<br />
* stroke-dashoffset<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-miterlimit<br />
* stroke-opacity<br />
* stroke-width<br />
* systemLanguage<br />
* target<br />
* text-anchor<br />
* to<br />
* transform<br />
* type<br />
* u1<br />
* u2<br />
* underline-position<br />
* underline-thickness<br />
* unicode<br />
* unicode-range<br />
* units-per-em<br />
* values<br />
* version<br />
* viewBox<br />
* visibility<br />
* width<br />
* widths<br />
* x<br />
* x-height<br />
* x1<br />
* x2<br />
* xlink:actuate<br />
* xlink:arcrole<br />
* xlink:href<br />
* xlink:role<br />
* xlink:show<br />
* xlink:title<br />
* xlink:type<br />
* xml:base<br />
* xml:lang<br />
* xml:space<br />
* xmlns<br />
* xmlns:xlink<br />
* y<br />
* y1<br />
* y2<br />
* zoomAndPan<br />
<br />
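A similar sketch for attributes, again with abbreviated, hypothetical copies of the lists. Passing the name check is not sufficient on its own: under the rules further down this page, `style` values still go through the CSS rules and URI-valued attributes through the protocol check.

```python
from typing import Iterable, List, Tuple

# Hypothetical, abbreviated copies of the attribute lists above.
ALLOWED_HTML_ATTRIBUTES = frozenset({
    "alt", "class", "href", "id", "src", "style", "title", "width",
})
ALLOWED_SVG_ATTRIBUTES = frozenset({
    "cx", "cy", "d", "fill", "r", "viewBox", "xlink:href",
})


def filter_attributes(attrs: Iterable[Tuple[str, str]],
                      allowed: frozenset) -> List[Tuple[str, str]]:
    """Return only the (name, value) pairs whose name is whitelisted.

    Order is preserved. Names are compared exactly, because SVG attributes
    such as viewBox are case-sensitive; HTML attribute names are assumed to
    arrive lowercased from the parser. Event handlers like onclick are not
    listed, so they are dropped.
    """
    return [(name, value) for name, value in attrs if name in allowed]
```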
=== CSS Rules ===<br />
<br />
First, <code>url(...)</code> references matching the following regular expression are removed:<br />
<pre>url\s*\(\s*[^\s)]+?\s*\)\s*</pre><br />
<br />
After that, style strings that do not match both of the following regular expressions are deemed obfuscated and ignored entirely:<br />
<pre>^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$</pre><br />
<pre>^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$</pre><br />
<br />
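The two steps above can be sketched in Python with the standard `re` module. The patterns are copied verbatim from this section; the function name `css_gauntlet` is hypothetical, and replacing each `url(...)` with a single space (rather than an empty string) is an assumption that keeps neighbouring tokens apart.

```python
import re

# Patterns copied from the "CSS Rules" section.
URL_RE = re.compile(r"url\s*\(\s*[^\s)]+?\s*\)\s*")
GAUNTLET_CHARS_RE = re.compile(
    r"""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$""")
GAUNTLET_DECLS_RE = re.compile(r"^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$")


def css_gauntlet(style: str) -> str:
    """Strip url(...) references, then reject obfuscated style strings.

    Returns the style with each URL replaced by a space, or '' if the
    result fails either of the two patterns.
    """
    style = URL_RE.sub(" ", style)
    # Only a small set of characters and simple quoted/parenthesised runs.
    if not GAUNTLET_CHARS_RE.match(style):
        return ""
    # Must look like a sequence of "property: value;" declarations.
    if not GAUNTLET_DECLS_RE.match(style):
        return ""
    return style
```

Backslash escapes and nested parentheses such as `expression(alert(1))` fail the first pattern, so the whole string is discarded rather than partially repaired.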
==== style Properties ====<br />
<br />
* azimuth<br />
* background, background-*<br />
* border, border-*<br />
* clear<br />
* color<br />
* cursor<br />
* direction<br />
* display<br />
* elevation<br />
* float<br />
* font<br />
* font-family<br />
* font-size<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* height<br />
* letter-spacing<br />
* line-height<br />
* margin, margin-*<br />
* overflow<br />
* padding, padding-*<br />
* pause<br />
* pause-after<br />
* pause-before<br />
* pitch<br />
* pitch-range<br />
* richness<br />
* speak<br />
* speak-header<br />
* speak-numeral<br />
* speak-punctuation<br />
* speech-rate<br />
* stress<br />
* text-align<br />
* text-decoration<br />
* text-indent<br />
* unicode-bidi<br />
* vertical-align<br />
* voice-family<br />
* volume<br />
* white-space<br />
* width<br />
<br />
==== style Property Values ====<br />
<br />
* auto<br />
* aqua<br />
* black<br />
* block<br />
* blue<br />
* bold<br />
* both<br />
* bottom<br />
* brown<br />
* center<br />
* collapse<br />
* dashed<br />
* dotted<br />
* fuchsia<br />
* gray<br />
* green<br />
* !important<br />
* italic<br />
* left<br />
* lime<br />
* maroon<br />
* medium<br />
* none<br />
* navy<br />
* normal<br />
* nowrap<br />
* olive<br />
* pointer<br />
* purple<br />
* red<br />
* right<br />
* solid<br />
* silver<br />
* teal<br />
* top<br />
* transparent<br />
* underline<br />
* white<br />
* yellow<br />
<br />
In addition, values that match the following regular expression are valid:<br />
<br />
<code>^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$</code><br />
<br />
==== svg style Properties ====<br />
<br />
* fill<br />
* fill-opacity<br />
* fill-rule<br />
* stroke<br />
* stroke-width<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-opacity<br />
<br />
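One plausible way to apply the property and value lists, sketched under stated assumptions: the style string has already passed the steps in CSS Rules; listed properties (including the SVG ones) are kept as-is; and for the `background`, `border`, `margin` and `padding` families, every space-separated word of the value must be either a listed keyword or match the value pattern. The lists are abbreviated and all names are hypothetical.

```python
import re

# Hypothetical, abbreviated copies of the lists above.
ALLOWED_PROPERTIES = frozenset({"color", "display", "font-weight", "text-align", "width"})
SHORTHAND_FAMILIES = ("background", "border", "margin", "padding")
ALLOWED_KEYWORDS = frozenset({
    "auto", "black", "blue", "bold", "dashed", "none", "red", "solid",
    "transparent", "white",
})
ALLOWED_SVG_PROPERTIES = frozenset({
    "fill", "fill-opacity", "fill-rule", "stroke", "stroke-width",
    "stroke-linecap", "stroke-linejoin", "stroke-opacity",
})
# Value pattern copied from the "style Property Values" section.
VALUE_RE = re.compile(
    r"^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$")
DECL_RE = re.compile(r"([-\w]+)\s*:\s*([^:;]*)")


def filter_declarations(style: str) -> str:
    """Keep declarations whose property (and, for shorthands, every word) is allowed."""
    clean = []
    for prop, value in DECL_RE.findall(style):
        prop = prop.lower()
        value = value.strip()
        if not value:
            continue
        if prop in ALLOWED_PROPERTIES or prop in ALLOWED_SVG_PROPERTIES:
            clean.append(f"{prop}: {value};")
        elif prop.split("-")[0] in SHORTHAND_FAMILIES:
            # e.g. border-color is checked word by word like border.
            words = value.split()
            if all(w in ALLOWED_KEYWORDS or VALUE_RE.match(w) for w in words):
                clean.append(f"{prop}: {value};")
    return " ".join(clean)
```

Unlisted properties such as `position` are silently dropped, while the remaining declarations are re-serialised in a uniform `property: value;` form.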
=== URIs ===<br />
==== Attributes whose value is a URI ====<br />
<br />
* href<br />
* src<br />
* cite<br />
* action<br />
* longdesc<br />
* xlink:href<br />
* xml:base<br />
<br />
==== URI protocols ====<br />
<br />
* afs<br />
* aim<br />
* callto<br />
* data (see [[#Safe data URL content types]])<br />
* ed2k<br />
* feed<br />
* ftp<br />
* gopher<br />
* http<br />
* https<br />
* irc<br />
* mailto<br />
* news<br />
* nntp<br />
* rsync<br />
* rtsp<br />
* sftp<br />
* ssh<br />
* tag<br />
* tel<br />
* telnet<br />
* urn<br />
* webcal<br />
* wtai<br />
* xmpp<br />
<br />
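A sketch of the URI checks, assuming character references in attribute values have already been decoded by the HTML parser. It lowercases the value and removes whitespace and control characters before looking for a scheme, so that a scheme split by a tab or newline is still recognised; it accepts relative references; and for `data` URIs it compares the declared media type against the types listed under Safe data URL content types. The names are hypothetical, and a `data:` URI with no media type is rejected here, which is stricter than necessary.

```python
import re

# URI-valued attributes and allowed schemes, copied from the lists above.
URI_ATTRIBUTES = frozenset({
    "href", "src", "cite", "action", "longdesc", "xlink:href", "xml:base",
})
ALLOWED_PROTOCOLS = frozenset({
    "afs", "aim", "callto", "data", "ed2k", "feed", "ftp", "gopher", "http",
    "https", "irc", "mailto", "news", "nntp", "rsync", "rtsp", "sftp", "ssh",
    "tag", "tel", "telnet", "urn", "webcal", "wtai", "xmpp",
})
# image/jpeg is the registered media type for JPEG images.
SAFE_DATA_TYPES = frozenset({"text/plain", "image/gif", "image/jpeg", "image/png"})

SCHEME_RE = re.compile(r"^([a-z0-9][-+.a-z0-9]*):")
# Whitespace and control characters that could otherwise hide a scheme.
IGNORABLE_RE = re.compile(r"[\x00-\x20\x7f-\xa0\s]+")


def is_safe_uri(value: str) -> bool:
    """Return True if a URI value is relative or uses an allowed scheme."""
    compact = IGNORABLE_RE.sub("", value).lower()
    match = SCHEME_RE.match(compact)
    if match is None:
        return True  # relative reference such as "/path" or "page.html"
    scheme = match.group(1)
    if scheme not in ALLOWED_PROTOCOLS:
        return False
    if scheme == "data":
        # data:[<mediatype>][;base64],<data>
        media_type = compact[len("data:"):].split(",", 1)[0].split(";", 1)[0]
        return media_type in SAFE_DATA_TYPES
    return True


def is_safe_attribute(name: str, value: str) -> bool:
    """Non-URI attributes pass; URI attributes must use a safe scheme."""
    return name not in URI_ATTRIBUTES or is_safe_uri(value)
```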
==== Safe data URL content types ====<br />
Note: This section is being [http://wiki.whatwg.org/wiki/Talk:Sanitization_rules discussed].<br />
* text/plain<br />
* image/gif<br />
* image/jpeg<br />
* image/png</div>Rubyshttps://wiki.whatwg.org/index.php?title=Sanitization_rules&diff=2427Sanitization rules2007-08-13T13:22:46Z<p>Rubys: link to script that converts the output to programming syntax</p>
<hr />
<div>This page was initially seeded with the sanitization lists and rules implemented by the [http://code.google.com/p/html5lib/ html5lib] sanitizer, which was based on [http://golem.ph.utexas.edu/instiki/show/HomePage Jacques Distler's branch of Instiki], itself derived from the sanitization logic in the [http://www.feedparser.org/ Universal Feed Parser].<br />
<br />
It is hoped that others will add to, update, and extend these lists based on experience with their own products, and that some will in turn update their products to match. One such product is [http://htmlpurifier.org/ HTMLPurifier] ([http://intertwingly.net/stories/2007/08/11/diffs diffs]).<br />
<br />
As a suggestion, not a requirement: people who update their products to reflect information from this list are encouraged to add a comment linking to this page, in the hope that it will encourage subsequent maintainers to keep this page up to date.<br />
<br />
As a convenience, [http://intertwingly.net/stories/2007/08/13/sanitize_lists.cgi this script] ([http://intertwingly.net/stories/2007/08/13/sanitize_lists.rb source]) converts these lists into a syntax shared by a number of common programming languages.<br />
<br />
=== Acceptable Elements ===<br />
<br />
* a<br />
* abbr<br />
* acronym<br />
* address<br />
* area<br />
* b<br />
* bdo<br />
* big<br />
* blockquote<br />
* br<br />
* button<br />
* caption<br />
* center<br />
* cite<br />
* code<br />
* col<br />
* colgroup<br />
* dd<br />
* del<br />
* dfn<br />
* dir<br />
* div<br />
* dl<br />
* dt<br />
* em<br />
* fieldset<br />
* font<br />
* form<br />
* h1<br />
* h2<br />
* h3<br />
* h4<br />
* h5<br />
* h6<br />
* hr<br />
* i<br />
* img<br />
* input<br />
* ins<br />
* kbd<br />
* label<br />
* legend<br />
* li<br />
* map<br />
* menu<br />
* ol<br />
* optgroup<br />
* option<br />
* p<br />
* pre<br />
* q<br />
* s<br />
* samp<br />
* select<br />
* small<br />
* span<br />
* strike<br />
* strong<br />
* sub<br />
* sup<br />
* table<br />
* tbody<br />
* td<br />
* textarea<br />
* tfoot<br />
* th<br />
* thead<br />
* tr<br />
* tt<br />
* u<br />
* ul<br />
* var<br />
<br />
==== mathml Elements ====<br />
<br />
* maction<br />
* math<br />
* merror<br />
* mfrac<br />
* mi<br />
* mmultiscripts<br />
* mn<br />
* mo<br />
* mover<br />
* mpadded<br />
* mphantom<br />
* mprescripts<br />
* mroot<br />
* mrow<br />
* mspace<br />
* msqrt<br />
* mstyle<br />
* msub<br />
* msubsup<br />
* msup<br />
* mtable<br />
* mtd<br />
* mtext<br />
* mtr<br />
* munder<br />
* munderover<br />
* none<br />
<br />
==== svg Elements ====<br />
<br />
* a<br />
* animate<br />
* animateColor<br />
* animateMotion<br />
* animateTransform<br />
* circle<br />
* defs<br />
* desc<br />
* ellipse<br />
* font-face<br />
* font-face-name<br />
* font-face-src<br />
* g<br />
* glyph<br />
* hkern<br />
* image<br />
* linearGradient<br />
* line<br />
* marker<br />
* metadata<br />
* missing-glyph<br />
* mpath<br />
* path<br />
* polygon<br />
* polyline<br />
* radialGradient<br />
* rect<br />
* set<br />
* stop<br />
* svg<br />
* switch<br />
* text<br />
* title<br />
* tspan<br />
* use<br />
<br />
=== Acceptable Attributes ===<br />
<br />
* abbr<br />
* accept<br />
* accept-charset<br />
* accesskey<br />
* action<br />
* align<br />
* alt<br />
* axis<br />
* border<br />
* cellpadding<br />
* cellspacing<br />
* char<br />
* charoff<br />
* charset<br />
* checked<br />
* cite<br />
* class<br />
* clear<br />
* cols<br />
* colspan<br />
* color<br />
* compact<br />
* coords<br />
* datetime<br />
* dir<br />
* disabled<br />
* enctype<br />
* for<br />
* frame<br />
* headers<br />
* height<br />
* href<br />
* hreflang<br />
* hspace<br />
* id<br />
* ismap<br />
* label<br />
* lang<br />
* longdesc<br />
* maxlength<br />
* media<br />
* method<br />
* multiple<br />
* name<br />
* nohref<br />
* noshade<br />
* nowrap<br />
* prompt<br />
* readonly<br />
* rel<br />
* rev<br />
* rows<br />
* rowspan<br />
* rules<br />
* scope<br />
* selected<br />
* shape<br />
* size<br />
* span<br />
* src<br />
* start<br />
* style<br />
* summary<br />
* tabindex<br />
* target<br />
* title<br />
* type<br />
* usemap<br />
* valign<br />
* value<br />
* vspace<br />
* width<br />
* xml:lang<br />
<br />
==== mathml Attributes ====<br />
<br />
* actiontype<br />
* align<br />
* columnalign<br />
* columnlines<br />
* columnspacing<br />
* columnspan<br />
* depth<br />
* display<br />
* displaystyle<br />
* equalcolumns<br />
* equalrows<br />
* fence<br />
* fontstyle<br />
* fontweight<br />
* frame<br />
* height<br />
* linethickness<br />
* lspace<br />
* mathbackground<br />
* mathcolor<br />
* mathvariant<br />
* maxsize<br />
* minsize<br />
* other<br />
* rowalign<br />
* rowlines<br />
* rowspacing<br />
* rowspan<br />
* rspace<br />
* scriptlevel<br />
* selection<br />
* separator<br />
* stretchy<br />
* width<br />
* xlink:href<br />
* xlink:show<br />
* xlink:type<br />
* xmlns<br />
* xmlns:xlink<br />
<br />
==== svg Attributes ====<br />
<br />
* accent-height<br />
* accumulate<br />
* additive<br />
* alphabetic<br />
* arabic-form<br />
* ascent<br />
* attributeName<br />
* attributeType<br />
* baseProfile<br />
* bbox<br />
* begin<br />
* by<br />
* calcMode<br />
* cap-height<br />
* class<br />
* color<br />
* color-rendering<br />
* content<br />
* cx<br />
* cy<br />
* d<br />
* dx<br />
* dy<br />
* descent<br />
* display<br />
* dur<br />
* end<br />
* fill<br />
* fill-rule<br />
* font-family<br />
* font-size<br />
* font-stretch<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* from<br />
* fx<br />
* fy<br />
* g1<br />
* g2<br />
* glyph-name<br />
* gradientUnits<br />
* hanging<br />
* height<br />
* horiz-adv-x<br />
* horiz-origin-x<br />
* id<br />
* ideographic<br />
* k<br />
* keyPoints<br />
* keySplines<br />
* keyTimes<br />
* lang<br />
* marker-end<br />
* marker-mid<br />
* marker-start<br />
* markerHeight<br />
* markerUnits<br />
* markerWidth<br />
* mathematical<br />
* max<br />
* min<br />
* name<br />
* offset<br />
* opacity<br />
* orient<br />
* origin<br />
* overline-position<br />
* overline-thickness<br />
* panose-1<br />
* path<br />
* pathLength<br />
* points<br />
* preserveAspectRatio<br />
* r<br />
* refX<br />
* refY<br />
* repeatCount<br />
* repeatDur<br />
* requiredExtensions<br />
* requiredFeatures<br />
* restart<br />
* rotate<br />
* rx<br />
* ry<br />
* slope<br />
* stemh<br />
* stemv<br />
* stop-color<br />
* stop-opacity<br />
* strikethrough-position<br />
* strikethrough-thickness<br />
* stroke<br />
* stroke-dasharray<br />
* stroke-dashoffset<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-miterlimit<br />
* stroke-opacity<br />
* stroke-width<br />
* systemLanguage<br />
* target<br />
* text-anchor<br />
* to<br />
* transform<br />
* type<br />
* u1<br />
* u2<br />
* underline-position<br />
* underline-thickness<br />
* unicode<br />
* unicode-range<br />
* units-per-em<br />
* values<br />
* version<br />
* viewBox<br />
* visibility<br />
* width<br />
* widths<br />
* x<br />
* x-height<br />
* x1<br />
* x2<br />
* xlink:actuate<br />
* xlink:arcrole<br />
* xlink:href<br />
* xlink:role<br />
* xlink:show<br />
* xlink:title<br />
* xlink:type<br />
* xml:base<br />
* xml:lang<br />
* xml:space<br />
* xmlns<br />
* xmlns:xlink<br />
* y<br />
* y1<br />
* y2<br />
* zoomAndPan<br />
<br />
=== CSS Rules ===<br />
<br />
First, <code>url(...)</code> references matching the following regular expression are removed:<br />
<pre>url\s*\(\s*[^\s)]+?\s*\)\s*</pre><br />
<br />
After that, style strings that do not match both of the following regular expressions are deemed obfuscated and ignored entirely:<br />
<pre>^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$</pre><br />
<pre>^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$</pre><br />
<br />
==== style Properties ====<br />
<br />
* azimuth<br />
* background, background-*<br />
* border, border-*<br />
* clear<br />
* color<br />
* cursor<br />
* direction<br />
* display<br />
* elevation<br />
* float<br />
* font<br />
* font-family<br />
* font-size<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* height<br />
* letter-spacing<br />
* line-height<br />
* margin, margin-*<br />
* overflow<br />
* padding, padding-*<br />
* pause<br />
* pause-after<br />
* pause-before<br />
* pitch<br />
* pitch-range<br />
* richness<br />
* speak<br />
* speak-header<br />
* speak-numeral<br />
* speak-punctuation<br />
* speech-rate<br />
* stress<br />
* text-align<br />
* text-decoration<br />
* text-indent<br />
* unicode-bidi<br />
* vertical-align<br />
* voice-family<br />
* volume<br />
* white-space<br />
* width<br />
<br />
==== style Property Values ====<br />
<br />
* auto<br />
* aqua<br />
* black<br />
* block<br />
* blue<br />
* bold<br />
* both<br />
* bottom<br />
* brown<br />
* center<br />
* collapse<br />
* dashed<br />
* dotted<br />
* fuchsia<br />
* gray<br />
* green<br />
* !important<br />
* italic<br />
* left<br />
* lime<br />
* maroon<br />
* medium<br />
* none<br />
* navy<br />
* normal<br />
* nowrap<br />
* olive<br />
* pointer<br />
* purple<br />
* red<br />
* right<br />
* solid<br />
* silver<br />
* teal<br />
* top<br />
* transparent<br />
* underline<br />
* white<br />
* yellow<br />
<br />
In addition, values that match the following regular expression are valid:<br />
<br />
<code>^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$</code><br />
<br />
==== svg style Properties ====<br />
<br />
* fill<br />
* fill-opacity<br />
* fill-rule<br />
* stroke<br />
* stroke-width<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-opacity<br />
<br />
=== URIs ===<br />
==== Attributes whose value is a URI ====<br />
<br />
* href<br />
* src<br />
* cite<br />
* action<br />
* longdesc<br />
* xlink:href<br />
* xml:base<br />
<br />
==== URI protocols ====<br />
<br />
* afs<br />
* aim<br />
* callto<br />
* data (see [[#Safe data URL content types]])<br />
* ed2k<br />
* feed<br />
* ftp<br />
* gopher<br />
* http<br />
* https<br />
* irc<br />
* mailto<br />
* news<br />
* nntp<br />
* rsync<br />
* rtsp<br />
* sftp<br />
* ssh<br />
* tag<br />
* tel<br />
* telnet<br />
* urn<br />
* webcal<br />
* wtai<br />
* xmpp<br />
<br />
==== Safe data URL content types ====<br />
Note: This section is being [http://wiki.whatwg.org/wiki/Talk:Sanitization_rules discussed].<br />
* text/plain<br />
* image/gif<br />
* image/jpeg<br />
* image/png</div>Rubyshttps://wiki.whatwg.org/index.php?title=Sanitization_rules&diff=2411Sanitization rules2007-08-11T09:23:33Z<p>Rubys: Add /(background|border|margin|padding)(-.*)?/; links</p>
<hr />
<div>This page was initially seeded with the sanitization lists and rules implemented by the [http://code.google.com/p/html5lib/ html5lib] sanitizer, which was based on [http://golem.ph.utexas.edu/instiki/show/HomePage Jacques Distler's branch of Instiki], itself derived from the sanitization logic in the [http://www.feedparser.org/ Universal Feed Parser].<br />
<br />
It is hoped that others will add to, update, and extend these lists based on experience with their own products, and that some will in turn update their products to match. One such product is [http://htmlpurifier.org/ HTMLPurifier] ([http://intertwingly.net/stories/2007/08/11/diffs diffs]).<br />
<br />
As a suggestion, not a requirement: people who update their products to reflect information from this list are encouraged to add a comment linking to this page, in the hope that it will encourage subsequent maintainers to keep this page up to date.<br />
<br />
=== Acceptable Elements ===<br />
<br />
* a<br />
* abbr<br />
* acronym<br />
* address<br />
* area<br />
* b<br />
* bdo<br />
* big<br />
* blockquote<br />
* br<br />
* button<br />
* caption<br />
* center<br />
* cite<br />
* code<br />
* col<br />
* colgroup<br />
* dd<br />
* del<br />
* dfn<br />
* dir<br />
* div<br />
* dl<br />
* dt<br />
* em<br />
* fieldset<br />
* font<br />
* form<br />
* h1<br />
* h2<br />
* h3<br />
* h4<br />
* h5<br />
* h6<br />
* hr<br />
* i<br />
* img<br />
* input<br />
* ins<br />
* kbd<br />
* label<br />
* legend<br />
* li<br />
* map<br />
* menu<br />
* ol<br />
* optgroup<br />
* option<br />
* p<br />
* pre<br />
* q<br />
* s<br />
* samp<br />
* select<br />
* small<br />
* span<br />
* strike<br />
* strong<br />
* sub<br />
* sup<br />
* table<br />
* tbody<br />
* td<br />
* textarea<br />
* tfoot<br />
* th<br />
* thead<br />
* tr<br />
* tt<br />
* u<br />
* ul<br />
* var<br />
<br />
==== mathml Elements ====<br />
<br />
* maction<br />
* math<br />
* merror<br />
* mfrac<br />
* mi<br />
* mmultiscripts<br />
* mn<br />
* mo<br />
* mover<br />
* mpadded<br />
* mphantom<br />
* mprescripts<br />
* mroot<br />
* mrow<br />
* mspace<br />
* msqrt<br />
* mstyle<br />
* msub<br />
* msubsup<br />
* msup<br />
* mtable<br />
* mtd<br />
* mtext<br />
* mtr<br />
* munder<br />
* munderover<br />
* none<br />
<br />
==== svg Elements ====<br />
<br />
* a<br />
* animate<br />
* animateColor<br />
* animateMotion<br />
* animateTransform<br />
* circle<br />
* defs<br />
* desc<br />
* ellipse<br />
* font-face<br />
* font-face-name<br />
* font-face-src<br />
* g<br />
* glyph<br />
* hkern<br />
* image<br />
* linearGradient<br />
* line<br />
* marker<br />
* metadata<br />
* missing-glyph<br />
* mpath<br />
* path<br />
* polygon<br />
* polyline<br />
* radialGradient<br />
* rect<br />
* set<br />
* stop<br />
* svg<br />
* switch<br />
* text<br />
* title<br />
* tspan<br />
* use<br />
<br />
=== Acceptable Attributes ===<br />
<br />
* abbr<br />
* accept<br />
* accept-charset<br />
* accesskey<br />
* action<br />
* align<br />
* alt<br />
* axis<br />
* border<br />
* cellpadding<br />
* cellspacing<br />
* char<br />
* charoff<br />
* charset<br />
* checked<br />
* cite<br />
* class<br />
* clear<br />
* cols<br />
* colspan<br />
* color<br />
* compact<br />
* coords<br />
* datetime<br />
* dir<br />
* disabled<br />
* enctype<br />
* for<br />
* frame<br />
* headers<br />
* height<br />
* href<br />
* hreflang<br />
* hspace<br />
* id<br />
* ismap<br />
* label<br />
* lang<br />
* longdesc<br />
* maxlength<br />
* media<br />
* method<br />
* multiple<br />
* name<br />
* nohref<br />
* noshade<br />
* nowrap<br />
* prompt<br />
* readonly<br />
* rel<br />
* rev<br />
* rows<br />
* rowspan<br />
* rules<br />
* scope<br />
* selected<br />
* shape<br />
* size<br />
* span<br />
* src<br />
* start<br />
* style<br />
* summary<br />
* tabindex<br />
* target<br />
* title<br />
* type<br />
* usemap<br />
* valign<br />
* value<br />
* vspace<br />
* width<br />
* xml:lang<br />
<br />
==== mathml Attributes ====<br />
<br />
* actiontype<br />
* align<br />
* columnalign<br />
* columnlines<br />
* columnspacing<br />
* columnspan<br />
* depth<br />
* display<br />
* displaystyle<br />
* equalcolumns<br />
* equalrows<br />
* fence<br />
* fontstyle<br />
* fontweight<br />
* frame<br />
* height<br />
* linethickness<br />
* lspace<br />
* mathbackground<br />
* mathcolor<br />
* mathvariant<br />
* maxsize<br />
* minsize<br />
* other<br />
* rowalign<br />
* rowlines<br />
* rowspacing<br />
* rowspan<br />
* rspace<br />
* scriptlevel<br />
* selection<br />
* separator<br />
* stretchy<br />
* width<br />
* xlink:href<br />
* xlink:show<br />
* xlink:type<br />
* xmlns<br />
* xmlns:xlink<br />
<br />
==== svg Attributes ====<br />
<br />
* accent-height<br />
* accumulate<br />
* additive<br />
* alphabetic<br />
* arabic-form<br />
* ascent<br />
* attributeName<br />
* attributeType<br />
* baseProfile<br />
* bbox<br />
* begin<br />
* by<br />
* calcMode<br />
* cap-height<br />
* class<br />
* color<br />
* color-rendering<br />
* content<br />
* cx<br />
* cy<br />
* d<br />
* dx<br />
* dy<br />
* descent<br />
* display<br />
* dur<br />
* end<br />
* fill<br />
* fill-rule<br />
* font-family<br />
* font-size<br />
* font-stretch<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* from<br />
* fx<br />
* fy<br />
* g1<br />
* g2<br />
* glyph-name<br />
* gradientUnits<br />
* hanging<br />
* height<br />
* horiz-adv-x<br />
* horiz-origin-x<br />
* id<br />
* ideographic<br />
* k<br />
* keyPoints<br />
* keySplines<br />
* keyTimes<br />
* lang<br />
* marker-end<br />
* marker-mid<br />
* marker-start<br />
* markerHeight<br />
* markerUnits<br />
* markerWidth<br />
* mathematical<br />
* max<br />
* min<br />
* name<br />
* offset<br />
* opacity<br />
* orient<br />
* origin<br />
* overline-position<br />
* overline-thickness<br />
* panose-1<br />
* path<br />
* pathLength<br />
* points<br />
* preserveAspectRatio<br />
* r<br />
* refX<br />
* refY<br />
* repeatCount<br />
* repeatDur<br />
* requiredExtensions<br />
* requiredFeatures<br />
* restart<br />
* rotate<br />
* rx<br />
* ry<br />
* slope<br />
* stemh<br />
* stemv<br />
* stop-color<br />
* stop-opacity<br />
* strikethrough-position<br />
* strikethrough-thickness<br />
* stroke<br />
* stroke-dasharray<br />
* stroke-dashoffset<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-miterlimit<br />
* stroke-opacity<br />
* stroke-width<br />
* systemLanguage<br />
* target<br />
* text-anchor<br />
* to<br />
* transform<br />
* type<br />
* u1<br />
* u2<br />
* underline-position<br />
* underline-thickness<br />
* unicode<br />
* unicode-range<br />
* units-per-em<br />
* values<br />
* version<br />
* viewBox<br />
* visibility<br />
* width<br />
* widths<br />
* x<br />
* x-height<br />
* x1<br />
* x2<br />
* xlink:actuate<br />
* xlink:arcrole<br />
* xlink:href<br />
* xlink:role<br />
* xlink:show<br />
* xlink:title<br />
* xlink:type<br />
* xml:base<br />
* xml:lang<br />
* xml:space<br />
* xmlns<br />
* xmlns:xlink<br />
* y<br />
* y1<br />
* y2<br />
* zoomAndPan<br />
<br />
=== CSS Rules ===<br />
<br />
First, <code>url(...)</code> references matching the following regular expression are removed:<br />
<pre>url\s*\(\s*[^\s)]+?\s*\)\s*</pre><br />
<br />
After that, style strings that do not match both of the following regular expressions are deemed obfuscated and ignored entirely:<br />
<pre>^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$</pre><br />
<pre>^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$</pre><br />
<br />
==== style Properties ====<br />
<br />
* azimuth<br />
* background, background-*<br />
* border, border-*<br />
* clear<br />
* color<br />
* cursor<br />
* direction<br />
* display<br />
* elevation<br />
* float<br />
* font<br />
* font-family<br />
* font-size<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* height<br />
* letter-spacing<br />
* line-height<br />
* margin, margin-*<br />
* overflow<br />
* padding, padding-*<br />
* pause<br />
* pause-after<br />
* pause-before<br />
* pitch<br />
* pitch-range<br />
* richness<br />
* speak<br />
* speak-header<br />
* speak-numeral<br />
* speak-punctuation<br />
* speech-rate<br />
* stress<br />
* text-align<br />
* text-decoration<br />
* text-indent<br />
* unicode-bidi<br />
* vertical-align<br />
* voice-family<br />
* volume<br />
* white-space<br />
* width<br />
<br />
==== style Property Values ====<br />
<br />
* auto<br />
* aqua<br />
* black<br />
* block<br />
* blue<br />
* bold<br />
* both<br />
* bottom<br />
* brown<br />
* center<br />
* collapse<br />
* dashed<br />
* dotted<br />
* fuchsia<br />
* gray<br />
* green<br />
* !important<br />
* italic<br />
* left<br />
* lime<br />
* maroon<br />
* medium<br />
* none<br />
* navy<br />
* normal<br />
* nowrap<br />
* olive<br />
* pointer<br />
* purple<br />
* red<br />
* right<br />
* solid<br />
* silver<br />
* teal<br />
* top<br />
* transparent<br />
* underline<br />
* white<br />
* yellow<br />
<br />
In addition, values that match the following regular expression are valid:<br />
<br />
<code>^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$</code><br />
<br />
==== svg style Properties ====<br />
<br />
* fill<br />
* fill-opacity<br />
* fill-rule<br />
* stroke<br />
* stroke-width<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-opacity<br />
<br />
=== URIs ===<br />
==== Attributes whose value is a URI ====<br />
<br />
* href<br />
* src<br />
* cite<br />
* action<br />
* longdesc<br />
* xlink:href<br />
* xml:base<br />
<br />
==== URI protocols ====<br />
<br />
* afs<br />
* aim<br />
* callto<br />
* data (see [[#Safe data URL content types]])<br />
* ed2k<br />
* feed<br />
* ftp<br />
* gopher<br />
* http<br />
* https<br />
* irc<br />
* mailto<br />
* news<br />
* nntp<br />
* rsync<br />
* rtsp<br />
* sftp<br />
* ssh<br />
* tag<br />
* tel<br />
* telnet<br />
* urn<br />
* webcal<br />
* wtai<br />
* xmpp<br />
<br />
==== Safe data URL content types ====<br />
Note: This section is being [http://wiki.whatwg.org/wiki/Talk:Sanitization_rules discussed].<br />
* text/plain<br />
* image/gif<br />
* image/jpeg<br />
* image/png</div>Rubyshttps://wiki.whatwg.org/index.php?title=Talk:Sanitization_rules&diff=2404Talk:Sanitization rules2007-08-10T10:30:22Z<p>Rubys: </p>
<hr />
<div>Is the data URI scheme safe?<br />
<br />
* Rob Sayre says no and refers to a Wikipedia article; however, I cannot see anything in the [http://en.wikipedia.org/wiki/Data:_URI_scheme article] that indicates the scheme is not safe.<br />
** Looking at that Wikipedia page, <code>data</code> could only be added if it were followed by an asterisk, kinda like the 756* that I see popping up all over the place these days. In particular, I don't see the use case which would justify the investment in sanitizing <code>text/html</code> encoded as a data URI. Not that it would be difficult, just hard to justify. Perhaps a section could be added which lists safe content types when included in data URIs. -- [[User:Rubys|Rubys]] 03:48, 9 August 2007 (UTC)<br />
* Data URIs should be sanitizable on a per-MIME-type basis. Until a vulnerability is found for the text/plain MIME type, data URIs of that type should be allowed, but other MIME types should not be allowed by default. Other, safer types could then be allowed via a white list. -- [[User:Enricopulatzo|Enricopulatzo]] 16:49, 9 August 2007 (UTC)<br />
** The word "default" puzzles me here. The common use case here is small GIFs, JPEGs, and PNGs to be directly embedded in places like CSS and <img> tags. If the associated MIME-types were to be white listed, under what condition would they '''not''' be allowed through? -- [[User:Rubys|Rubys]] 10:30, 10 August 2007 (UTC)</div>Rubyshttps://wiki.whatwg.org/index.php?title=Sanitization_rules&diff=2403Sanitization rules2007-08-10T02:18:15Z<p>Rubys: /* style Property Values */</p>
<hr />
<div>This page was initially seeded with the sanitization lists and rules implemented by the [http://code.google.com/p/html5lib/ html5lib] sanitizer, which was based on [http://golem.ph.utexas.edu/instiki/show/HomePage Jacques Distler's branch of Instiki], itself derived from the sanitization logic in the [http://www.feedparser.org/ Universal Feed Parser].<br />
<br />
It is hoped that others will add to, update, and extend these lists based on experience with their own products, and that some will in turn update their products to match.<br />
<br />
As a suggestion, not a requirement: people who update their products to reflect information from this list are encouraged to add a comment linking to this page, in the hope that it will encourage subsequent maintainers to keep this page up to date.<br />
<br />
=== Acceptable Elements ===<br />
<br />
* a<br />
* abbr<br />
* acronym<br />
* address<br />
* area<br />
* b<br />
* big<br />
* blockquote<br />
* br<br />
* button<br />
* caption<br />
* center<br />
* cite<br />
* code<br />
* col<br />
* colgroup<br />
* dd<br />
* del<br />
* dfn<br />
* dir<br />
* div<br />
* dl<br />
* dt<br />
* em<br />
* fieldset<br />
* font<br />
* form<br />
* h1<br />
* h2<br />
* h3<br />
* h4<br />
* h5<br />
* h6<br />
* hr<br />
* i<br />
* img<br />
* input<br />
* ins<br />
* kbd<br />
* label<br />
* legend<br />
* li<br />
* map<br />
* menu<br />
* ol<br />
* optgroup<br />
* option<br />
* p<br />
* pre<br />
* q<br />
* s<br />
* samp<br />
* select<br />
* small<br />
* span<br />
* strike<br />
* strong<br />
* sub<br />
* sup<br />
* table<br />
* tbody<br />
* td<br />
* textarea<br />
* tfoot<br />
* th<br />
* thead<br />
* tr<br />
* tt<br />
* u<br />
* ul<br />
* var<br />
<br />
==== mathml Elements ====<br />
<br />
* maction<br />
* math<br />
* merror<br />
* mfrac<br />
* mi<br />
* mmultiscripts<br />
* mn<br />
* mo<br />
* mover<br />
* mpadded<br />
* mphantom<br />
* mprescripts<br />
* mroot<br />
* mrow<br />
* mspace<br />
* msqrt<br />
* mstyle<br />
* msub<br />
* msubsup<br />
* msup<br />
* mtable<br />
* mtd<br />
* mtext<br />
* mtr<br />
* munder<br />
* munderover<br />
* none<br />
<br />
==== svg Elements ====<br />
<br />
* a<br />
* animate<br />
* animateColor<br />
* animateMotion<br />
* animateTransform<br />
* circle<br />
* defs<br />
* desc<br />
* ellipse<br />
* font-face<br />
* font-face-name<br />
* font-face-src<br />
* g<br />
* glyph<br />
* hkern<br />
* image<br />
* linearGradient<br />
* line<br />
* marker<br />
* metadata<br />
* missing-glyph<br />
* mpath<br />
* path<br />
* polygon<br />
* polyline<br />
* radialGradient<br />
* rect<br />
* set<br />
* stop<br />
* svg<br />
* switch<br />
* text<br />
* title<br />
* tspan<br />
* use<br />
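<br />
The element lists above amount to a whitelist lookup. Below is a minimal Python sketch. It assumes each element name arrives tagged with the namespace it was parsed in. The sets are abridged copies of the lists above, and the function name <code>is_acceptable_element</code> is hypothetical, not part of html5lib. This page does not say what a sanitizer should do with a rejected element (drop it, or escape it as text), so the sketch only answers yes or no.<br />

```python
from typing import Dict, FrozenSet

# Abridged copies of the whitelists on this page (hypothetical sketch).
ACCEPTABLE_HTML_ELEMENTS = frozenset({
    "a", "abbr", "b", "blockquote", "br", "code", "div", "em", "i", "img",
    "li", "ol", "p", "pre", "span", "strong", "table", "td", "th", "tr", "ul",
})
MATHML_ELEMENTS = frozenset({"math", "mfrac", "mi", "mn", "mo", "mrow", "msqrt", "msup"})
SVG_ELEMENTS = frozenset({"svg", "g", "path", "rect", "circle", "linearGradient", "stop", "text"})

ALLOWED_BY_NAMESPACE: Dict[str, FrozenSet[str]] = {
    "html": ACCEPTABLE_HTML_ELEMENTS,
    "mathml": MATHML_ELEMENTS,
    "svg": SVG_ELEMENTS,
}


def is_acceptable_element(name: str, namespace: str = "html") -> bool:
    """Return True if `name` is whitelisted for `namespace`.

    HTML element names are case-insensitive, so they are lower-cased first;
    SVG names such as linearGradient are case-sensitive and compared as-is.
    """
    allowed = ALLOWED_BY_NAMESPACE.get(namespace)
    if allowed is None:
        return False  # unknown namespace: reject rather than guess
    if namespace == "html":
        name = name.lower()
    return name in allowed
```

Note that the lookup is strictly case-sensitive for SVG. As a result, <code>lineargradient</code> is rejected even though <code>linearGradient</code> is accepted.<br />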
<br />
=== Acceptable Attributes ===<br />
<br />
* abbr<br />
* accept<br />
* accept-charset<br />
* accesskey<br />
* action<br />
* align<br />
* alt<br />
* axis<br />
* border<br />
* cellpadding<br />
* cellspacing<br />
* char<br />
* charoff<br />
* charset<br />
* checked<br />
* cite<br />
* class<br />
* clear<br />
* cols<br />
* colspan<br />
* color<br />
* compact<br />
* coords<br />
* datetime<br />
* dir<br />
* disabled<br />
* enctype<br />
* for<br />
* frame<br />
* headers<br />
* height<br />
* href<br />
* hreflang<br />
* hspace<br />
* id<br />
* ismap<br />
* label<br />
* lang<br />
* longdesc<br />
* maxlength<br />
* media<br />
* method<br />
* multiple<br />
* name<br />
* nohref<br />
* noshade<br />
* nowrap<br />
* prompt<br />
* readonly<br />
* rel<br />
* rev<br />
* rows<br />
* rowspan<br />
* rules<br />
* scope<br />
* selected<br />
* shape<br />
* size<br />
* span<br />
* src<br />
* start<br />
* style<br />
* summary<br />
* tabindex<br />
* target<br />
* title<br />
* type<br />
* usemap<br />
* valign<br />
* value<br />
* vspace<br />
* width<br />
* xml:lang<br />
<br />
==== mathml Attributes ====<br />
<br />
* actiontype<br />
* align<br />
* columnalign<br />
* columnlines<br />
* columnspacing<br />
* columnspan<br />
* depth<br />
* display<br />
* displaystyle<br />
* equalcolumns<br />
* equalrows<br />
* fence<br />
* fontstyle<br />
* fontweight<br />
* frame<br />
* height<br />
* linethickness<br />
* lspace<br />
* mathbackground<br />
* mathcolor<br />
* mathvariant<br />
* maxsize<br />
* minsize<br />
* other<br />
* rowalign<br />
* rowlines<br />
* rowspacing<br />
* rowspan<br />
* rspace<br />
* scriptlevel<br />
* selection<br />
* separator<br />
* stretchy<br />
* width<br />
* xlink:href<br />
* xlink:show<br />
* xlink:type<br />
* xmlns<br />
* xmlns:xlink<br />
<br />
==== svg Attributes ====<br />
<br />
* accent-height<br />
* accumulate<br />
* additive<br />
* alphabetic<br />
* arabic-form<br />
* ascent<br />
* attributeName<br />
* attributeType<br />
* baseProfile<br />
* bbox<br />
* begin<br />
* by<br />
* calcMode<br />
* cap-height<br />
* class<br />
* color<br />
* color-rendering<br />
* content<br />
* cx<br />
* cy<br />
* d<br />
* dx<br />
* dy<br />
* descent<br />
* display<br />
* dur<br />
* end<br />
* fill<br />
* fill-rule<br />
* font-family<br />
* font-size<br />
* font-stretch<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* from<br />
* fx<br />
* fy<br />
* g1<br />
* g2<br />
* glyph-name<br />
* gradientUnits<br />
* hanging<br />
* height<br />
* horiz-adv-x<br />
* horiz-origin-x<br />
* id<br />
* ideographic<br />
* k<br />
* keyPoints<br />
* keySplines<br />
* keyTimes<br />
* lang<br />
* marker-end<br />
* marker-mid<br />
* marker-start<br />
* markerHeight<br />
* markerUnits<br />
* markerWidth<br />
* mathematical<br />
* max<br />
* min<br />
* name<br />
* offset<br />
* opacity<br />
* orient<br />
* origin<br />
* overline-position<br />
* overline-thickness<br />
* panose-1<br />
* path<br />
* pathLength<br />
* points<br />
* preserveAspectRatio<br />
* r<br />
* refX<br />
* refY<br />
* repeatCount<br />
* repeatDur<br />
* requiredExtensions<br />
* requiredFeatures<br />
* restart<br />
* rotate<br />
* rx<br />
* ry<br />
* slope<br />
* stemh<br />
* stemv<br />
* stop-color<br />
* stop-opacity<br />
* strikethrough-position<br />
* strikethrough-thickness<br />
* stroke<br />
* stroke-dasharray<br />
* stroke-dashoffset<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-miterlimit<br />
* stroke-opacity<br />
* stroke-width<br />
* systemLanguage<br />
* target<br />
* text-anchor<br />
* to<br />
* transform<br />
* type<br />
* u1<br />
* u2<br />
* underline-position<br />
* underline-thickness<br />
* unicode<br />
* unicode-range<br />
* units-per-em<br />
* values<br />
* version<br />
* viewBox<br />
* visibility<br />
* width<br />
* widths<br />
* x<br />
* x-height<br />
* x1<br />
* x2<br />
* xlink:actuate<br />
* xlink:arcrole<br />
* xlink:href<br />
* xlink:role<br />
* xlink:show<br />
* xlink:title<br />
* xlink:type<br />
* xml:base<br />
* xml:lang<br />
* xml:space<br />
* xmlns<br />
* xmlns:xlink<br />
* y<br />
* y1<br />
* y2<br />
* zoomAndPan<br />
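<br />
Attribute filtering works the same way. This minimal sketch assumes the parser has already lower-cased HTML attribute names while leaving case-sensitive SVG names such as <code>viewBox</code> untouched. <code>ACCEPTABLE_ATTRIBUTES</code> is an abridged, hypothetical merge of the three lists above. Attributes that survive the filter may still need their values checked: <code>style</code> under the CSS rules, and URI-valued attributes under the URI rules below.<br />

```python
from typing import Dict

# Abridged, hypothetical merge of the HTML, MathML and SVG attribute lists.
ACCEPTABLE_ATTRIBUTES = frozenset({
    "alt", "class", "height", "href", "id", "lang", "src", "style",
    "title", "width", "xml:lang",
    # MathML / SVG entries keep their original case, e.g. viewBox.
    "mathvariant", "viewBox", "d", "fill", "stroke", "xlink:href",
})


def filter_attributes(attrs: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of attrs without any attribute missing from the whitelist."""
    return {name: value for name, value in attrs.items()
            if name in ACCEPTABLE_ATTRIBUTES}
```

Because the match is exact, event handlers such as <code>onclick</code> and unexpected case variants are rejected by default, with no need to list them.<br />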
<br />
=== CSS Rules ===<br />
<br />
First, <code>url()</code> references matching the following regular expression are removed:<br />
<pre>url\s*\(\s*[^\s)]+?\s*\)\s*</pre><br />
<br />
The remaining style string must then match both of the following regular expressions; a string that fails either one is deemed obfuscated and is ignored entirely:<br />
<pre>^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$</pre><br />
<pre>^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$</pre><br />
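<br />
The two steps above can be sketched in Python with the standard <code>re</code> module. The patterns are copied verbatim from this section. The function name <code>passes_gauntlet</code>, and the choice to replace each removed <code>url()</code> with a single space, are assumptions made for illustration.<br />

```python
import re
from typing import Optional

# Patterns copied verbatim from this section.
URL_RE = re.compile(r"url\s*\(\s*[^\s)]+?\s*\)\s*")
CHARSET_RE = re.compile(
    r"""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$"""
)
DECLARATIONS_RE = re.compile(r"^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$")


def passes_gauntlet(style: str) -> Optional[str]:
    """Remove url() references, then return the style string if it matches
    both patterns, or None if it is deemed obfuscated."""
    style = URL_RE.sub(" ", style)
    if not CHARSET_RE.match(style) or not DECLARATIONS_RE.match(style):
        return None
    return style
```

For example, <code>passes_gauntlet("background: url(javascript:alert(1))")</code> returns <code>None</code>. The lazy <code>url()</code> match stops at the first closing parenthesis and leaves a stray <code>)</code> behind, which the first pattern does not allow.<br />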
<br />
==== style Properties ====<br />
<br />
* azimuth<br />
* background-color<br />
* border-bottom-color<br />
* border-collapse<br />
* border-color<br />
* border-left-color<br />
* border-right-color<br />
* border-top-color<br />
* clear<br />
* color<br />
* cursor<br />
* direction<br />
* display<br />
* elevation<br />
* float<br />
* font<br />
* font-family<br />
* font-size<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* height<br />
* letter-spacing<br />
* line-height<br />
* overflow<br />
* pause<br />
* pause-after<br />
* pause-before<br />
* pitch<br />
* pitch-range<br />
* richness<br />
* speak<br />
* speak-header<br />
* speak-numeral<br />
* speak-punctuation<br />
* speech-rate<br />
* stress<br />
* text-align<br />
* text-decoration<br />
* text-indent<br />
* unicode-bidi<br />
* vertical-align<br />
* voice-family<br />
* volume<br />
* white-space<br />
* width<br />
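<br />
Once a style string has passed the checks above, individual declarations can be filtered against the property list. This minimal sketch makes two assumptions: declarations are split with the hypothetical pattern <code>([-\w]+)\s*:\s*([^:;]*)</code>, which is consistent with the second regular expression above, and declarations whose property is not whitelisted are dropped silently. <code>ALLOWED_CSS_PROPERTIES</code> is an abridged copy of the list. The <code>allowed</code> parameter lets the same function be reused with the svg style properties listed further down.<br />

```python
import re
from typing import Iterable

# Abridged copy of the "style Properties" whitelist above.
ALLOWED_CSS_PROPERTIES = frozenset({
    "background-color", "border-collapse", "color", "display", "float",
    "font-family", "font-size", "font-weight", "height", "text-align",
    "text-decoration", "width",
})

DECLARATION_RE = re.compile(r"([-\w]+)\s*:\s*([^:;]*)")


def filter_declarations(style: str,
                        allowed: Iterable[str] = ALLOWED_CSS_PROPERTIES) -> str:
    """Keep only non-empty declarations whose property name is whitelisted."""
    allowed = frozenset(allowed)
    kept = []
    for prop, value in DECLARATION_RE.findall(style):
        value = value.strip()
        if value and prop.lower() in allowed:
            kept.append(f"{prop}: {value};")
    return " ".join(kept)
```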
<br />
==== style Property Values ====<br />
<br />
* auto<br />
* aqua<br />
* black<br />
* block<br />
* blue<br />
* bold<br />
* both<br />
* bottom<br />
* brown<br />
* center<br />
* collapse<br />
* dashed<br />
* dotted<br />
* fuchsia<br />
* gray<br />
* green<br />
* !important<br />
* italic<br />
* left<br />
* lime<br />
* maroon<br />
* medium<br />
* none<br />
* navy<br />
* normal<br />
* nowrap<br />
* olive<br />
* pointer<br />
* purple<br />
* red<br />
* right<br />
* solid<br />
* silver<br />
* teal<br />
* top<br />
* transparent<br />
* underline<br />
* white<br />
* yellow<br />
<br />
In addition, values that match the following regular expression are valid:<br />
<br />
<code>^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$</code><br />
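<br />
One way to combine the keyword list with this regular expression is to require every whitespace-separated token of a value to pass one check or the other. Three things here are assumptions of this sketch: that way of combining them, the function name <code>is_acceptable_value</code>, and the abridged keyword set. The page itself does not say which properties the value check applies to.<br />

```python
import re

# Abridged copy of the "style Property Values" keyword list above.
ACCEPTABLE_KEYWORDS = frozenset({
    "auto", "black", "block", "bold", "center", "italic", "left", "none",
    "normal", "red", "right", "solid", "transparent", "!important",
})

# The regular expression given above, verbatim.
VALUE_RE = re.compile(
    r"^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$"
)


def is_acceptable_value(value: str) -> bool:
    """True if every whitespace-separated token is a listed keyword or matches VALUE_RE."""
    return all(token in ACCEPTABLE_KEYWORDS or VALUE_RE.match(token)
               for token in value.split())
```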
<br />
==== svg style Properties ====<br />
<br />
* fill<br />
* fill-opacity<br />
* fill-rule<br />
* stroke<br />
* stroke-width<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-opacity<br />
<br />
=== URIs ===<br />
==== Attributes whose value is a URI ====<br />
<br />
* href<br />
* src<br />
* cite<br />
* action<br />
* longdesc<br />
* xlink:href<br />
* xml:base<br />
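<br />
A URI-valued attribute is typically checked by extracting its scheme and comparing it against the protocol list that follows. This minimal sketch assumes that values without a scheme (relative references) are allowed. It also assumes the sanitizer should mimic browsers, which ignore ASCII tab and newline characters inside a URL and trim leading and trailing control characters and spaces. Without that step, a scheme split by a newline could slip past a naive check. The function names are hypothetical. Note that <code>data</code> is not on the list, so the sketch rejects data URIs.<br />

```python
import re

# Attributes whose value is a URI, copied from the list above.
URI_ATTRIBUTES = frozenset({
    "href", "src", "cite", "action", "longdesc", "xlink:href", "xml:base",
})

# The URI protocol whitelist from this page.
ACCEPTABLE_PROTOCOLS = frozenset({
    "afs", "aim", "callto", "ed2k", "feed", "ftp", "gopher", "http", "https",
    "irc", "mailto", "news", "nntp", "rsync", "rtsp", "sftp", "ssh", "tag",
    "tel", "telnet", "urn", "webcal", "wtai", "xmpp",
})

SCHEME_RE = re.compile(r"^([a-zA-Z][a-zA-Z0-9+.\-]*):")
# C0 control characters (U+0000 to U+001F) plus space, trimmed from both ends.
C0_AND_SPACE = "".join(chr(c) for c in range(0x21))


def is_acceptable_uri(value: str) -> bool:
    """True if value has no scheme (relative reference) or a whitelisted one."""
    # Browsers drop tab, LF and CR anywhere in a URL, so "java\nscript:" is
    # still a javascript: URL; normalize the same way before checking.
    cleaned = re.sub(r"[\t\n\r]", "", value).strip(C0_AND_SPACE)
    match = SCHEME_RE.match(cleaned)
    if match is None:
        return True
    return match.group(1).lower() in ACCEPTABLE_PROTOCOLS


def attribute_value_is_safe(name: str, value: str) -> bool:
    """Only URI-valued attributes are subject to the protocol check."""
    return name not in URI_ATTRIBUTES or is_acceptable_uri(value)
```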
<br />
==== URI protocols ====<br />
<br />
* afs<br />
* aim<br />
* callto<br />
* ed2k<br />
* feed<br />
* ftp<br />
* gopher<br />
* http<br />
* https<br />
* irc<br />
* mailto<br />
* news<br />
* nntp<br />
* rsync<br />
* rtsp<br />
* sftp<br />
* ssh<br />
* tag<br />
* tel<br />
* telnet<br />
* urn<br />
* webcal<br />
* wtai<br />
* xmpp</div>Rubyshttps://wiki.whatwg.org/index.php?title=Talk:Sanitization_rules&diff=2399Talk:Sanitization rules2007-08-09T03:48:53Z<p>Rubys: suggest that data URIs require further sanitization</p>
<hr />
<div>Is the data URI scheme safe?<br />
<br />
* Rob Sayre says no and refers to a wikipedia article; however, I cannot see anything in the [http://en.wikipedia.org/wiki/Data:_URI_scheme article] that indicates the scheme is not safe.<br />
** Looking at that wikipedia page, <code>data</code> could only be added if it were followed by an asterisk, kinda like the 756* that I see popping up all over the place these days. In particular, I don't see the use case which would justify the investment in sanitizing <code>text/html</code> encoded as a data URI. Not that it would be difficult, just hard to justify. Perhaps a section could be added which lists safe content types when included in data URIs. -- [[User:Rubys|Rubys]] 03:48, 9 August 2007 (UTC)</div>Rubyshttps://wiki.whatwg.org/index.php?title=Sanitization_rules&diff=2393Sanitization rules2007-08-07T12:30:29Z<p>Rubys: New page: This page was initially seeded with the sanitization lists and rules implemented by the [http://code.google.com/p/html5lib/ html5lib] sanitizer, which in turn was based on [http://golem.ph...</p>
<hr />
<div>This page was initially seeded with the sanitization lists and rules implemented by the [http://code.google.com/p/html5lib/ html5lib] sanitizer, which in turn was based on [http://golem.ph.utexas.edu/instiki/show/HomePage Jacques Distler's branch of Instiki], which in turn was based on the sanitization logic in the [http://www.feedparser.org/ Universal Feed Parser].<br />
<br />
It is hoped that others will add, update, and extend this list based on their experiences in their own products, and furthermore that some will update their products based on these lists.<br />
<br />
As a suggestion, not a requirement: people who update their products to reflect information from this list are encouraged to add a link to this page as a comment in their code, in the hope that it will encourage subsequent maintainers to keep this page up to date.<br />
<br />
=== Acceptable Elements ===<br />
<br />
* a<br />
* abbr<br />
* acronym<br />
* address<br />
* area<br />
* b<br />
* big<br />
* blockquote<br />
* br<br />
* button<br />
* caption<br />
* center<br />
* cite<br />
* code<br />
* col<br />
* colgroup<br />
* dd<br />
* del<br />
* dfn<br />
* dir<br />
* div<br />
* dl<br />
* dt<br />
* em<br />
* fieldset<br />
* font<br />
* form<br />
* h1<br />
* h2<br />
* h3<br />
* h4<br />
* h5<br />
* h6<br />
* hr<br />
* i<br />
* img<br />
* input<br />
* ins<br />
* kbd<br />
* label<br />
* legend<br />
* li<br />
* map<br />
* menu<br />
* ol<br />
* optgroup<br />
* option<br />
* p<br />
* pre<br />
* q<br />
* s<br />
* samp<br />
* select<br />
* small<br />
* span<br />
* strike<br />
* strong<br />
* sub<br />
* sup<br />
* table<br />
* tbody<br />
* td<br />
* textarea<br />
* tfoot<br />
* th<br />
* thead<br />
* tr<br />
* tt<br />
* u<br />
* ul<br />
* var<br />
<br />
==== mathml Elements ====<br />
<br />
* maction<br />
* math<br />
* merror<br />
* mfrac<br />
* mi<br />
* mmultiscripts<br />
* mn<br />
* mo<br />
* mover<br />
* mpadded<br />
* mphantom<br />
* mprescripts<br />
* mroot<br />
* mrow<br />
* mspace<br />
* msqrt<br />
* mstyle<br />
* msub<br />
* msubsup<br />
* msup<br />
* mtable<br />
* mtd<br />
* mtext<br />
* mtr<br />
* munder<br />
* munderover<br />
* none<br />
<br />
==== svg Elements ====<br />
<br />
* a<br />
* animate<br />
* animateColor<br />
* animateMotion<br />
* animateTransform<br />
* circle<br />
* defs<br />
* desc<br />
* ellipse<br />
* font-face<br />
* font-face-name<br />
* font-face-src<br />
* g<br />
* glyph<br />
* hkern<br />
* image<br />
* linearGradient<br />
* line<br />
* marker<br />
* metadata<br />
* missing-glyph<br />
* mpath<br />
* path<br />
* polygon<br />
* polyline<br />
* radialGradient<br />
* rect<br />
* set<br />
* stop<br />
* svg<br />
* switch<br />
* text<br />
* title<br />
* tspan<br />
* use<br />
<br />
=== Acceptable Attributes ===<br />
<br />
* abbr<br />
* accept<br />
* accept-charset<br />
* accesskey<br />
* action<br />
* align<br />
* alt<br />
* axis<br />
* border<br />
* cellpadding<br />
* cellspacing<br />
* char<br />
* charoff<br />
* charset<br />
* checked<br />
* cite<br />
* class<br />
* clear<br />
* cols<br />
* colspan<br />
* color<br />
* compact<br />
* coords<br />
* datetime<br />
* dir<br />
* disabled<br />
* enctype<br />
* for<br />
* frame<br />
* headers<br />
* height<br />
* href<br />
* hreflang<br />
* hspace<br />
* id<br />
* ismap<br />
* label<br />
* lang<br />
* longdesc<br />
* maxlength<br />
* media<br />
* method<br />
* multiple<br />
* name<br />
* nohref<br />
* noshade<br />
* nowrap<br />
* prompt<br />
* readonly<br />
* rel<br />
* rev<br />
* rows<br />
* rowspan<br />
* rules<br />
* scope<br />
* selected<br />
* shape<br />
* size<br />
* span<br />
* src<br />
* start<br />
* style<br />
* summary<br />
* tabindex<br />
* target<br />
* title<br />
* type<br />
* usemap<br />
* valign<br />
* value<br />
* vspace<br />
* width<br />
* xml:lang<br />
<br />
==== mathml Attributes ====<br />
<br />
* actiontype<br />
* align<br />
* columnalign<br />
* columnlines<br />
* columnspacing<br />
* columnspan<br />
* depth<br />
* display<br />
* displaystyle<br />
* equalcolumns<br />
* equalrows<br />
* fence<br />
* fontstyle<br />
* fontweight<br />
* frame<br />
* height<br />
* linethickness<br />
* lspace<br />
* mathbackground<br />
* mathcolor<br />
* mathvariant<br />
* maxsize<br />
* minsize<br />
* other<br />
* rowalign<br />
* rowlines<br />
* rowspacing<br />
* rowspan<br />
* rspace<br />
* scriptlevel<br />
* selection<br />
* separator<br />
* stretchy<br />
* width<br />
* xlink:href<br />
* xlink:show<br />
* xlink:type<br />
* xmlns<br />
* xmlns:xlink<br />
<br />
==== svg Attributes ====<br />
<br />
* accent-height<br />
* accumulate<br />
* additive<br />
* alphabetic<br />
* arabic-form<br />
* ascent<br />
* attributeName<br />
* attributeType<br />
* baseProfile<br />
* bbox<br />
* begin<br />
* by<br />
* calcMode<br />
* cap-height<br />
* class<br />
* color<br />
* color-rendering<br />
* content<br />
* cx<br />
* cy<br />
* d<br />
* dx<br />
* dy<br />
* descent<br />
* display<br />
* dur<br />
* end<br />
* fill<br />
* fill-rule<br />
* font-family<br />
* font-size<br />
* font-stretch<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* from<br />
* fx<br />
* fy<br />
* g1<br />
* g2<br />
* glyph-name<br />
* gradientUnits<br />
* hanging<br />
* height<br />
* horiz-adv-x<br />
* horiz-origin-x<br />
* id<br />
* ideographic<br />
* k<br />
* keyPoints<br />
* keySplines<br />
* keyTimes<br />
* lang<br />
* marker-end<br />
* marker-mid<br />
* marker-start<br />
* markerHeight<br />
* markerUnits<br />
* markerWidth<br />
* mathematical<br />
* max<br />
* min<br />
* name<br />
* offset<br />
* opacity<br />
* orient<br />
* origin<br />
* overline-position<br />
* overline-thickness<br />
* panose-1<br />
* path<br />
* pathLength<br />
* points<br />
* preserveAspectRatio<br />
* r<br />
* refX<br />
* refY<br />
* repeatCount<br />
* repeatDur<br />
* requiredExtensions<br />
* requiredFeatures<br />
* restart<br />
* rotate<br />
* rx<br />
* ry<br />
* slope<br />
* stemh<br />
* stemv<br />
* stop-color<br />
* stop-opacity<br />
* strikethrough-position<br />
* strikethrough-thickness<br />
* stroke<br />
* stroke-dasharray<br />
* stroke-dashoffset<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-miterlimit<br />
* stroke-opacity<br />
* stroke-width<br />
* systemLanguage<br />
* target<br />
* text-anchor<br />
* to<br />
* transform<br />
* type<br />
* u1<br />
* u2<br />
* underline-position<br />
* underline-thickness<br />
* unicode<br />
* unicode-range<br />
* units-per-em<br />
* values<br />
* version<br />
* viewBox<br />
* visibility<br />
* width<br />
* widths<br />
* x<br />
* x-height<br />
* x1<br />
* x2<br />
* xlink:actuate<br />
* xlink:arcrole<br />
* xlink:href<br />
* xlink:role<br />
* xlink:show<br />
* xlink:title<br />
* xlink:type<br />
* xml:base<br />
* xml:lang<br />
* xml:space<br />
* xmlns<br />
* xmlns:xlink<br />
* y<br />
* y1<br />
* y2<br />
* zoomAndPan<br />
<br />
=== CSS Rules ===<br />
<br />
First, <code>url()</code> references matching the following regular expression are removed:<br />
<pre>url\s*\(\s*[^\s)]+?\s*\)\s*</pre><br />
<br />
The remaining style string must then match both of the following regular expressions; a string that fails either one is deemed obfuscated and is ignored entirely:<br />
<pre>^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$</pre><br />
<pre>^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$</pre><br />
<br />
==== style Properties ====<br />
<br />
* azimuth<br />
* background-color<br />
* border-bottom-color<br />
* border-collapse<br />
* border-color<br />
* border-left-color<br />
* border-right-color<br />
* border-top-color<br />
* clear<br />
* color<br />
* cursor<br />
* direction<br />
* display<br />
* elevation<br />
* float<br />
* font<br />
* font-family<br />
* font-size<br />
* font-style<br />
* font-variant<br />
* font-weight<br />
* height<br />
* letter-spacing<br />
* line-height<br />
* overflow<br />
* pause<br />
* pause-after<br />
* pause-before<br />
* pitch<br />
* pitch-range<br />
* richness<br />
* speak<br />
* speak-header<br />
* speak-numeral<br />
* speak-punctuation<br />
* speech-rate<br />
* stress<br />
* text-align<br />
* text-decoration<br />
* text-indent<br />
* unicode-bidi<br />
* vertical-align<br />
* voice-family<br />
* volume<br />
* white-space<br />
* width<br />
<br />
==== style Property Values ====<br />
<br />
* auto<br />
* aqua<br />
* black<br />
* block<br />
* blue<br />
* bold<br />
* both<br />
* bottom<br />
* brown<br />
* center<br />
* collapse<br />
* dashed<br />
* dotted<br />
* fuchsia<br />
* gray<br />
* green<br />
* !important<br />
* italic<br />
* left<br />
* lime<br />
* maroon<br />
* medium<br />
* none<br />
* navy<br />
* normal<br />
* nowrap<br />
* olive<br />
* pointer<br />
* purple<br />
* red<br />
* right<br />
* solid<br />
* silver<br />
* teal<br />
* top<br />
* transparent<br />
* underline<br />
* white<br />
* yellow<br />
<br />
==== svg style Properties ====<br />
<br />
* fill<br />
* fill-opacity<br />
* fill-rule<br />
* stroke<br />
* stroke-width<br />
* stroke-linecap<br />
* stroke-linejoin<br />
* stroke-opacity<br />
<br />
=== URIs ===<br />
==== Attributes whose value is a URI ====<br />
<br />
* href<br />
* src<br />
* cite<br />
* action<br />
* longdesc<br />
* xlink:href<br />
* xml:base<br />
<br />
==== URI protocols ====<br />
<br />
* afs<br />
* aim<br />
* callto<br />
* ed2k<br />
* feed<br />
* ftp<br />
* gopher<br />
* http<br />
* https<br />
* irc<br />
* mailto<br />
* news<br />
* nntp<br />
* rsync<br />
* rtsp<br />
* sftp<br />
* ssh<br />
* tag<br />
* telnet<br />
* urn<br />
* webcal<br />
* xmpp</div>Rubyshttps://wiki.whatwg.org/index.php?title=HTML5Lib&diff=2351HTML5Lib2007-06-17T16:25:16Z<p>Rubys: /* HTML5Lib */</p>
<hr />
<div>= HTML5Lib =<br />
<br />
[http://code.google.com/p/html5lib/ HTML5Lib] is a project to create Python-based and Ruby-based implementations of various parts of the WHATWG spec, in particular a tokenizer, a parser, and a serializer. It is '''not''' an official WHATWG project; however, we plan to use this wiki to document and discuss the library design. The code is available under the open-source MIT license.<br />
<br />
== SVN ==<br />
Please commit often, with reasonably detailed descriptions of what you did. To make sure you are not redoing someone else's work, ask on the [http://groups.google.com/group/html5lib-discuss mailing list]. For questions that could benefit from a quick turnaround, talk to people on #whatwg.<br />
<br />
== General ==<br />
<br />
In comments, "XXX" marks something that has yet to be done: something that might be wrong, has not yet been written, or otherwise needs attention.<br />
<br />
In comments, "AT" marks a comment that documents an alternate implementation technique or strategy.<br />
<br />
== HTMLTokenizer ==<br />
<br />
The tokenizer is currently implemented as a single HTMLTokenizer class in tokenizer.py. You initialize an HTMLTokenizer with a stream argument that holds an HTMLInputStream, then iterate over the resulting object to get tokens back.<br />
<br />
Tokens are currently objects; they will become dicts.<br />
<br />
=== Interface ===<br />
<br />
The parser needs to change the tokenizer's self.contentModelFlag attribute, which affects how certain states are handled.<br />
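<br />
To illustrate why the parser has to reach into the tokenizer, here is a deliberately tiny, hypothetical tokenizer in Python; it is not html5lib's code. It takes a plain string instead of an HTMLInputStream and yields dict tokens. It models only two content model values (PCDATA and RCDATA) and ignores attributes, comments, and entities. Only the tree-construction side knows that the contents of an element like <code>title</code> are RCDATA. The driver loop therefore sets <code>contentModelFlag</code> after seeing that start tag, and the generator picks up the change on its next step.<br />

```python
from typing import Dict, Iterator, List, Optional

PCDATA, RCDATA = "PCDATA", "RCDATA"


class ToyTokenizer:
    """Hypothetical, greatly simplified tokenizer; not html5lib's HTMLTokenizer."""

    def __init__(self, stream: str) -> None:
        self.stream = stream
        self.pos = 0
        self.contentModelFlag = PCDATA  # the parser may change this between tokens
        self.lastStartTag: Optional[str] = None

    def __iter__(self) -> Iterator[Dict[str, str]]:
        while self.pos < len(self.stream):
            # The flag is re-read on every step, so a change made by the
            # consumer after the previous yield takes effect immediately.
            if self.contentModelFlag == RCDATA:
                token = self._rcdata()
            elif self.stream.startswith("<", self.pos):
                token = self._tag()
            else:
                token = self._characters()
            if token is not None:
                yield token

    def _characters(self) -> Dict[str, str]:
        end = self.stream.find("<", self.pos)
        if end == -1:
            end = len(self.stream)
        data, self.pos = self.stream[self.pos:end], end
        return {"type": "Characters", "data": data}

    def _tag(self) -> Dict[str, str]:
        end = self.stream.find(">", self.pos)
        if end == -1:  # unterminated tag: emit the rest as text
            data, self.pos = self.stream[self.pos:], len(self.stream)
            return {"type": "Characters", "data": data}
        body, self.pos = self.stream[self.pos + 1:end].strip(), end + 1
        if body.startswith("/"):
            return {"type": "EndTag", "name": body[1:].strip().lower()}
        name = body.split()[0].lower() if body else ""
        self.lastStartTag = name
        return {"type": "StartTag", "name": name}

    def _rcdata(self) -> Optional[Dict[str, str]]:
        # Everything up to the matching end tag is text, even "<b>".
        closer = "</" + (self.lastStartTag or "")
        end = self.stream.lower().find(closer, self.pos)
        if end == -1:
            end = len(self.stream)
        data, self.pos = self.stream[self.pos:end], end
        self.contentModelFlag = PCDATA  # back to normal for the end tag
        return {"type": "Characters", "data": data} if data else None


def tokenize_with_feedback(html: str) -> List[Dict[str, str]]:
    """Stand-in for the parser: it flips the flag after certain start tags."""
    tokenizer = ToyTokenizer(html)
    tokens = []
    for token in tokenizer:
        tokens.append(token)
        if token["type"] == "StartTag" and token["name"] in ("title", "textarea"):
            tokenizer.contentModelFlag = RCDATA
    return tokens
```

With the feedback loop, <code>&lt;title&gt;a&lt;b&gt;c&lt;/title&gt;</code> yields a single Characters token <code>a&lt;b&gt;c</code>. Iterating over <code>ToyTokenizer</code> directly, without the feedback loop, treats <code>&lt;b&gt;</code> as a start tag instead.<br />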
<br />
=== Issues ===<br />
* Use of if statements in the states may be suboptimal (but we should time this)<br />
<br />
== HTMLParser ==<br />
<br />
=== Profiling on web-apps.htm ===<br />
<br />
I did some profiling on web-apps.htm, which is a rather large document. Based on that, I have already changed a number of things, which sped us up a bit. Below are some things to consider for future revisions:<br />
<br />
* utils.MethodDispatcher is invoked far too often. By pre-declaring some of it in InBody I managed to reduce the number of invocations by over 24,000, but InBody.__init__ is invoked about 7,000 times for web-apps.htm, so the reduction could be larger still. I am not sure where else to put the declarations, though. The first thing I tried was HTMLParser, but then the references get all messed up...<br />
: We should be able to store a single instance of each InsertionMode rather than creating a new one every time the mode switches. Hopefully we have been disciplined enough not to keep any state in those classes, so the change should be painless.<br />
:: That's an interesting idea. How would that work? [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
::: I got an idea on how it might work and it worked! Still about 3863 invocations to utils.MethodDispatcher but it takes 0.000 CPU seconds. I suppose we can decrease that amount even more, but I wonder if it's worth it. [[User:Annevk|Annevk]] 11:37, 26 December 2006 (UTC)<br />
<br />
* 713,194 calls to __contains__ in sets.py make us slow; they take about 1.0x CPU seconds.<br />
: I've just switched to the built-in set type. Hopefully this will help a bit. [[User:Jgraham|Jgraham]] 00:30, 25 December 2006 (UTC)<br />
:: It did. (Not surprising when 700,000 method calls are gone...) [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
<br />
* 440,382 calls to char in tokenizer.py are the runner-up, at 0.8x CPU seconds.<br />
: This is now the largest time consumer. [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
<br />
* dataState in tokenizer.py with 0.7 CPU seconds is next.<br />
: This is now at 0.429 CPU seconds, probably because the tokenizer switched to dicts instead of custom Token objects. [[User:Annevk|Annevk]]<br />
<br />
* __iter__ in tokenizer.py with 0.59x CPU seconds...<br />
<br />
* Creation of all node objects in web-apps takes .57x CPU seconds.<br />
<br />
* etc.<br />
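<br />
The single-instance idea discussed above can be sketched as follows. This is a hypothetical, heavily reduced model rather than html5lib's actual classes: <code>Parser</code>, <code>InBodyPhase</code>, <code>InTablePhase</code>, <code>phases</code>, and <code>log</code> are invented for illustration. Every insertion mode is constructed once in the parser's constructor, and a mode switch just rebinds <code>parser.phase</code>. A mode that hands a token over to the in-body rules calls the shared instance instead of building a new one. This only works if the phase objects hold no per-token state.<br />

```python
from typing import Dict, List, Tuple


class Phase:
    """Base class for an insertion mode; must not hold per-token state."""

    def __init__(self, parser: "Parser") -> None:
        self.parser = parser

    def process_start_tag(self, name: str) -> None:
        raise NotImplementedError


class InBodyPhase(Phase):
    constructed = 0  # class-wide counter, to show construction happens once

    def __init__(self, parser: "Parser") -> None:
        super().__init__(parser)
        InBodyPhase.constructed += 1
        # Stand-in for an expensive dispatch table, built once per parser.
        self.start_tag_handlers = {"table": self.start_tag_table}

    def process_start_tag(self, name: str) -> None:
        self.start_tag_handlers.get(name, self.start_tag_other)(name)

    def start_tag_table(self, name: str) -> None:
        self.parser.log.append(("body", name))
        self.parser.phase = self.parser.phases["inTable"]  # switch, no new object

    def start_tag_other(self, name: str) -> None:
        self.parser.log.append(("body", name))


class InTablePhase(Phase):
    def process_start_tag(self, name: str) -> None:
        if name in ("tr", "td", "th"):
            self.parser.log.append(("table", name))
        else:
            # Hand the token to the shared in-body instance.
            self.parser.phases["inBody"].process_start_tag(name)


class Parser:
    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []
        # Every insertion mode is built exactly once, up front.
        self.phases: Dict[str, Phase] = {
            "inBody": InBodyPhase(self),
            "inTable": InTablePhase(self),
        }
        self.phase: Phase = self.phases["inBody"]

    def start_tag(self, name: str) -> None:
        self.phase.process_start_tag(name)
```

Feeding the start tags <code>p</code>, <code>table</code>, <code>tr</code>, <code>td</code>, <code>b</code>, <code>td</code> through one <code>Parser</code> switches modes and delegates back to in-body, yet constructs <code>InBodyPhase</code> only once.<br />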
<br />
== Testcases ==<br />
Testcases are under the /tests directory. They require [http://cheeseshop.python.org/pypi/simplejson simplejson]. New code should not be checked in if it regresses previously functional unit tests. Similarly, new tests that don't pass should not be checked in without both informing others on the [http://groups.google.com/group/html5lib-discuss mailing list] and a concrete plan. Ideally new features should be accompanied by new unit tests for those features. Documentation of the test format is available at [[Parser_tests]].<br />
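<br />
For reference, loading JSON test files in current Python needs only the standard library's <code>json</code> module, which was derived from simplejson. The sketch below assumes each file holds an object with a <code>tests</code> array; see [[Parser_tests]] for the actual formats. The helper name <code>iter_json_tests</code> and the <code>*.test</code> file pattern are hypothetical.<br />

```python
import glob
import json
import os
from typing import Any, Dict, Iterator


def iter_json_tests(test_dir: str, pattern: str = "*.test") -> Iterator[Dict[str, Any]]:
    """Yield each test case found in JSON files matching pattern in test_dir.

    Assumes (hypothetically) that every file looks like {"tests": [{...}, ...]}.
    Each yielded case is a copy with an added "file" key naming its source file.
    """
    for path in sorted(glob.glob(os.path.join(test_dir, pattern))):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        for case in data.get("tests", []):
            yield dict(case, file=os.path.basename(path))
```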
<br />
<br />
<br />
[[Category:Implementations]]</div>Rubyshttps://wiki.whatwg.org/index.php?title=HTML5Lib&diff=2350HTML5Lib2007-06-17T16:24:06Z<p>Rubys: /* SVN */</p>
<hr />
<div>= HTML5Lib =<br />
<br />
[http://code.google.com/p/html5lib/ HTML5Lib] is a project to create a Python-based implementation of various parts of the WHATWG spec, in particular a tokenizer and a parser. It is '''not''' an official WHATWG project; however, we plan to use this wiki to document and discuss the library design. The code is available under the open-source MIT license.<br />
<br />
== SVN ==<br />
Please commit often, with reasonably detailed descriptions of what you did. To make sure you are not redoing someone else's work, ask on the [http://groups.google.com/group/html5lib-discuss mailing list]. For questions that could benefit from a quick turnaround, talk to people on #whatwg.<br />
<br />
== General ==<br />
<br />
In comments, "XXX" marks something that has yet to be done: something that might be wrong, has not yet been written, or otherwise needs attention.<br />
<br />
In comments, "AT" marks a comment that documents an alternate implementation technique or strategy.<br />
<br />
== HTMLTokenizer ==<br />
<br />
The tokenizer is currently implemented as a single HTMLTokenizer class in tokenizer.py. You initialize an HTMLTokenizer with a stream argument that holds an HTMLInputStream, then iterate over the resulting object to get tokens back.<br />
<br />
Tokens are currently objects; they will become dicts.<br />
<br />
=== Interface ===<br />
<br />
The parser needs to change the tokenizer's self.contentModelFlag attribute, which affects how certain states are handled.<br />
<br />
=== Issues ===<br />
* Use of if statements in the states may be suboptimal (but we should time this)<br />
<br />
== HTMLParser ==<br />
<br />
=== Profiling on web-apps.htm ===<br />
<br />
I did some profiling on web-apps.htm, which is a rather large document. Based on that, I have already changed a number of things, which sped us up a bit. Below are some things to consider for future revisions:<br />
<br />
* utils.MethodDispatcher is invoked far too often. By pre-declaring some of it in InBody I managed to reduce the number of invocations by over 24,000, but InBody.__init__ is invoked about 7,000 times for web-apps.htm, so the reduction could be larger still. I am not sure where else to put the declarations, though. The first thing I tried was HTMLParser, but then the references get all messed up...<br />
: We should be able to store a single instance of each InsertionMode rather than creating a new one every time the mode switches. Hopefully we have been disciplined enough not to keep any state in those classes, so the change should be painless.<br />
:: That's an interesting idea. How would that work? [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
::: I got an idea on how it might work and it worked! Still about 3863 invocations to utils.MethodDispatcher but it takes 0.000 CPU seconds. I suppose we can decrease that amount even more, but I wonder if it's worth it. [[User:Annevk|Annevk]] 11:37, 26 December 2006 (UTC)<br />
<br />
* 713,194 calls to __contains__ in sets.py make us slow; they take about 1.0x CPU seconds.<br />
: I've just switched to the built-in set type. Hopefully this will help a bit. [[User:Jgraham|Jgraham]] 00:30, 25 December 2006 (UTC)<br />
:: It did. (Not surprising when 700,000 method calls are gone...) [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
<br />
* 440,382 calls to char in tokenizer.py are the runner-up, at 0.8x CPU seconds.<br />
: This is now the largest time consumer. [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
<br />
* dataState in tokenizer.py with 0.7 CPU seconds is next.<br />
: This is now at 0.429 CPU seconds, probably because the tokenizer switched to dicts instead of custom Token objects. [[User:Annevk|Annevk]]<br />
<br />
* __iter__ in tokenizer.py with 0.59x CPU seconds...<br />
<br />
* Creation of all node objects in web-apps takes .57x CPU seconds.<br />
<br />
* etc.<br />
<br />
== Testcases ==<br />
Testcases are under the /tests directory. They require [http://cheeseshop.python.org/pypi/simplejson simplejson]. New code should not be checked in if it regresses previously functional unit tests. Similarly, new tests that don't pass should not be checked in without both informing others on the [http://groups.google.com/group/html5lib-discuss mailing list] and a concrete plan. Ideally new features should be accompanied by new unit tests for those features. Documentation of the test format is available at [[Parser_tests]].<br />
<br />
<br />
<br />
[[Category:Implementations]]</div>Rubyshttps://wiki.whatwg.org/index.php?title=HTML5Lib&diff=2349HTML5Lib2007-06-17T16:22:38Z<p>Rubys: /* Testcases */</p>
<hr />
<div>= HTML5Lib =<br />
<br />
[http://code.google.com/p/html5lib/ HTML5Lib] is a project to create a Python-based implementation of various parts of the WHATWG spec, in particular a tokenizer and a parser. It is '''not''' an official WHATWG project; however, we plan to use this wiki to document and discuss the library design. The code is available under the open-source MIT license.<br />
<br />
== SVN ==<br />
Please commit often, with reasonably detailed descriptions of what you did. To make sure you are not redoing someone else's work, talk to people on #whatwg.<br />
<br />
== General ==<br />
<br />
In comments, "XXX" marks something that has yet to be done: something that might be wrong, has not yet been written, or otherwise needs attention.<br />
<br />
In comments, "AT" marks a comment that documents an alternate implementation technique or strategy.<br />
<br />
== HTMLTokenizer ==<br />
<br />
The tokenizer is currently implemented as a single HTMLTokenizer class in tokenizer.py. You initialize an HTMLTokenizer with a stream argument that holds an HTMLInputStream, then iterate over the resulting object to get tokens back.<br />
<br />
Tokens are currently objects; they will become dicts.<br />
<br />
=== Interface ===<br />
<br />
The parser needs to change the tokenizer's self.contentModelFlag attribute, which affects how certain states are handled.<br />
<br />
=== Issues ===<br />
* Use of if statements in the states may be suboptimal (but we should time this)<br />
<br />
== HTMLParser ==<br />
<br />
=== Profiling on web-apps.htm ===<br />
<br />
I did some profiling on web-apps.htm, which is a rather large document. Based on that, I have already changed a number of things, which sped us up a bit. Below are some things to consider for future revisions:<br />
<br />
* utils.MethodDispatcher is invoked far too often. By pre-declaring some of it in InBody I managed to reduce the number of invocations by over 24,000, but InBody.__init__ is invoked about 7,000 times for web-apps.htm, so the reduction could be larger still. I am not sure where else to put the declarations, though. The first thing I tried was HTMLParser, but then the references get all messed up...<br />
: We should be able to store a single instance of each InsertionMode rather than creating a new one every time the mode switches. Hopefully we have been disciplined enough not to keep any state in those classes, so the change should be painless.<br />
:: That's an interesting idea. How would that work? [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
::: I got an idea on how it might work and it worked! Still about 3863 invocations to utils.MethodDispatcher but it takes 0.000 CPU seconds. I suppose we can decrease that amount even more, but I wonder if it's worth it. [[User:Annevk|Annevk]] 11:37, 26 December 2006 (UTC)<br />
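<br />
The following is a sketch of the "single instance per insertion mode" idea discussed above. MethodDispatcher here is a simplified stand-in written for illustration and is not a copy of utils.MethodDispatcher. It is a dict with a default handler that also accepts groups of names as keys. InBodyMode, Parser and the handler names are hypothetical. The approach only works if, as noted above, the mode classes keep no per-document state.<br />

```python
class MethodDispatcher(dict):
    """Map names (or groups of names) to handlers; unknown keys get a default.

    Simplified illustration of the dispatcher idea, not html5lib's own class.
    """

    def __init__(self, items=()):
        entries = []
        for names, handler in items:
            # A tuple/list/set key registers the same handler for every name.
            if isinstance(names, (list, tuple, set, frozenset)):
                entries.extend((name, handler) for name in names)
            else:
                entries.append((names, handler))
        super().__init__(entries)
        self.default = None

    def __getitem__(self, key):
        # dict.get does not go through __getitem__, so there is no recursion.
        return self.get(key, self.default)


class InBodyMode:
    """Hypothetical insertion mode; it must keep no per-document state."""

    instances_created = 0  # only here to show how often the mode is built

    def __init__(self, parser):
        InBodyMode.instances_created += 1
        self.parser = parser
        # The dispatcher is built once, when the mode object is created.
        self.start_tag_handler = MethodDispatcher([
            (("b", "i", "em", "strong"), self.start_formatting),
            ("p", self.start_p),
        ])
        self.start_tag_handler.default = self.start_other

    def start_formatting(self, name):
        return "formatting:" + name

    def start_p(self, name):
        return "p"

    def start_other(self, name):
        return "other:" + name


class Parser:
    """Hypothetical parser that creates each insertion mode exactly once."""

    def __init__(self):
        self.modes = {"inBody": InBodyMode(self)}
        self.mode = self.modes["inBody"]

    def switch_mode(self, name):
        # Reuse the cached instance instead of constructing a new one.
        self.mode = self.modes[name]

    def start_tag(self, name):
        return self.mode.start_tag_handler[name](name)
```

Switching modes then costs one dict lookup. The dispatcher tables are built once per parser rather than on every switch.<br />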
<br />
* 713194 calls to __contains__ in sets.py make us slow. They take about 1.0x CPU seconds.<br />
: I've just switched to the built-in set type. Hopefully this will help a bit. [[User:Jgraham|Jgraham]] 00:30, 25 December 2006 (UTC)<br />
:: It did. (Not surprisingly, when 700,000 method calls are gone...) [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
<br />
* 440382 calls to char in tokenizer.py are the runner-up, with 0.8x CPU seconds.<br />
: This is now the largest time consumer. [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
<br />
* dataState in tokenizer.py is next, with 0.7 CPU seconds.<br />
: This is now at 0.429 CPU seconds, probably because the tokenizer switched to dicts instead of custom Token objects. [[User:Annevk|Annevk]]<br />
<br />
* __iter__ in tokenizer.py with 0.59x CPU seconds...<br />
<br />
* Creation of all node objects in web-apps takes .57x CPU seconds.<br />
<br />
* etc.<br />
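<br />
To repeat measurements like the ones above, one option is a small helper around the standard library's cProfile and pstats modules. The helper name profile is hypothetical. It sorts by "tottime", which ranks functions by the time spent in their own bodies, so hot spots such as char or dataState would appear near the top.<br />

```python
import cProfile
import io
import pstats


def profile(func, *args, limit=10):
    """Run func(*args) under cProfile; return (result, report_text).

    The report lists the `limit` most expensive functions, sorted by
    "tottime" (time spent in the function itself, excluding callees).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.strip_dirs().sort_stats("tottime").print_stats(limit)
    return result, out.getvalue()
```

With a current html5lib release installed, something like profile(html5lib.parse, open("web-apps.htm").read()) should produce a comparable report. The entry point in the code described on this page may differ.<br />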
<br />
== Testcases ==<br />
Testcases are under the /tests directory. They require [http://cheeseshop.python.org/pypi/simplejson simplejson]. New code should not be checked in if it regresses previously functional unit tests. Similarly, new tests that don't pass should not be checked in without both informing others on the [http://groups.google.com/group/html5lib-discuss mailing list] and a concrete plan. Ideally new features should be accompanied by new unit tests for those features. Documentation of the test format is available at [[Parser_tests]].<br />
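<br />
The following is a sketch of a data-driven test runner of the kind described above. The JSON layout it assumes, {"tests": [{"description", "input", "output"}]}, is a simplification for illustration; see [[Parser_tests]] for the format the project actually uses. The standard library json module provides the same loads function as simplejson, so either can be used. The name run_json_tests is hypothetical.<br />

```python
import json


def run_json_tests(json_text, func):
    """Run every case in a JSON test file through func; return the failures.

    Assumes a hypothetical layout {"tests": [{"description", "input", "output"}]}.
    Each failure is (description, expected, actual).
    """
    failures = []
    for case in json.loads(json_text)["tests"]:
        actual = func(case["input"])
        if actual != case["output"]:
            failures.append((case["description"], case["output"], actual))
    return failures
```

Comparing the failure list before and after a change is one simple way to check that a commit does not regress previously passing tests.<br />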
<br />
<br />
<br />
[[Category:Implementations]]</div>Rubyshttps://wiki.whatwg.org/index.php?title=Talk:HTML_vs._XHTML&diff=2006Talk:HTML vs. XHTML2006-12-04T20:02:27Z<p>Rubys: /* Mime Types */</p>
<hr />
<div>An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3. And that the proper way to tell the two apart is via MIME types.<br />
<br />
There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.<br />
<br />
=== Syntax ===<br />
<br />
* Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.<br />
* N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.<br />
** You need to explain how RDF/N3 is relevant! --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
*** The top of this page starts with "An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3". Need I provide references? -[[User:Rubys|Rubys]] 14:08, 4 December 2006 (UTC)<br />
<br />
=== Mime Types ===<br />
<br />
* People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via <code>file:///</code> URIs directly from your hard disk.<br />
** <code>file:///</code> URIs use an OS- or browser-specific mechanism to determine the MIME type. On Windows, for instance (for IE), the file extension is mapped to a MIME type via a key in the registry. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
*** and as such, can rarely be depended upon. In addition to file extensions, content sniffing is also a common strategy. -[[User:Rubys|Rubys]] 14:12, 4 December 2006 (UTC)<br />
* HTTP as specified indicates that the <code>Content-Type</code> header is authoritative: it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have expectations about what happens when they click on these links, and are unhappy when the browser lets them down.<br />
** For compatibility, those issues with several file formats do, unfortunately, have to be retained. However, breaking Content-Type in that way for <code>text/html</code> to somehow allow the content to be treated as XML instead is not an option. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
<br />
It isn't clear to me (Hixie), however, how the fact that authors can't set the MIME type properly is supposed to be something we can ever solve from the point of view of the syntax of HTML. The full XML syntax isn't compatible with HTML parsers, and the full HTML syntax isn't compatible with XML parsers. The common subset is a tiny language that doesn't support widely used features like <style> or scripting. We can't parse text/html files as anything but HTML. The parser used for content sent with XML MIME types is out of scope for the WHATWG specs (it would be up to the XML guys). It isn't that we WANT the MIME type to be the only way to distinguish the two. It's that the MIME type IS the only way. It's a statement of fact, not desire. [[User:Hixie|Hixie]] 18:24, 4 December 2006 (UTC)<br />
* [http://planet.intertwingly.net/ my planet] is served as <code>application/xhtml+xml</code> to Firefox and <code>text/html</code> to IE. It seems to be capable of doing both scripting and style in both modes. -[[User:Rubys|Rubys]]<br />
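<br />
The following illustrates the extension-based mapping mentioned above. Python's standard mimetypes module guesses a type purely from the file name; on Windows it can also load registry settings. The result therefore depends only on the extension and never on the file's contents. The helper name type_from_extension is hypothetical.<br />

```python
import mimetypes


def type_from_extension(path):
    """Return the MIME type guessed from a file name, or None.

    Only the extension is consulted; the file's contents are never read.
    """
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type
```

The same XHTML markup saved as page.html would therefore be reported as text/html.<br />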
<br />
=== Ideals ===<br />
<br />
In an ideal world:<br />
* the syntax of XML and HTML would be either completely identical or completely different.<br />
** The syntaxes of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant (see above). --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
*** Completely? I'd say that they are as different as en-us and en-au. :-) -[[User:Rubys|Rubys]]<br />
* the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.<br />
** This is not possible without breaking backwards compatibility. These incompatibilities have existed between HTML and XHTML for a long time, and that hasn't stopped people serialising their XHTML as HTML up until now (for all practical purposes, serving XHTML as text/html is equivalent to reserialising). --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
* <code>Content-Type</code> would either always be respected, or always be ignored.<br />
** <code>Content-Type</code> is always respected for HTML and XHTML MIME types. It's not for some others, but that's a different issue. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
*** Always? Try serving your feed as text/html to Firefox 2.0. -[[User:Rubys|Rubys]]<br />
*** Try serving your feed as text/html to *any browser* with feed support. [[User:Sayrer|Sayrer]]<br />
* there would either be a fool-proof way to "sniff" whether given content was HTML or XHTML, or there would be no difference between XHTML and HTML in terms of syntax, and the range of DOM trees that could validly be serialized would also be identical.<br />
** There is a foolproof way... the MIME type. :-) -Hixie<br />
<br />
=== Analysis ===<br />
<br />
Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing. And the constraint to be as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither being absolutely identical with the XML syntax nor being completely different is an option.<br />
<br />
At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized into XHTML is larger than the set that can be serialized into HTML5.<br />
<br />
Having substantially similar syntaxes leads to confusion in some edge cases (e.g., <code><p/></code>) but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the greater set of DOM trees that can be represented in that syntax.<br />
<br />
=== Potential Strategies ===<br />
<br />
'''Note''': these strategies are not necessarily mutually exclusive.<br />
<br />
* Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver. (We're doing this already. -Hixie)<br />
* The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than on XHTML migration. But as a migration aid, identify some unlikely/invalid combination (example: use of the HTML5 DOCTYPE combined with an <code>xmlns</code> attribute on the <code>html</code> element combined with the use of a non-XML MIME type) and adjust some (as yet undefined) set of the HTML5 parsing rules.<br />
* Document these differences, either in the spec itself (as a non-normative appendix?) and/or by having a conformance checker flag these differences. Variations:<br />
** Ensure that each of these differences triggers a [http://www.whatwg.org/specs/web-apps/current-work/#parse parse error] or equivalent in HTML5; this does not (necessarily) involve changing the recovery action or the way the document is ultimately parsed.<br />
** Instead of bothering people who may not care about these differences, identify some unlikely combination (such as the DOCTYPE/xmlns/MIME combination above) and have it trigger a '''pedantic mode''' which enables these additional checks.</div>Rubyshttps://wiki.whatwg.org/index.php?title=Talk:HTML_vs._XHTML&diff=2003Talk:HTML vs. XHTML2006-12-04T14:15:28Z<p>Rubys: /* Ideals */</p>
<hr />
<div>An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3. And that the proper way to tell the two apart is via MIME types.<br />
<br />
There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.<br />
<br />
=== Syntax ===<br />
<br />
* Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.<br />
* N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.<br />
** You need to explain how RDF/N3 is relevant! --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
*** The top of this page starts with "An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3". Need I provide references? -[[User:Rubys|Rubys]] 14:08, 4 December 2006 (UTC)<br />
<br />
=== Mime Types ===<br />
<br />
* People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via <code>file:///</code> URIs directly from your hard disk.<br />
** <code>file:///</code> URIs use an OS- or browser-specific mechanism to determine the MIME type. On Windows, for instance (for IE), the file extension is mapped to a MIME type via a key in the registry. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
*** and as such, can rarely be depended upon. In addition to file extensions, content sniffing is also a common strategy. -[[User:Rubys|Rubys]] 14:12, 4 December 2006 (UTC)<br />
* HTTP as specified indicates that the <code>Content-Type</code> header is authoritative: it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have expectations about what happens when they click on these links, and are unhappy when the browser lets them down.<br />
** For compatibility, those issues with several file formats do, unfortunately, have to be retained. However, breaking Content-Type in that way for <code>text/html</code> to somehow allow the content to be treated as XML instead is not an option. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
<br />
=== Ideals ===<br />
<br />
In an ideal world:<br />
* the syntax of XML and HTML would be either completely identical or completely different.<br />
** The syntaxes of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant (see above). --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
*** Completely? I'd say that they are as different as en-us and en-au. :-) -[[User:Rubys|Rubys]]<br />
* the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.<br />
** This is not possible without breaking backwards compatibility. These incompatibilities have existed between HTML and XHTML for a long time, and that hasn't stopped people serialising their XHTML as HTML up until now (for all practical purposes, serving XHTML as text/html is equivalent to reserialising). --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
* <code>Content-Type</code> would either always be respected, or always be ignored.<br />
** <code>Content-Type</code> is always respected for HTML and XHTML MIME types. It's not for some others, but that's a different issue. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
*** Always? Try serving your feed as text/html to Firefox 2.0. -[[User:Rubys|Rubys]]<br />
* there would either be a fool-proof way to "sniff" whether given content was HTML or XHTML, or there would be no difference between XHTML and HTML in terms of syntax, and the range of DOM trees that could validly be serialized would also be identical.<br />
** There is a foolproof way... the MIME type. :-) -Hixie<br />
<br />
=== Analysis ===<br />
<br />
Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing. And the constraint to be as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither being absolutely identical with the XML syntax nor being completely different is an option.<br />
<br />
At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized into XHTML is larger than the set that can be serialized into HTML5.<br />
<br />
Having substantially similar syntaxes leads to confusion in some edge cases (e.g., <code><p/></code>) but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the greater set of DOM trees that can be represented in that syntax.<br />
<br />
=== Potential Strategies ===<br />
<br />
'''Note''': these strategies are not necessarily mutually exclusive.<br />
<br />
* Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver. (We're doing this already. -Hixie)<br />
* The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than on XHTML migration. But as a migration aid, identify some unlikely/invalid combination (example: use of the HTML5 DOCTYPE combined with an <code>xmlns</code> attribute on the <code>html</code> element combined with the use of a non-XML MIME type) and adjust some (as yet undefined) set of the HTML5 parsing rules.<br />
* Document these differences, either in the spec itself (as a non-normative appendix?) and/or by having a conformance checker flag these differences. Variations:<br />
** Ensure that each of these differences triggers a [http://www.whatwg.org/specs/web-apps/current-work/#parse parse error] or equivalent in HTML5; this does not (necessarily) involve changing the recovery action or the way the document is ultimately parsed.<br />
** Instead of bothering people who may not care about these differences, identify some unlikely combination (such as the DOCTYPE/xmlns/MIME combination above) and have it trigger a '''pedantic mode''' which enables these additional checks.</div>Rubyshttps://wiki.whatwg.org/index.php?title=Talk:HTML_vs._XHTML&diff=2002Talk:HTML vs. XHTML2006-12-04T14:12:23Z<p>Rubys: /* Mime Types */</p>
<hr />
<div>An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3. And that the proper way to tell the two apart is via MIME types.<br />
<br />
There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.<br />
<br />
=== Syntax ===<br />
<br />
* Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.<br />
* N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.<br />
** You need to explain how RDF/N3 is relevant! --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
*** The top of this page starts with "An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3". Need I provide references? -[[User:Rubys|Rubys]] 14:08, 4 December 2006 (UTC)<br />
<br />
=== Mime Types ===<br />
<br />
* People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via <code>file:///</code> URIs directly from your hard disk.<br />
** <code>file:///</code> URIs use an OS- or browser-specific mechanism to determine the MIME type. On Windows, for instance (for IE), the file extension is mapped to a MIME type via a key in the registry. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
*** and as such, can rarely be depended upon. In addition to file extensions, content sniffing is also a common strategy. -[[User:Rubys|Rubys]] 14:12, 4 December 2006 (UTC)<br />
* HTTP as specified indicates that the <code>Content-Type</code> header is authoritative: it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have expectations about what happens when they click on these links, and are unhappy when the browser lets them down.<br />
** For compatibility, those issues with several file formats do, unfortunately, have to be retained. However, breaking Content-Type in that way for <code>text/html</code> to somehow allow the content to be treated as XML instead is not an option. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
<br />
=== Ideals ===<br />
<br />
In an ideal world:<br />
* the syntax of XML and HTML would be either completely identical or completely different.<br />
** The syntaxes of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant (see above). --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
* the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.<br />
** This is not possible without breaking backwards compatibility. These incompatibilities have existed between HTML and XHTML for a long time, and that hasn't stopped people serialising their XHTML as HTML up until now (for all practical purposes, serving XHTML as text/html is equivalent to reserialising). --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
* <code>Content-Type</code> would either always be respected, or always be ignored.<br />
** <code>Content-Type</code> is always respected for HTML and XHTML MIME types. It's not for some others, but that's a different issue. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
* there would either be a fool-proof way to "sniff" whether given content was HTML or XHTML, or there would be no difference between XHTML and HTML in terms of syntax, and the range of DOM trees that could validly be serialized would also be identical.<br />
** There is a foolproof way... the MIME type. :-) -Hixie<br />
<br />
=== Analysis ===<br />
<br />
Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing. And the constraint to be as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither being absolutely identical with the XML syntax nor being completely different is an option.<br />
<br />
At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized into XHTML is larger than the set that can be serialized into HTML5.<br />
<br />
Having substantially similar syntaxes leads to confusion in some edge cases (e.g., <code><p/></code>) but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the greater set of DOM trees that can be represented in that syntax.<br />
<br />
=== Potential Strategies ===<br />
<br />
'''Note''': these strategies are not necessarily mutually exclusive.<br />
<br />
* Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver. (We're doing this already. -Hixie)<br />
* The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than on XHTML migration. But as a migration aid, identify some unlikely/invalid combination (example: use of the HTML5 DOCTYPE combined with an <code>xmlns</code> attribute on the <code>html</code> element combined with the use of a non-XML MIME type) and adjust some (as yet undefined) set of the HTML5 parsing rules.<br />
* Document these differences, either in the spec itself (as a non-normative appendix?) and/or by having a conformance checker flag these differences. Variations:<br />
** Ensure that each of these differences triggers a [http://www.whatwg.org/specs/web-apps/current-work/#parse parse error] or equivalent in HTML5; this does not (necessarily) involve changing the recovery action or the way the document is ultimately parsed.<br />
** Instead of bothering people who may not care about these differences, identify some unlikely combination (such as the DOCTYPE/xmlns/MIME combination above) and have it trigger a '''pedantic mode''' which enables these additional checks.</div>Rubyshttps://wiki.whatwg.org/index.php?title=Talk:HTML_vs._XHTML&diff=2001Talk:HTML vs. XHTML2006-12-04T14:08:45Z<p>Rubys: /* Syntax */</p>
<hr />
<div>An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3. And that the proper way to tell the two apart is via MIME types.<br />
<br />
There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.<br />
<br />
=== Syntax ===<br />
<br />
* Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.<br />
* N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.<br />
** You need to explain how RDF/N3 is relevant! --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
*** The top of this page starts with "An often repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3". Need I provide references? -[[User:Rubys|Rubys]] 14:08, 4 December 2006 (UTC)<br />
<br />
=== Mime Types ===<br />
<br />
* People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via <code>file:///</code> URIs directly from your hard disk.<br />
** <code>file:///</code> URIs use an OS- or browser-specific mechanism to determine the MIME type. On Windows, for instance (for IE), the file extension is mapped to a MIME type via a key in the registry. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
* HTTP as specified indicates that the <code>Content-Type</code> header is authoritative: it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have expectations about what happens when they click on these links, and are unhappy when the browser lets them down.<br />
** For compatibility, those issues with several file formats do, unfortunately, have to be retained. However, breaking Content-Type in that way for <code>text/html</code> to somehow allow the content to be treated as XML instead is not an option. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
<br />
=== Ideals ===<br />
<br />
In an ideal world:<br />
* the syntax of XML and HTML would be either completely identical or completely different.<br />
** The syntaxes of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant (see above). --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
* the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.<br />
** This is not possible without breaking backwards compatibility. These incompatibilities have existed between HTML and XHTML for a long time, and that hasn't stopped people serialising their XHTML as HTML up until now (for all practical purposes, serving XHTML as text/html is equivalent to reserialising). --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
* <code>Content-Type</code> would either always be respected, or always be ignored.<br />
** <code>Content-Type</code> is always respected for HTML and XHTML MIME types. It's not for some others, but that's a different issue. --[[User:Lachlan Hunt|Lachlan Hunt]] 11:16, 4 December 2006 (UTC)<br />
* there would either be a fool-proof way to "sniff" whether given content was HTML or XHTML, or there would be no difference between XHTML and HTML in terms of syntax, and the range of DOM trees that could validly be serialized would also be identical.<br />
** There is a foolproof way... the MIME type. :-) -Hixie<br />
<br />
=== Analysis ===<br />
<br />
Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing. And the constraint to be as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither being absolutely identical with the XML syntax nor being completely different is an option.<br />
<br />
At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized into XHTML is larger than the set that can be serialized into HTML5.<br />
<br />
Having substantially similar syntaxes leads to confusion in some edge cases (e.g., <code><p/></code>) but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the greater set of DOM trees that can be represented in that syntax.<br />
<br />
=== Potential Strategies ===<br />
<br />
'''Note''': these strategies are not necessarily mutually exclusive.<br />
<br />
* Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver. (We're doing this already. -Hixie)<br />
* The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than on XHTML migration. But as a migration aid, identify some unlikely/invalid combination (example: use of the HTML5 DOCTYPE combined with an <code>xmlns</code> attribute on the <code>html</code> element combined with the use of a non-XML MIME type) and adjust some (as yet undefined) set of the HTML5 parsing rules.<br />
* Document these differences, either in the spec itself (as a non-normative appendix?) and/or by having a conformance checker flag these differences. Variations:<br />
** Ensure that each of these differences triggers a [http://www.whatwg.org/specs/web-apps/current-work/#parse parse error] or equivalent in HTML5; this does not (necessarily) involve changing the recovery action or the way the document is ultimately parsed.<br />
** Instead of bothering people who may not care about these differences, identify some unlikely combination (such as the DOCTYPE/xmlns/MIME combination above) and have it trigger a '''pedantic mode''' which enables these additional checks.</div>Rubyshttps://wiki.whatwg.org/index.php?title=HTML_vs._XHTML&diff=1997HTML vs. XHTML2006-12-04T10:33:45Z<p>Rubys: /* Potential Strategies */</p>
<hr />
<div>== Differences Between HTML and XHTML ==<br />
<br />
Although HTML and XHTML appear to have similarities in their syntax, they are significantly different in many ways.<br />
<br />
'''Note''': As the current WHATWG document is a draft, this section will need to track to a moving target.<br />
Differences marked @@@ are differences that could theoretically be changed without affecting backwards compatibility.<br />
<br />
=== MIME Types ===<br />
<br />
* XHTML must be served with an XML MIME type, such as <code>application/xml</code> or <code>application/xhtml+xml</code>.<br />
* HTML must be served as <code>text/html</code>.<br />
<br />
It is the MIME type that determines what type of document you are using. If you attempt to send XHTML as <code>text/html</code>, you are actually just using HTML, possibly with syntax errors.<br />
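<br />
The following is a minimal sketch of this rule, in which only the MIME type selects the parser. The function name parser_for is hypothetical. It treats <code>text/xml</code>, <code>application/xml</code> and any type ending in <code>+xml</code> as XML types, and it ignores parameters such as <code>charset</code>.<br />

```python
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parser_for(content_type):
    """Return "xml" or "html" for a Content-Type header value.

    Only the type/subtype before any ";" is considered, case-insensitively.
    """
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence in XML_TYPES or essence.endswith("+xml"):
        return "xml"
    if essence == "text/html":
        return "html"
    raise ValueError("not an HTML or XML type: " + essence)
```

The markup itself is never inspected, so an XHTML-looking document served as <code>text/html</code> takes the HTML path.<br />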
<br />
=== Parsing ===<br />
<br />
XHTML uses XML parsing requirements. HTML uses its own which are defined much more closely to the way browsers actually handle HTML today.<br />
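<br />
Python's standard library shows the difference in error handling for an unencoded ampersand, one of the well-formedness errors listed below. <code>xml.etree.ElementTree</code> stops with a fatal error, while <code>html.parser</code> recovers and reports the text. Note that <code>html.parser</code> is not an implementation of the HTML5 parsing algorithm; it is used here only to show non-fatal recovery. The helper names are hypothetical.<br />

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SNIPPET = "<p>Fish & chips</p>"  # bare "&": not well-formed XML


def parse_as_xml(markup):
    """Return the root element's text, or a "fatal: ..." message on error."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError as err:
        return "fatal: " + str(err)


class TextCollector(HTMLParser):
    """Collects character data; malformed input does not stop parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)
```

Encoding the ampersand as <code>&amp;amp;</code> makes the same snippet acceptable to both parsers.<br />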
<br />
* In XHTML, well-formedness errors are fatal. In HTML, error handling rules are much more graceful. Well-formedness errors, which are also syntax errors in HTML, include the following:<br />
** Unescaped ampersands (<code>&amp;</code>) and less-than signs (<code>&lt;</code>) (this does not apply to <code>CDATA</code>).<br />
** Comments containing extra pairs of hyphens or ending with a hyphen, e.g.<br />
*** <code>&lt;!--<var> syntax -- error </var>--&gt;</code> or<br />
*** <code>&lt;!--<var> syntax error -</var>--&gt;</code>.<br />
** Mismatched end tags (does not apply to elements with optional tags) <br />
** Unclosed tags.<br />
** Unexpected characters occurring in or before attribute names.<br />
** Unexpected occurrence of EOF.<br />
** Unexpected characters before the DOCTYPE name.<br />
** Missing DOCTYPE name.<br />
** A <code>PUBLIC</code> identifier in a <code>DOCTYPE</code> without a <code>SYSTEM</code> identifier (Note: including either of these is a syntax error in HTML5, but in XML only the <code>SYSTEM</code> identifier may occur on its own).<br />
** End tags with attributes. <br />
** Unexpected end tags (in HTML, an unexpected <code>&lt;/br></code> or <code>&lt;/p></code> can cause the start tag to be implied before it).<br />
* The internal subset is permitted in XML, but meaningless (and forbidden) in HTML.<br />
** In some cases, an internal subset in HTML would end up being partly rendered inline.<br />
* The sequence of characters &quot;<code>]]&gt;</code>&quot; when it does not mark the end of a <code>CDATA</code> section is a well-formedness error in XHTML, but valid in HTML.<br />
* In XHTML: <code>&lt;![CDATA[...]]&gt;</code> is a <code>CDATA</code> section. In HTML, it's a bogus comment.<br />
* In XHTML, <code>&lt;?foo ...?&gt;</code> is a processing instruction. In HTML, it's a bogus comment.<br />
* In HTML, the trailing slash used for the empty element syntax is a parse error for non-void elements (see below), but is ignored in all cases.<br />
* In HTML, the <code>script</code> and <code>style</code> elements are parsed as <code>CDATA</code>. (Note: the definition of <code>CDATA</code> differs from that in XML). In XML, they're parsed as normal elements (which means that comments are treated as <em>real</em> comments, and things that look like start tags actually are start tags).<br />
* In HTML, the <code>title</code> and <code>textarea</code> elements are parsed as <code>RCDATA</code>. (Note: The definition of <code>RCDATA</code> differs from that in SGML and there is no <code>RCDATA</code> in XML).<br />
* In HTML, if scripting is enabled, the <code>noscript</code> element is parsed as <code>CDATA</code>. If scripting is disabled, it's parsed as <code>PCDATA</code>. In XHTML, the element has no effect, and can't really be used to stop content from being present when script is disabled.<br />
* In HTML, the <code>iframe</code>, <code>noembed</code> and <code>noframes</code> elements are parsed as <code>CDATA</code>. In XHTML, they are parsed as normal elements, and therefore do not prevent their content from being parsed and used as markup.<br />
* White space characters in attribute values are [http://www.w3.org/TR/REC-xml/#AVNormalize normalized] to spaces in XHTML.<br />
* In HTML, elements whose tags are optional are implied under certain conditions.<br />
* In HTML, <code>base</code>, <code>link</code>, <code>meta</code>, <code>style</code> and <code>title</code> elements with tags occurring in the body are moved into the head. In XHTML, they stay where they were specified.<br />
* In HTML, tags for certain elements are ignored when they appear out of context. This includes <code>caption</code>, <code>col</code>, <code>colgroup</code>, <code>frame</code>, <code>frameset</code>, <code>head</code>, <code>option</code>, <code>optgroup</code>, <code>tbody</code>, <code>td</code>, <code>tfoot</code>, <code>th</code>, <code>thead</code>, <code>tr</code>.<br />
* The <code>plaintext</code> element has a special parsing requirement in HTML (the element itself is, however, forbidden).<br />
* <em>Many other edge cases and error conditions, not all of which are listed here, receive special handling in HTML.</em><br />
<br />
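Several of the differences above can be observed with Python's standard library, used here purely as a stand-in: <code>xml.etree.ElementTree</code> is a (non-validating) XML parser, while <code>html.parser</code> is a lenient tokenizer rather than an HTML5 tree builder, so it only shows that HTML-style parsing recovers, not how an HTML5 parser recovers. The sample strings and helper names are illustrative.<br />
<br />
```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Each sample contains one of the well-formedness errors listed above.
SAMPLES = {
    "unescaped ampersand": "<p>Fish & Chips</p>",
    "unescaped less-than in script": "<script>if (a < b) {}</script>",
    "extra hyphens in comment": "<p><!-- syntax -- error --></p>",
    "comment ending with a hyphen": "<p><!-- syntax error ---></p>",
    "mismatched end tag": "<p><b>bold</p>",
    "unclosed tag": "<p>never closed",
    "end tag with attributes": '<p>text</p class="x">',
}


def xml_accepts(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False  # well-formedness errors are fatal in XML
    return True


class AttributeCollector(HTMLParser):
    """Lenient tokenizer that records every attribute it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        self.attributes.update(attrs)


def html_accepts(markup: str) -> bool:
    """Return True if html.parser tokenizes the markup without raising."""
    try:
        parser = AttributeCollector()
        parser.feed(markup)
        parser.close()
    except Exception:
        return False
    return True


def title_attribute(markup: str) -> tuple:
    """Return the title attribute as seen by (XML parser, html.parser)."""
    html = AttributeCollector()
    html.feed(markup)
    html.close()
    return ET.fromstring(markup).get("title"), html.attributes.get("title")


if __name__ == "__main__":
    for name, markup in SAMPLES.items():
        print(f"{name}: XML={xml_accepts(markup)} HTML={html_accepts(markup)}")
    # The XML parser normalises the tab and newline in the value to spaces.
    print(title_attribute('<p title="a\nb\tc"></p>'))
```
<br />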
=== Syntax ===<br />
<br />
* In HTML, [http://blog.whatwg.org/faq/#doctype the <code>doctype</code> is required]. In XHTML, it is optional.<br />
* In XHTML, tag names and attribute names are case sensitive. In HTML, they are case insensitive.<br />
* In XHTML, non-empty elements require both a start and an end tag. In HTML, certain elements allow the omission of either or both:<br />
** <code>html</code> (both)<br />
** <code>head</code> (both)<br />
** <code>body</code> (both)<br />
** <code>li</code> (end tag)<br />
** <code>dt</code> (end tag)<br />
** <code>dd</code> (end tag)<br />
** <code>p</code> (end tag)<br />
** <code>colgroup</code> (both)<br />
** <code>thead</code> (end tag)<br />
** <code>tbody</code> (both)<br />
** <code>tfoot</code> (end tag)<br />
** <code>tr</code> (end tag)<br />
** <code>td</code> (end tag)<br />
** <code>th</code> (end tag)<br />
* In XHTML, empty elements may use either the empty element syntax (<code>&lt;br/&gt;</code>) or have an end tag immediately follow the start tag (<code>&lt;br&gt;&lt;/br&gt;</code>). In HTML, the empty element syntax (trailing slash) is allowed on void elements, but forbidden on other elements. However, it serves no purpose whatsoever and can be omitted. End tags for void elements are forbidden.<br />
** <code>base</code>, <code>link</code>, <code>meta</code>, <code>hr</code>, <code>br</code>, <code>img</code>, <code>embed</code>, <code>param</code>, <code>area</code>, <code>col</code> and <code>input</code><br />
** Note: the following are treated as void elements for the purposes of the parsing requirements, but, as they are obsolete and non-standard, the trailing slash is not permitted on them: <code>basefont</code>, <code>bgsound</code>, <code>spacer</code>, <code>wbr</code> (although, since these elements are not permitted anyway, it doesn't make much difference).<br />
* HTML allows attribute minimisation (i.e. omitting the value), XHTML does not.<br />
* HTML allows the use of unquoted attribute values, XHTML does not.<br />
* XHTML allows the use of <code>CDATA</code> sections, HTML does not.<br />
* XHTML allows the use of processing instructions, HTML does not.<br />
* In HTML, all entity references are predefined and do not require a DTD. But because there is no DTD for XHTML5, entity references cannot be used in XHTML, except for the five predefined entities: <code>&amp;amp;</code>, <code>&amp;lt;</code>, <code>&amp;gt;</code>, <code>&amp;quot;</code> and <code>&amp;apos;</code>.<br />
** You may provide your own DTD for use with your own validating parser, but be aware that browsers do not use validating parsers and will not read the DTD.<br />
* The set of Unicode characters allowed in XML 1.0 is more restricted than in HTML.<br />
* Namespace prefixes are permitted in XHTML. They are forbidden in HTML. <br />
<br />
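One practical consequence of the entity-reference difference is that markup written for HTML often needs its named character references rewritten before an XML parser will accept it. The Python sketch below (the function name is hypothetical) converts every named reference known to the HTML5 entity table in <code>html.entities</code>, apart from the five predefined XML entities, into numeric references. It assumes references end with <code>;</code> and does not try to skip comments or <code>CDATA</code> sections.<br />
<br />
```python
import re
import xml.etree.ElementTree as ET
from html.entities import html5

# The five entities every XML parser understands without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named reference such as &nbsp; (legacy forms without ';' are not handled).
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_references(markup: str) -> str:
    """Rewrite HTML named character references as numeric references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)  # already safe in XML
        chars = html5.get(name + ";")
        if chars is None:
            return match.group(0)  # unknown: leave it for the parser to report
        # Some references expand to more than one code point.
        return "".join(f"&#{ord(ch)};" for ch in chars)

    return NAMED_REF.sub(replace, markup)


if __name__ == "__main__":
    fixed = to_numeric_references("<p>Caf&eacute;&nbsp;&amp; bar</p>")
    print(fixed)
    print(ET.fromstring(fixed).text)
```
<br />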
=== Markup ===<br />
<br />
* The [http://blog.whatwg.org/faq/#namespace-decl namespace declaration] (<code>xmlns</code> attribute) is required in XHTML. The xmlns attribute is also allowed to appear on the <code>html</code> element in HTML on the condition that it has the value <code><nowiki>"http://www.w3.org/1999/xhtml"</nowiki></code>.<br />
** <code><nowiki>&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;</nowiki></code><br />
** In HTML, the xmlns attribute has absolutely no effect. It is basically a talisman, allowed merely to make migration to and from XHTML mildly easier. When parsed by an HTML parser, the attribute ends up in the null namespace.<br />
** In XML, an xmlns attribute is part of the namespace declaration mechanism, and an element cannot actually have an xmlns attribute in the null namespace. In DOM implementations, the attribute ends up in the "<code><nowiki>http://www.w3.org/2000/xmlns/</nowiki></code>" namespace.<br />
* XHTML allows non-XHTML elements and attributes (in different namespaces) to be used; HTML does not.<br />
* XHTML uses the <code>xml:lang</code> attribute; HTML uses <code>lang</code> instead.<br />
* XML ID introduces <code>xml:id</code>, which could be used in XHTML. In HTML it has no effect.<br />
* In HTML, the <code>noscript</code> element may be used. In XHTML, it is forbidden.<br />
* HTML uses the <code>base</code> element, XHTML uses <code>xml:base</code> instead. <br />
* In XHTML, <code>p</code> elements may contain structured inline level elements including <code>blockquote</code>, <code>dl</code>, <code>menu</code>, <code>ol</code>, <code>ul</code>, <code>pre</code> and <code>table</code>. In the HTML serialisation, due to backwards compatibility constraints, this is not possible (though it may be done through DOM manipulation).<br />
* In XHTML, <code>table</code> elements may contain child <code>tr</code> elements. In the HTML serialisation, due to backwards compatibility constraints, this is not possible (though it may be done through DOM manipulation).<br />
<br />
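The namespace points above can be checked roughly with Python's standard library (an assumption worth stating: neither parser is a browser). A namespace-aware XML parser consumes the <code>xmlns</code> declaration and maps <code>xml:lang</code> into the XML namespace, while <code>html.parser</code> reports both as ordinary attributes, loosely mirroring the null-namespace attribute an HTML parser produces. The helper names are hypothetical.<br />
<br />
```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

DOC = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><body/></html>'


def xml_view(markup: str):
    """Root element name and attributes as seen by a namespace-aware XML parser."""
    root = ET.fromstring(markup)
    # ElementTree writes namespaced names as '{namespace-uri}local-name'.
    return root.tag, dict(root.attrib)


class _RootCollector(HTMLParser):
    """Records the name and attributes of the first start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.root = None

    def handle_starttag(self, tag, attrs):
        if self.root is None:
            self.root = (tag, dict(attrs))


def html_view(markup: str):
    """Root element name and attributes as seen by html.parser."""
    parser = _RootCollector()
    parser.feed(markup)
    parser.close()
    return parser.root


if __name__ == "__main__":
    print(xml_view(DOC))
    print(html_view(DOC))
```
<br />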
=== Character Encoding ===<br />
<br />
* In XHTML, the XML declaration may be used to [http://blog.whatwg.org/faq/#charset specify the character encoding]. In HTML, the XML declaration is forbidden.<br />
* In HTML, the <code>meta</code> element may be used instead. The <code>http-equiv</code> attribute on the <code>meta</code> element is forbidden in XHTML and is ignored if included.<br />
* The default character encoding for XHTML is, according to XML rules, <code>UTF-8</code> or <code>UTF-16</code>. If the encoding is unspecified in HTML, it should be determined through implementation-specific heuristics or fall back to a default value (Note: this section of the spec is not yet finished).<br />
<br />
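For the XHTML side, the encoding decision can be sketched as: byte order mark first, then the <code>encoding</code> pseudo-attribute of the XML declaration, then the UTF-8 default. This is a simplified, hypothetical Python helper; it ignores UTF-32 and EBCDIC autodetection and does not attempt HTML's <code>meta</code> prescan.<br />
<br />
```python
import codecs
import re

# encoding="..." inside an XML declaration at the very start of the bytes.
# Only ASCII-compatible encodings can be detected this way.
XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML/XHTML document from its first bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    match = XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"  # no BOM and no declaration: the XML default applies


if __name__ == "__main__":
    print(xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><p/>'))
    print(xml_encoding(b"<p/>"))
```
<br />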
=== Scripts ===<br />
<br />
* <code>document.write()</code> and <code>document.writeln()</code> cannot be used in XHTML, but they can in HTML.<br />
* In XHTML, the use of the <code>innerHTML</code> property requires that the string be a well-formed fragment of XML. <br />
* DOM APIs are case sensitive in XHTML and some are case insensitive in HTML. (This does not apply to elements which are not in the HTML namespace)<br />
** Element.tagName and Node.nodeName return the name in uppercase (Node.localName remains lowercase).<br />
** Document.createElement() is case insensitive (the canonical form is lowercase).<br />
** Element.setAttributeNode() will change the attribute name to lowercase.<br />
** Element.setAttribute() is case insensitive (the canonical form is lowercase).<br />
** Document.getElementsByTagName() and Element.getElementsByTagName() are case insensitive.<br />
** Document.renameNode(). If the new namespace is the HTML namespace, then the new qualified name must be lowercased before the rename takes place.<br />
<br />
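Python's <code>xml.dom.minidom</code>, an XML DOM, can illustrate the XHTML half of this list; it says nothing about how browsers treat HTML documents. The first hypothetical helper wraps a string in a temporary XHTML element to test whether it is a well-formed fragment, which is what XHTML <code>innerHTML</code> requires; the second shows that <code>getElementsByTagName()</code> is case-sensitive in an XML DOM.<br />
<br />
```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed_fragment(fragment: str) -> bool:
    """True if the string would be acceptable as XHTML innerHTML content.

    The fragment is wrapped in a temporary element so that leading text and
    several sibling elements are allowed, as they are in innerHTML.
    """
    wrapped = f'<wrapper xmlns="{XHTML_NS}">{fragment}</wrapper>'
    try:
        xml.dom.minidom.parseString(wrapped)
    except ExpatError:
        return False
    return True


def count_elements(markup: str, name: str) -> int:
    """Count matches of getElementsByTagName(name) in an XML DOM."""
    document = xml.dom.minidom.parseString(markup)
    return len(document.getElementsByTagName(name))


if __name__ == "__main__":
    print(is_well_formed_fragment("<b>bold</b> and <br/>"))
    print(is_well_formed_fragment("<b>bold</b> and <br>"))
    print(count_elements("<div><p>x</p></div>", "P"))
```
<br />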
=== Stylesheets ===<br />
<br />
* Selectors, as used in CSS, match case sensitively in XHTML, but case insensitively in HTML.<br />
* CSS requires special handling of the <code>body</code> element in HTML for painting backgrounds on the canvas; this does not apply to XHTML.<br />
<br />
== Other Information ==<br />
<br />
'''Note: This section should probably be removed, tidied up or moved to the discussion page.'''<br />
<br />
An often-repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3, and that the proper way to tell the two apart is via MIME types.<br />
<br />
There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.<br />
<br />
=== Syntax ===<br />
<br />
* Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.<br />
* N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.<br />
<br />
Need to explain how RDF/N3 is relevant! --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
<br />
=== Mime Types ===<br />
<br />
* People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via <code>file:///</code> URIs directly from your hard disk.<br />
* HTTP as specified indicates that the <code>Content-Type</code> header is authoritative: it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have an expectation as to what happens when they click on these links, and are unhappy when the browser lets them down.<br />
<br />
=== Ideals ===<br />
<br />
In an ideal world:<br />
* the syntax of XML and HTML would be either completely identical or completely different.<br />
** The syntax of HTML and XHTML are completely different. The fact that they look similar on the surface is irrelevant. (see above). --[[User:Lachlan Hunt|Lachlan Hunt]] 04:43, 4 December 2006 (UTC)<br />
* the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.<br />
* <code>Content-Type</code> would either always be respected or always be ignored.<br />
* there would either be a foolproof way to "sniff" whether a given document was HTML or XHTML, or there would be no difference between XHTML and HTML, either in syntax or in the range of DOM trees that could validly be serialized.<br />
** There is a foolproof way... the MIME type. :-) -Hixie<br />
<br />
=== Analysis ===<br />
<br />
Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing, and the constraint of being as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither making the syntax absolutely identical to XML nor making it completely different is an option.<br />
<br />
At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized as XHTML is larger than the set that can be serialized as HTML5.<br />
<br />
Having substantially similar syntaxes leads to confusion in some edge cases (e.g., <code>&lt;p/&gt;</code>) but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the larger set of DOM trees that can be represented in that syntax.<br />
<br />
=== Potential Strategies ===<br />
<br />
'''Note''': these strategies are not necessarily mutually-exclusive.<br />
<br />
* Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver. (We're doing this already. -Hixie)<br />
* The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than on XHTML migration. But as a migration aid, identify some unlikely or invalid combination (for example, the HTML5 DOCTYPE together with an <code>xmlns</code> attribute on the <code>html</code> element and a non-XML MIME type) and use it to adjust some (as yet undefined) set of the HTML5 parsing rules.<br />
* Document these differences in the spec itself (as a non-normative appendix?) and/or by having a conformance checker flag them. Variations:<br />
** Ensure that each of these differences triggers a [http://www.whatwg.org/specs/web-apps/current-work/#parse parse error] or equivalent in HTML5; this does not (necessarily) involve changing the recovery action or the way the document is ultimately parsed.<br />
** Instead of bothering people who may not care about these differences, identify some unlikely combination (such as the DOCTYPE/xmlns/MIME combination above) and have it trigger a '''pedantic mode''' which enables these additional checks.</div>Rubyshttps://wiki.whatwg.org/index.php?title=HTML_vs._XHTML&diff=1976HTML vs. XHTML2006-12-04T00:10:07Z<p>Rubys: gut incoherent recommendation</p>
<hr />
<div>An often-repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3, and that the proper way to tell the two apart is via MIME types.<br />
<br />
There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.<br />
<br />
=== Syntax ===<br />
<br />
* Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.<br />
* N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.<br />
<br />
=== Mime Types ===<br />
<br />
* People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via <tt>file:///</tt> URIs directly from your hard disk.<br />
* HTTP as specified indicates that the <tt>Content-Type</tt> header is authoritative: it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have an expectation as to what happens when they click on these links, and are unhappy when the browser lets them down.<br />
<br />
=== Ideals ===<br />
<br />
In an ideal world:<br />
* the syntax of XML and HTML would be either completely identical or completely different.<br />
* the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.<br />
* <tt>Content-Type</tt> would either always be respected or always be ignored.<br />
* there would either be a foolproof way to "sniff" whether a given document was HTML or XHTML, or there would be no difference between XHTML and HTML, either in syntax or in the range of DOM trees that could validly be serialized.<br />
<br />
=== Analysis ===<br />
<br />
Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing, and the constraint of being as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither making the syntax absolutely identical to XML nor making it completely different is an option.<br />
<br />
At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized as XHTML is larger than the set that can be serialized as HTML5.<br />
<br />
Having substantially similar syntaxes leads to confusion in some edge cases (e.g., <tt>&lt;p/&gt;</tt>) but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the larger set of DOM trees that can be represented in that syntax.<br />
<br />
=== Known differences ===<br />
<br />
'''Note''': As the current WHATWG document is a draft, this section will need to track a moving target.<br />
<br />
* A namespace declaration is required in XHTML5, and not allowed in HTML5.<br />
* A &lt;!DOCTYPE&gt; is required in HTML5, but optional in XHTML5.<br />
* Empty element syntax is a parse error for non-void elements in HTML5, furthermore the prescribed error recovery for this parse error differs from the XML parsing rules.<br />
* XHTML5 is an XML 1.0 vocabulary. The set of Unicode characters allowed in XML 1.0 is more restricted than in HTML4/5.<br />
* XML processes are, by design, unforgiving of parse errors.<br />
* You can omit end tags for a number of elements in HTML, but not XML.<br />
* You can use the &lt;![CDATA[ ... ]]&gt; syntax of XML, but it means something else in HTML.<br />
* You can use PIs in XML, but not in HTML.<br />
* The set of known named character entity references can be depended upon in HTML, but not in XHTML (beyond the [http://www.w3.org/TR/REC-xml/#sec-predefined-ent 5 predefined ones]).<br />
* You can omit quotes on attribute values in HTML, but not in XML.<br />
* If you forget to escape "&" or "<" characters in HTML, the error handling is different than in XML.<br />
* You can include non-HTML elements and attributes in XML, but not in HTML.<br />
* &lt;noscript&gt; works in HTML, but not in XML.<br />
* &lt;iframe&gt; fallback content is parsed as text in HTML, but as markup in XML.<br />
* Comments with "--" in them work in HTML, but fail in XML.<br />
* The DTD internal subset works in XML, but is ignored (or worse) in HTML.<br />
* HTML syntax is case-insensitive, XML syntax is case-sensitive.<br />
* DOM APIs are case-sensitive in XML, case-insensitive in HTML.<br />
* CSS is case-sensitive in XML, case-insensitive in HTML.<br />
* You can use namespace prefixes in XML, not in HTML.<br />
* The contents of &lt;script&gt; and &lt;style&gt; elements are parsed differently in XML than in HTML.<br />
* document.write() works in HTML but not in XML.<br />
* Things that look like XML comments are treated as XML comments in XHTML, even inside script or style elements.<br />
* In XHTML, <tt>meta</tt> tags are not examined for character encoding information.<br />
* White space characters in attribute values are [http://www.w3.org/TR/REC-xml/#AVNormalize normalized] to spaces in XHTML.<br />
* xml:lang and xml:base are valid in XHTML, but not in HTML<br />
<br />
=== Potential Strategies ===<br />
<br />
'''Note''': these strategies are not necessarily mutually-exclusive.<br />
<br />
* Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver. (We're doing this already. -Hixie)<br />
* The definition of HTML5 understandably and correctly puts a higher weight on HTML4 compatibility than on XHTML migration. But as a migration aid, identify some unlikely or invalid combination (for example, the HTML5 DOCTYPE together with a <tt>xmlns</tt> attribute on the <tt>html</tt> element and a non-XML MIME type) and use it to adjust some (as yet undefined) set of the HTML5 parsing rules.</div>Rubyshttps://wiki.whatwg.org/index.php?title=HTML_vs._XHTML&diff=1975HTML vs. XHTML2006-12-03T23:59:17Z<p>Rubys: Additional differences (many from http://www.mozilla.org/docs/web-developer/faq.html#accept)</p>
<hr />
<div>An often-repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3, and that the proper way to tell the two apart is via MIME types.<br />
<br />
There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.<br />
<br />
=== Syntax ===<br />
<br />
* Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.<br />
* N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.<br />
<br />
=== Mime Types ===<br />
<br />
* People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via <tt>file:///</tt> URIs directly from your hard disk.<br />
* HTTP as specified indicates that the <tt>Content-Type</tt> header is authoritative: it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have an expectation as to what happens when they click on these links, and are unhappy when the browser lets them down.<br />
<br />
=== Ideals ===<br />
<br />
In an ideal world:<br />
* the syntax of XML and HTML would be either completely identical or completely different.<br />
* the set of DOM trees that could be serialized as XHTML and HTML would either be completely identical or completely different.<br />
* <tt>Content-Type</tt> would either always be respected or always be ignored.<br />
* there would either be a foolproof way to "sniff" whether a given document was HTML or XHTML, or there would be no difference between XHTML and HTML, either in syntax or in the range of DOM trees that could validly be serialized.<br />
<br />
=== Analysis ===<br />
<br />
Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor. XML isn't changing, and the constraint of being as backwards compatible with HTML4 as humanly possible places practical limits on what can be done. Neither making the syntax absolutely identical to XML nor making it completely different is an option.<br />
<br />
At the present time, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized as XHTML is larger than the set that can be serialized as HTML5.<br />
<br />
Having substantially similar syntaxes leads to confusion in some edge cases (e.g., <tt>&lt;p/&gt;</tt>) but also has some advantages. Similar syntaxes would make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, similar syntaxes would make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the larger set of DOM trees that can be represented in that syntax.<br />
<br />
=== Known differences ===<br />
<br />
'''Note''': As the current WHATWG document is a draft, this section will need to track a moving target.<br />
<br />
* A namespace declaration is required in XHTML5, and not allowed in HTML5.<br />
* A &lt;!DOCTYPE&gt; is required in HTML5, but optional in XHTML5.<br />
* Empty element syntax is a parse error for non-void elements in HTML5, furthermore the prescribed error recovery for this parse error differs from the XML parsing rules.<br />
* XHTML5 is an XML 1.0 vocabulary. The set of Unicode characters allowed in XML 1.0 is more restricted than in HTML4/5.<br />
* XML processes are, by design, unforgiving of parse errors.<br />
* You can omit end tags for a number of elements in HTML, but not XML.<br />
* You can use the &lt;![CDATA[ ... ]]&gt; syntax of XML, but it means something else in HTML.<br />
* You can use PIs in XML, but not in HTML.<br />
* The set of known named character entity references can be depended upon in HTML, but not in XHTML (beyond the [http://www.w3.org/TR/REC-xml/#sec-predefined-ent 5 predefined ones]).<br />
* You can omit quotes on attribute values in HTML, but not in XML.<br />
* If you forget to escape "&" or "<" characters in HTML, the error handling is different than in XML.<br />
* You can include non-HTML elements and attributes in XML, but not in HTML.<br />
* &lt;noscript&gt; works in HTML, but not in XML.<br />
* &lt;iframe&gt; fallback content is parsed as text in HTML, but as markup in XML.<br />
* Comments with "--" in them work in HTML, but fail in XML.<br />
* The DTD internal subset works in XML, but is ignored (or worse) in HTML.<br />
* HTML syntax is case-insensitive, XML syntax is case-sensitive.<br />
* DOM APIs are case-sensitive in XML, case-insensitive in HTML.<br />
* CSS is case-sensitive in XML, case-insensitive in HTML.<br />
* You can use namespace prefixes in XML, not in HTML.<br />
* The contents of &lt;script&gt; and &lt;style&gt; elements are parsed differently in XML than in HTML.<br />
* document.write() works in HTML but not in XML.<br />
* Things that look like XML comments are treated as XML comments in XHTML, even inside script or style elements.<br />
* In XHTML, <tt>meta</tt> tags are not examined for character encoding information.<br />
* White space characters in attribute values are [http://www.w3.org/TR/REC-xml/#AVNormalize normalized] to spaces in XHTML.<br />
* xml:lang and xml:base are valid in XHTML, but not in HTML<br />
<br />
=== Potential Strategies ===<br />
<br />
'''Note''': these strategies are not necessarily mutually-exclusive.<br />
<br />
* Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver. (We're doing this already. -Hixie)<br />
* Identify the <tt>xmlns</tt> attribute on the <tt>html</tt> element as the definitive marker for XHTML (meaning what, exactly? If this _doesn't_ mean "and therefore use XML parsing rules for those documents", then what does it mean for a document to be XHTML? If it _does_ mean that, then this would cause about 13% of the Web to start showing error pages on new UAs, which would of course cause those UAs to start ignoring the spec since they can't ship a new version that drops a tenth of the Web on the floor -Hixie). Additionally, pick one of the following:<br />
** have it affect the error recovery rules for edge cases like empty non-void elements, and case sensitivity. (Unfortunately this would break about 15% of existing Web pages, causing them to render differently in new UAs compared to old UAs, and therefore causing those UAs to ignore the spec and not do this -Hixie)<br />
** Have the <tt>xmlns</tt> attribute be the *one* non-recoverable error defined by HTML5. (I don't understand what that means -Hixie)</div>Rubyshttps://wiki.whatwg.org/index.php?title=HTML_vs._XHTML&diff=1972HTML vs. XHTML2006-12-03T18:57:33Z<p>Rubys: </p>
<hr />
<div>An often-repeated assertion is that XHTML is as different from HTML as RDF/XML is from N3, and that the proper way to tell the two apart is via MIME types.<br />
<br />
There are only two problems with that. XHTML is not as different from HTML as RDF/XML is from N3. And MIME types can't be relied on. Let's take each in turn.<br />
<br />
=== Syntax ===<br />
<br />
* Both N3 and RDF/XML are used to express sets of RDF triples. They are equally capable: every triple store can be dumped into either format. The analogy here is the DOM. It is not currently the case that every DOM tree can be dumped equally capably into either format.<br />
* N3 and RDF/XML are not the same, nor do they even look similar. They are different from top to bottom. Not only are no N3 documents valid RDF/XML, there are no individual triples that can be expressed the same way in both formats.<br />
<br />
=== Mime Types ===<br />
<br />
* People have consistently proven that they can't be trusted to configure and set MIME types correctly. Most aren't even aware that MIME types exist. The default setup with Apache is to not allow overrides. One popular use case is for documentation that is served via <tt>file:///</tt> URIs directly from your hard disk.<br />
* HTTP as specified indicates that the <tt>Content-Type</tt> header is authoritative: it trumps the XML prolog. HTTP as practiced treats the MIME type as a hint. Whether it be feeds or WMV files, users have an expectation as to what happens when they click on these links, and are unhappy when the browser lets them down.<br />
<br />
=== Ideals ===<br />
<br />
In an ideal world:<br />
* the syntax of XML and HTML would be either completely identical or completely different.<br />
* the set of DOM trees that could be serialized as XHTML and as HTML would be either completely identical or completely different.<br />
* <tt>Content-Type</tt> would either always be respected or always be ignored.<br />
* there would either be a foolproof way to "sniff" whether a given piece of content was HTML or XHTML, or there would be no difference between XHTML and HTML, either in syntax or in the range of DOM trees that could validly be serialized.<br />
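<br />
The "sniffing" alternative can likewise be sketched. The heuristic below, which keys on an <tt>xmlns</tt> attribute carrying the XHTML namespace on the <tt>html</tt> start tag, is a hypothetical illustration chosen for this page, not a proposed algorithm, and it shows why such a check falls short of foolproof: markup that merely copies the attribute is classified as XHTML even when it is not well-formed.<br />
<br />
```python
import re

XHTML_NS = "http://www.w3.org/1999/xhtml"

# First <html ...> start tag, matched case-insensitively as HTML tag names are.
HTML_START_TAG = re.compile(r"<html\b([^>]*)>", re.IGNORECASE)
# A default-namespace declaration: xmlns="..." or xmlns='...'.
XMLNS_ATTR = re.compile(r"""\bxmlns\s*=\s*(["'])(.*?)\1""")


def sniff(text: str) -> str:
    """Guess 'xhtml' or 'html' from an xmlns attribute on the html element.

    A naive heuristic: it does not skip comments, does not check
    well-formedness, and looks only at the first <html> start tag.
    """
    match = HTML_START_TAG.search(text)
    if match:
        decl = XMLNS_ATTR.search(match.group(1))
        if decl and decl.group(2) == XHTML_NS:
            return "xhtml"
    return "html"
```
<br />
For example, a page that starts with <tt><HTML xmlns="http://www.w3.org/1999/xhtml"></tt> but never closes its <tt><p></tt> elements is classified as <tt>"xhtml"</tt>, although an XML parser would reject it.<br />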
<br />
=== Analysis ===<br />
<br />
Obviously, the current situation is less than ideal. XML and HTML evolved from a common ancestor, SGML. XML isn't changing, and the constraint that HTML5 be as backward compatible with HTML4 as humanly possible places practical limits on what can be done. Neither making the HTML syntax absolutely identical to XML nor making it completely different is an option.<br />
<br />
At present, the HTML5 syntax is a (near) superset of the XHTML syntax. Yet the situation is (nearly) reversed for DOM trees: the set of DOM trees that can be serialized as XHTML is larger than the set that can be serialized as HTML5.<br />
<br />
Having substantially similar syntaxes leads to confusion in some edge cases (e.g., <tt><p/></tt>) but also has some advantages. Similar syntaxes make things easier for people who have become disillusioned with XHTML and wish to migrate to HTML5. Conversely, they make incremental migration from HTML5 to XHTML5 easier for those who wish to take advantage of the larger set of DOM trees that can be represented in that syntax.<br />
<br />
=== Known differences ===<br />
<br />
'''Note''': As the current WHATWG document is a draft, this section will need to track a moving target.<br />
<br />
Syntax:<br />
* <tt>xmlns</tt> is required in XHTML5 and not allowed in HTML5.<br />
* The DOCTYPE is required in HTML5 but optional in XHTML5.<br />
* Empty-element syntax (e.g., <tt><p/></tt>) is a parse error for non-void elements in HTML5; furthermore, the prescribed error recovery for this parse error differs from the XML parsing rules.<br />
* XHTML5 is an XML 1.0 vocabulary, and XML 1.0 restricts the set of allowed Unicode characters more tightly than HTML4/5 does.<br />
* XML processors are, by design, unforgiving of parse errors: a well-formedness error is fatal.<br />
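<br />
Two of the syntax differences above can be seen with Python's standard <tt>xml.etree.ElementTree</tt> module, used here only as an example of a non-validating XML parser (an assumption of this sketch; the page names no particular tool). It shows that <tt><p/></tt> is a complete, empty element in XML and that a missing end tag stops parsing altogether. The standard library has no HTML5 parser, so the HTML side appears only as a comment: under the HTML parsing algorithm as later standardized, <tt><p/></tt> is treated like <tt><p></tt>, leaving the element open.<br />
<br />
```python
import xml.etree.ElementTree as ET


def parse_xml(markup: str):
    """Parse markup as XML: return (root, None), or (None, message) on a fatal error."""
    try:
        return ET.fromstring(markup), None
    except ET.ParseError as err:
        return None, str(err)


# In XML, <p/> is a complete empty element, so "Hello" is text that follows it.
# (An HTML5 parser would instead leave <p> open and put "Hello" inside it.)
root, err = parse_xml("<div><p/>Hello</div>")
p = root.find("p")
print(repr(p.text), repr(p.tail))  # prints: None 'Hello'

# A missing end tag is a well-formedness error: no tree is produced at all.
broken, message = parse_xml("<div><p>unclosed</div>")
print(broken, message)
```
<br />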
<br />
Semantics:<br />
* Element names are case-insensitive in HTML5 but case-sensitive in XHTML5; this affects both CSS and JavaScript.<br />
* Foreign vocabularies, like SVG, have no HTML5 serialization.<br />
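<br />
The semantic differences can be illustrated with the same <tt>xml.etree.ElementTree</tt> module (again an assumption of this sketch). Once <tt>xmlns</tt> places elements in the XHTML namespace, element names are compared case-sensitively, and embedded SVG keeps its own namespace within the same tree. The helper <tt>local_names</tt> is hypothetical, written for this example.<br />
<br />
```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def local_names(root: ET.Element, ns: str) -> list:
    """Local names of all elements in namespace ns, in document order."""
    prefix = "{" + ns + "}"
    return [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<P>upper</P><p>lower</p>"
    '<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>'
    "</body></html>"
)

# <P> and <p> are different elements in XML; an HTML parser would
# lowercase both tag names and produce two p elements.
print(local_names(doc, XHTML_NS))  # ['html', 'body', 'P', 'p']
print(local_names(doc, SVG_NS))    # ['svg', 'circle']

# Only the lowercase element matches a lookup for {XHTML_NS}p.
print([el.text for el in doc.iter("{%s}p" % XHTML_NS)])  # ['lower']
```
<br />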
<br />
=== Potential Strategies ===<br />
<br />
'''Note''': these strategies are not necessarily mutually-exclusive.<br />
<br />
* Develop better tools and actively work to integrate them into products like WordPress and Dreamweaver.<br />
* Identify the <tt>xmlns</tt> attribute on the <tt>html</tt> element as the definitive marker for XHTML. Additionally, pick one of the following:<br />
** Have it affect the error recovery rules for edge cases such as empty non-void elements, as well as case sensitivity.<br />
** Have the <tt>xmlns</tt> attribute be the *one* non-recoverable error defined by HTML5.</div>Rubys