Validator.nu Useful Warning Requests
Latest revision as of 14:19, 17 March 2015
This document is obsolete.
This page documents requests for potential optional checks to be implemented by HTML5 QA tools, like Validator.nu. This is only intended to document feature requests, and may not reflect what is, or will be, implemented in the future.
The following tables describe issues that a QA tool might provide options to warn about. None of the issues listed in these tables is technically a conformance error, but each has been requested directly by authors or is considered useful for authors to be warned about.
Syntactic Issues
Title | Description | Notes |
---|---|---|
Quoted attributes | Boolean option to require quoted attribute values for all attributes. | XHTML-like syntactic convention commonly requested by authors. (request) |
Minimised attributes | Boolean option to require all boolean attributes to use the non-minimised form, e.g. <input disabled="disabled"> instead of <input disabled> | XHTML-like syntactic convention commonly requested by authors. |
Trailing Slashes | Options to either: …, or None (default) | Some authors like to follow the XML convention, others prefer to always omit them, and others don't care that much. (request) |
Optional </p> ahead of new structural element | Boolean option to warn about omitted paragraph end tags ahead of start tags of section, nav, article, aside, header and footer | |
Optional End Tags | Boolean option to require end tags for all non-void elements, which normally have optional end tags | (request) |
Optional Start Tags | Options to require start tags for the elements 'html', 'head', 'body' and 'tbody'. | XHTML-like convention, mostly applies to html, head and body. Some authors still choose to omit tbody, but like to always include the others. |
Case sensitivity | Boolean option to check tag names and attribute names for case sensitivity. | HTML and MathML elements and attributes are all lowercase, but SVG contains some camel case names. |
Character Entity References | Options to allow: the 5 predefined named entity references only (lt, gt, amp, quot, apos); … | Warning about HTML 4.01 references is a useful check for compatibility reasons, due to existing legacy browsers that don't yet support the additional entity references imported from MathML. Use of only the 5 predefined entity references is needed for those who want XHTML compatibility without a DOCTYPE. |
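As a rough illustration of how the "Quoted attributes" and "Minimised attributes" options could be checked, here is a minimal Python sketch built on the standard library's `html.parser`. It is not how Validator.nu works; the names `ATTR_RE`, `AttributeStyleChecker` and `check_attribute_style` are made up for this example, and `html.parser` is a lenient tokenizer rather than a conforming HTML5 parser, so results on malformed input may differ from a real validator.

```python
import re
from html.parser import HTMLParser

# Matches one attribute inside a raw start tag: a name, optionally followed by
# "=" and a double-quoted, single-quoted, or unquoted value.
ATTR_RE = re.compile(r"""([^\s"'<>/=]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'=<>`]+))?""")


class AttributeStyleChecker(HTMLParser):
    """Collects (line, kind, tag, attribute) tuples for the two attribute checks."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()   # original source text of the start tag
        line, _ = self.getpos()          # line on which the tag starts
        body = raw[1 + len(tag):]        # drop "<" and the tag name
        for match in ATTR_RE.finditer(body):
            name, value = match.group(1), match.group(2)
            if value is None:
                # Attribute written without "=value", e.g. <input disabled>
                self.warnings.append((line, "minimised", tag, name))
            elif value[0] not in "\"'":
                # Value present but not wrapped in quotes, e.g. class=intro
                self.warnings.append((line, "unquoted", tag, name))


def check_attribute_style(source):
    """Return the warnings found in an HTML source string (hypothetical helper)."""
    checker = AttributeStyleChecker()
    checker.feed(source)
    checker.close()
    return checker.warnings
```

For example, `check_attribute_style('<p class=intro><input disabled value="x"></p>')` reports an unquoted `class` on `p` and a minimised `disabled` on `input`, while the quoted `value` attribute passes.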
Other Warnings
Title | Description | Notes |
---|---|---|
Untitled document | Warn about the use of meaningless or empty titles, e.g. <title>Untitled document</title> (or similar) | This is a common default title inserted by authoring tools. Advise the author to use a more appropriate title for the document. (request) |
Unnecessary whitespace | Warn about long stretches of unnecessary whitespace | (request) |
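The two warnings in this table could be approximated along the following lines. This is only a sketch in Python using the standard library; the set of placeholder titles and the whitespace threshold are assumptions chosen for illustration, since the requests above do not define either, and the names `TitleCollector` and `other_warnings` are hypothetical.

```python
import re
from html.parser import HTMLParser

PLACEHOLDER_TITLES = {"", "untitled", "untitled document"}  # hypothetical list
MAX_WHITESPACE_RUN = 100                                     # hypothetical threshold


class TitleCollector(HTMLParser):
    """Records the text content of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None  # stays None when the document has no <title>

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def other_warnings(source):
    """Return a list of warning strings for an HTML source string."""
    warnings = []
    collector = TitleCollector()
    collector.feed(source)
    collector.close()
    if collector.title is not None:
        # Collapse whitespace and ignore case before comparing.
        normalised = " ".join(collector.title.split()).lower()
        if normalised in PLACEHOLDER_TITLES:
            warnings.append("title is empty or a placeholder")
    # Any run of whitespace longer than the threshold counts as "long".
    if re.search(r"\s{%d,}" % (MAX_WHITESPACE_RUN + 1), source):
        warnings.append("long run of whitespace")
    return warnings
```

A real tool would probably want the placeholder list and the threshold to be user-configurable, in keeping with the optional nature of these checks.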
Polyglot Document Checking
There are 3 levels of polyglot documents that can be created.
- Talismans Only: An HTML document that contains a number of XML-like syntactic constructs purely as a matter of convention. The document itself may not entirely conform with all well-formedness requirements, or may not function properly for other reasons, if it were to be treated as XHTML. (This is not really a true polyglot document, but is included here for completeness.)
- XHTML Compatible: A valid HTML document that is also fully conforming XHTML. However, the different processing requirements between HTML and XHTML may give slightly different results that would not match in a tree comparison, and the document is not round-trippable.
- Strict Polyglot: A valid HTML document that is also fully conforming XHTML, which would pass a tree comparison of the resulting DOMs (excluding unavoidable differences), and which is fully round-trippable.
Note that these descriptions intentionally ignore differences that could be caused by script and stylesheet processing.
The following is a table of issues that would need to be checked to ensure that a given, conforming HTML document is a polyglot document. It does not, however, list all issues that would need to be checked to ensure that a given, conforming XHTML document is a polyglot document. As such, syntactic XML constructs which are not valid in HTML are not listed here, for example the internal subset of a DOCTYPE declaration or the use of CDATA sections within HTML elements.
Title | Description | Notes | Polyglot Level Requirement |
---|---|---|---|
The xmlns and xmlns:prefix attributes | The html root element needs to have an xmlns="http://www.w3.org/1999/xhtml" attribute. The svg and math elements need to declare the appropriate namespaces for SVG and MathML, respectively, and, if used within the document, XLink. | In DOM implementations for XHTML, these attributes are in the http://www.w3.org/2000/xmlns/ namespace. For HTML, these are in no namespace. This issue is unavoidable. | XHTML-compatible (and SVG+MathML) |
The xml:lang attribute | Meaningless talisman in HTML. | In DOM implementations for XHTML, xml:lang represents the lang attribute in the XML namespace. For HTML, it represents an attribute in no namespace with the literal localname "xml:lang". This issue is unavoidable. | |
Case sensitivity | All tag names and attribute names for HTML elements must be written in lowercase. Tag names for SVG and MathML must be written in the case defined by those language specifications. The DOCTYPE declaration is case sensitive in XHTML and must be <!DOCTYPE html> (or the legacy-compat version). | | XHTML-compatible (and SVG+MathML) |
Default Character Encoding | The default character encoding for text/html is effectively dependent upon the end-user's locale (Windows-1252 for most Western locales). For application/xhtml+xml (and other XML types), it is UTF-8. | Use UTF-8 and declare it with <meta charset="UTF-8">, declare the encoding in the transport layer (HTTP Content-Type header), or use UTF-8 or UTF-16 with an appropriate Byte Order Mark. | XHTML-compatible |
Character Entity References | Without a DTD, only the 5 predefined entity references in XML may be used. | The additional entity references defined and supported using the XHTML 1.0 or 1.1 DOCTYPE cannot be used in XHTML. | XHTML-compatible |
Unescaped Special Chars | Unescaped ampersand (&) or less-than (<) characters are not allowed in text and attribute values within XHTML. (This doesn't include within CDATA sections, comments, etc.) | HTML can include unescaped ampersands where they are unambiguous, and unescaped less-than characters within rawtext or RCDATA content and in quoted attributes. | XHTML-compatible |
The characters "]]>" in content | The use of this sequence of characters is a well-formedness error in XHTML. | | XHTML-compatible |
Line feeds after start tags | A line feed (LF) immediately after the start tag of <pre> or <textarea> (or the non-conforming <listing> element) is ignored in HTML. | Avoid using line feeds after these start tags. | Strict Polyglot |
Quoted attributes | Unquoted attributes are not allowed in XHTML. | | XHTML-compatible |
Minimised attributes | Minimised attributes (attributes without a value) cannot be used in XML. | Use the expanded form, e.g. disabled="" or disabled="disabled" | XHTML-compatible |
Trailing Slashes | Void elements require explicit closing with trailing slashes. | | XHTML-compatible |
Optional start- and end-tags | Neither start- nor end-tags may be omitted. | | XHTML-compatible |
The script and style Elements | In HTML, these elements are parsed as CDATA, allowing the use of unescaped special characters. In XHTML, these are parsed as #PCDATA, and any occurrence of the characters < or & must be escaped as &lt; and &amp;, respectively. | Scripts and stylesheets containing these characters should be linked externally instead. | XHTML-compatible |
The textarea and title Elements | The content model of these elements is RCDATA in HTML, but PCDATA in XHTML. These elements may not contain escaping text spans (<!-- ... -->, because they look like comments), or unescaped ampersands or less-than characters. | | XHTML-compatible |
The tbody Element | In HTML, the tbody element will be implied automatically, but in XHTML, it is optional. | Explicitly include the tbody element. | Strict Polyglot |
The noscript Element | The noscript element is forbidden in XHTML. | | XHTML-compatible |
The iframe Element Content | The iframe element must be empty in XHTML documents. | | XHTML-compatible |
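Several of the XHTML-compatible rows above (quoted attributes, minimised attributes, trailing slashes, omitted tags, unescaped special characters, undefined entity references) amount to XML well-formedness, so a first approximation is simply to run an XML parser over the HTML source. The Python sketch below does that with the standard library's `xml.etree.ElementTree` and then adds a few tree-level checks from the table. It is an illustration under stated assumptions, not a complete polyglot checker: the function name `polyglot_issues` and the message texts are invented here, and checks that depend on HTML parsing behaviour (line feeds after `<pre>`, character encoding, DOM tree comparison) are not covered.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def polyglot_issues(source):
    """Return a list of issue strings for an HTML source string (hypothetical helper)."""
    try:
        # Fails on unquoted or minimised attributes, unclosed elements,
        # unescaped "&" or "<", and entity references other than the
        # 5 predefined ones when no DTD defines them.
        root = ET.fromstring(source)
    except ET.ParseError as exc:
        return [f"XML parse error: {exc}"]

    issues = []
    # An xmlns attribute on the root shows up as a namespace in the tag name.
    if root.tag != f"{{{XHTML_NS}}}html":
        issues.append("root element is not <html> in the XHTML namespace")

    for element in root.iter():
        in_xhtml = element.tag.startswith(f"{{{XHTML_NS}}}")
        local = element.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if in_xhtml and local != local.lower():
            issues.append(f"HTML tag name <{local}> is not lowercase")
        if local == "noscript":
            issues.append("noscript element is forbidden in XHTML")
        if local == "iframe" and (len(element) or element.text):
            issues.append("iframe element must be empty")
        if local == "table" and any(
            child.tag.rsplit("}", 1)[-1] == "tr" for child in element
        ):
            issues.append("tr directly inside table; add an explicit tbody (Strict Polyglot)")
    return issues
```

A document such as `<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body><p>ok</p></body></html>` yields no issues, whereas `<input disabled>` or `&nbsp;` anywhere in the source makes the XML parse fail outright.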