Validator.nu Useful Warning Requests

From WHATWG Wiki
Revision as of 08:34, 4 September 2009 by Lachlan Hunt (Added polyglot document checking section)

This page documents requests for potential optional checks to be implemented by HTML5 QA tools, such as Validator.nu. It only documents feature requests, and may not reflect what is, or will be, implemented.

The following tables describe issues that a QA tool might provide options to warn about. None of the issues listed in these tables is technically a conformance error, but each has been requested directly by authors, is considered useful for authors to be warned about, or both.

Syntactic Issues

Title Description Notes
Quoted attributes Boolean option to require quoted attribute values for all attributes. XHTML-like syntactic convention commonly requested by authors. (request)
Minimised attributes Boolean option to require all boolean attributes to use the non-minimised form, e.g. <input disabled="disabled"> instead of <input disabled>. XHTML-like syntactic convention commonly requested by authors.
Trailing Slashes Options to either:
  1. Warn about unnecessary trailing slashes in void elements
  2. Require trailing slashes in void elements
  3. None (default)
Some authors like to follow the XML convention, others prefer to always omit them, and others don't care that much. (request)
Optional </p> ahead of new structural element Boolean option to warn about omitted paragraph end tags ahead of start tags of section, nav, article, aside, header and footer
Optional End Tags Boolean option to require end tags for all non-void elements whose end tags are normally optional. (request)
Optional Start Tags Options to require start tags for the elements 'html', 'head', 'body' and 'tbody'. XHTML-like convention, mostly applies to html, head and body. Some authors still choose to omit tbody, but like to always include the others.
Case sensitivity Boolean option to check the case of tag names and attribute names. HTML and MathML elements and attributes are all lowercase, but SVG contains some camel case names.
Named Entities Options to allow:
  1. The 5 predefined named entity references only (lt, gt, amp, quot, apos)
  2. HTML 4.01 entity references only
  3. XHTML 1.0 entity references only (request)
  4. All named entity references
Restricting a document to the HTML 4.01 entity references is a useful compatibility check, because existing legacy browsers do not yet support the additional entity references imported from MathML. Restricting it to the 5 predefined entity references is needed by those who want XHTML compatibility without a DOCTYPE.
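
As a rough illustration of how two of the options above (minimised attributes and trailing slashes) might be checked, the following Python sketch uses the standard library's html.parser module. The class name, the option names and the list of void elements are assumptions made for this example. They are not taken from Validator.nu, and html.parser is not a conforming HTML5 parser.

```python
from html.parser import HTMLParser

# Assumed, incomplete list of void elements, for illustration only.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param"}


class SyntaxWarningChecker(HTMLParser):
    """Collects optional, non-conformance warnings (hypothetical sketch)."""

    def __init__(self, require_full_boolean=True, trailing_slash="none"):
        # trailing_slash: "warn" flags unnecessary slashes, "require"
        # flags missing slashes, "none" (default) checks nothing.
        super().__init__()
        self.require_full_boolean = require_full_boolean
        self.trailing_slash = trailing_slash
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        self._check(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        # Called for tags written with a trailing slash, e.g. <br/>.
        self._check(tag, attrs, self_closing=True)

    def _check(self, tag, attrs, self_closing):
        if self.require_full_boolean:
            for name, value in attrs:
                # html.parser reports a minimised attribute with value None.
                if value is None:
                    self.warnings.append(
                        f"<{tag}>: attribute '{name}' is minimised")
        if tag in VOID_ELEMENTS:
            if self.trailing_slash == "warn" and self_closing:
                self.warnings.append(f"<{tag}>: unnecessary trailing slash")
            elif self.trailing_slash == "require" and not self_closing:
                self.warnings.append(f"<{tag}>: missing trailing slash")


checker = SyntaxWarningChecker(trailing_slash="warn")
checker.feed('<input disabled><br/>')
print(checker.warnings)
```

Keeping each check behind its own option mirrors the table: none of these are conformance errors, so each warning should only appear when the author asks for it.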

Other Warnings

Title Description Notes
Untitled document Warn about the use of meaningless or empty titles, e.g. <title>Untitled document</title> (or similar). This is a common default title inserted by authoring tools. Advise the author to use a more appropriate title for the document. (request)
Unnecessary whitespace Warn about long stretches of unnecessary whitespace (request)
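
Both warnings above could be approximated with simple text checks. The Python sketch below is only illustrative: the set of placeholder titles, the function names and the whitespace threshold are assumptions, since this page does not define what counts as "meaningless" or "long".

```python
import re

# Assumed placeholder titles; a real tool would need a longer list.
PLACEHOLDER_TITLES = {"", "untitled", "untitled document", "new page"}


def is_meaningless_title(title):
    """Return True for empty or placeholder titles (hypothetical rule)."""
    normalised = " ".join(title.split()).lower()
    return normalised in PLACEHOLDER_TITLES


def long_whitespace_runs(source, threshold=200):
    """Return (offset, length) pairs for whitespace runs of at least
    `threshold` characters. The default threshold is an assumption."""
    pattern = re.compile(r"\s{%d,}" % threshold)
    return [(m.start(), len(m.group())) for m in pattern.finditer(source)]


print(is_meaningless_title("Untitled document"))
print(long_whitespace_runs("a" + " " * 300 + "b"))
```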

Polyglot Document Checking

There are 3 levels of polyglot documents that can be created.

Talismans Only 
An HTML document that contains a number of XML-like syntactic constructs purely as a matter of convention. The document itself may not entirely conform with all well-formedness requirements or may not function properly for other reasons if it were to be treated as XHTML. (This is not really a true polyglot document, but is included here for completeness)
XHTML Compatible 
A valid HTML document that is also fully conforming XHTML. However, the differing processing requirements of HTML and XHTML may produce slightly different results that would not match in a tree comparison, and the document is not round-trippable.
Strict Polyglot 
A valid HTML document that is also fully conforming XHTML, which would pass a tree comparison of the resulting DOMs (excluding unavoidable differences), and which is fully round-trippable.

Note that these descriptions intentionally ignore differences that could be caused by script and stylesheet processing.
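
Both the XHTML Compatible and Strict Polyglot levels require, at a minimum, that the document is well-formed XML. A minimal sketch of that precondition in Python, using the standard library's xml.etree.ElementTree, might look like the following. The function name is an assumption, and passing this check is necessary but far from sufficient for either level: it says nothing about HTML validity, XHTML conformance or tree equivalence.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(source):
    """Return True if `source` parses as well-formed XML."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


print(is_well_formed_xml("<p>one<br/>two</p>"))  # True
print(is_well_formed_xml("<p>one<br>two</p>"))   # False: <br> is never closed
```

A Talismans Only document may well fail this check, which is why it is not considered a true polyglot document.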

The following is a table of issues that would need to be checked to ensure that a document is a polyglot document.

Any syntactic XML construct that is not valid in HTML is also assumed to be problematic, but such constructs are not listed here. Examples include the internal subset of a DOCTYPE declaration and the use of CDATA sections within HTML elements. In other words, this table only lists the things that would need to be checked to ensure that a fully conforming HTML document is a polyglot document.

Title Description Notes Polyglot Type Requirement
xmlns and xmlns:prefix attributes. In DOM implementations for XHTML, these attributes are in the http://www.w3.org/2000/xmlns/ namespace. In HTML, these are in no namespace. Unavoidable
The characters "]]>" in content Well-formedness error in XHTML. XHTML-compatible
script and style element content In HTML, these elements are parsed as CDATA, allowing the use of unescaped special characters. In XHTML, they are parsed as #PCDATA, and any occurrence of the characters < or & must be escaped as &lt; and &amp;, respectively. Scripts and stylesheets containing these characters should be linked externally instead. XHTML-compatible
... (This table is incomplete)
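
Two of the rows above, the "]]>" check and the script and style content check, can be sketched with Python's html.parser module. This is an illustration under stated assumptions rather than a description of any existing tool: the class name and message strings are hypothetical, and html.parser only approximates HTML5 parsing.

```python
from html.parser import HTMLParser


class PolyglotContentChecker(HTMLParser):
    """Flags two XHTML-compatibility problems (hypothetical sketch)."""

    RAW_TEXT_ELEMENTS = {"script", "style"}

    def __init__(self):
        # Keep character references unconverted so that an escaped
        # "]]&gt;" is not mistaken for a literal "]]>".
        super().__init__(convert_charrefs=False)
        self.problems = []
        self._raw_element = None   # name of the open script/style element
        self._raw_flagged = False  # report each element at most once

    def handle_starttag(self, tag, attrs):
        if tag in self.RAW_TEXT_ELEMENTS:
            self._raw_element = tag
            self._raw_flagged = False

    def handle_endtag(self, tag):
        if tag == self._raw_element:
            self._raw_element = None

    def handle_data(self, data):
        if "]]>" in data:
            self.problems.append('"]]>" in content')
        if (self._raw_element and not self._raw_flagged
                and ("<" in data or "&" in data)):
            self._raw_flagged = True
            self.problems.append(
                f"unescaped < or & in <{self._raw_element}>")


checker = PolyglotContentChecker()
checker.feed("<p>a ]]> b</p><script>if (a < b && c) {}</script>")
checker.close()
print(checker.problems)
```

The xmlns row is marked Unavoidable, so the sketch does not attempt to check it.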