Change Proposal for ISSUE-150

From WHATWG Wiki
Revision as of 21:55, 6 February 2011 by Hixie (talk | contribs)


Do not micromanage editors on editorial details.


Pushing for details such as how to write down Unicode character names only makes sense if all W3C specifications used the same notation. As they do not, this problem is larger than HTML5 and cannot be solved by this Working Group micromanaging its editors.

With any edit of this scale, errors are made. Given the sensitive nature of the particular topic at hand, these errors could be especially bad, ranging from interoperability errors (where different browsers implement different rules because they were done before or after the edit), to validators giving bad advice (because of inconsistencies between sections giving authoring conformance criteria and sections giving implementation requirements), to introducing oddities into the platform (e.g. using an unusual character for separating a list).

Listing the code point, the character name, and including the character itself is the least ambiguous way of referring to Unicode characters and thus the way most likely to lead to good interoperability. Omitting the character name would lead to readers unfamiliar with Unicode code points confusing characters such as the U+0031 DIGIT ONE character (1), the U+0049 LATIN CAPITAL LETTER I character, the U+006C LATIN SMALL LETTER L character, and the U+007C VERTICAL LINE character (|). Omitting the code points would lead to implementors making mistakes when looking up the code points (which are often needed to implement the requirements). Omitting the character would lead to worse readability for people for whom English is not their first language, or who are not familiar with the names of punctuation characters.
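
As an illustration only, the three-part notation can be produced mechanically. The sketch below assumes Python's standard unicodedata module. The describe function and the CONTROL_NAMES table are hypothetical names introduced for this example and are not part of the specification or of any tool used to edit it.

```python
import unicodedata

# Hypothetical table: control characters have no formal Unicode name, so
# unicodedata.name() cannot supply one. Only the entry needed here is listed.
CONTROL_NAMES = {"\n": "LINE FEED (LF)"}


def describe(ch):
    """Format one character as code point, name, and (if visible) glyph."""
    code_point = f"U+{ord(ch):04X}"
    name = CONTROL_NAMES.get(ch) or unicodedata.name(ch, "<unnamed>")
    text = f"{code_point} {name} character"
    # Append the glyph only when there is something visible to show.
    if ch.isprintable() and not ch.isspace():
        text += f" ({ch})"
    return text


# Four characters that look alike in many fonts.
for ch in "1Il|":
    print(describe(ch))
```

With the name and code point attached, the four look-alike characters come out as four clearly different lines, which is the point the paragraph above makes.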

Listing the code point, the character name, and including the character itself provides redundancy that reviewers can use to check for unintentional mistakes. Omitting one or two of these leads to a higher likelihood that errors will be missed. (The redundancy has already caught numerous problems like this, in fact.)

Omitting the character name can lead to extra confusion when control characters and characters with no glyphs are shown next to characters with glyphs. For example, "U+000A (LF), U+0020 ( ), and U+0031 (1)" is much more confusing than "U+000A LINE FEED (LF) character, U+0020 SPACE character, and U+0031 DIGIT ONE character (1)" to someone unfamiliar with code points. Listing the character name as well makes such lists much easier to skim and understand.

Just listing the code point and the character is aesthetically displeasing in lists, especially when including characters for which no glyph can be shown (e.g. U+000A LF). Including the name only for control characters leads to a very unbalanced feel for the specification text. Thus, all three need to be included for the text to be pleasant to read in lists. Being consistent throughout (in lists and outside lists) leads to fewer editing mistakes, which is important in achieving the goal of interoperability.


No change.


Editors will not feel micromanaged. W3C consistency can still be pursued at a higher level and applied retroactively later. Interoperability is not risked.