Character Encoding Detection

From WHATWG Wiki

This document is obsolete.

For the current specification, see: http://encoding.spec.whatwg.org/

This page documents how browsers handle character encoding detection.

Mozilla Observations

  • When there is a BOM, Mozilla parses the first 2048 bytes.
    • If a complete meta element is found, the encoding it declares is used.
    • If no meta element is found, UTF-8 is used.
  • When there is no BOM, Mozilla parses the document.
    • Upon encountering a meta element:
      • If the declared encoding is not compatible with the default (i.e. it is anything but US-ASCII, ISO-8859-1 or Windows-1252),
      • and non-US-ASCII characters have been detected,
      • and the declared encoding is compatible with what has already been seen,
        • then the document is re-parsed using the declared encoding.
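
The observations above can be restated as a small Python sketch. It is an illustration of the listed rules, not Mozilla's code. The function names, the simplified meta pattern, and the reading of "compatible with what has already been seen" as "the bytes seen so far decode without error" are all assumptions made for this example.

```python
import re

# Encodings treated as compatible with the default, per the list above.
DEFAULT_COMPATIBLE = {"us-ascii", "ascii", "iso-8859-1", "windows-1252"}

# Number of bytes scanned when a BOM is present, per the list above.
PRESCAN_LIMIT = 2048

# Hypothetical, deliberately simple pattern for a complete meta element
# that carries a charset. A real parser tokenizes the markup instead.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)[^>]*>',
    re.IGNORECASE,
)


def find_meta_charset(data):
    """Return the lowercased charset of the first complete meta element, or None."""
    match = META_CHARSET.search(data)
    return match.group(1).decode("ascii").lower() if match else None


def encoding_with_bom(data):
    """BOM case: scan the first 2048 bytes for a meta element, else use UTF-8."""
    return find_meta_charset(data[:PRESCAN_LIMIT]) or "utf-8"


def should_reparse(declared, bytes_seen):
    """No-BOM case: decide whether a meta declaration triggers a re-parse."""
    declared = declared.lower()
    # Condition 1: the declared encoding is not compatible with the default.
    if declared in DEFAULT_COMPATIBLE:
        return False
    # Condition 2: non-US-ASCII bytes have been detected so far.
    if all(byte < 0x80 for byte in bytes_seen):
        return False
    # Condition 3 (assumed meaning): the bytes seen so far are valid
    # in the declared encoding.
    try:
        bytes_seen.decode(declared)
    except (LookupError, UnicodeDecodeError):
        return False
    return True
```

For example, `should_reparse("utf-8", "café".encode("utf-8"))` is true, while `should_reparse("windows-1252", b"caf\xe9")` is false because Windows-1252 is already compatible with the default.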

IE Observations

  • The BOM is authoritative.
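
A minimal Python sketch of what "authoritative" means here: when a BOM is present, it overrides any other declaration. This is an illustration, not IE's implementation. The function names, the fallback value, and the choice to check only the UTF-8 and UTF-16 BOMs are assumptions made for this example.

```python
import codecs

# BOM byte sequences and the encoding each one signals.
# Assumption: only UTF-8 and UTF-16 BOMs are considered.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
)


def sniff_bom(data):
    """Return the encoding signalled by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def choose_encoding(data, declared=None, fallback="windows-1252"):
    """Pick an encoding, letting the BOM win over any declared encoding.

    `declared` stands for an encoding from a header or meta element, and
    `fallback` is a hypothetical default; both are illustrative parameters.
    """
    return sniff_bom(data) or declared or fallback
```

With this sketch, `choose_encoding(b"\xef\xbb\xbfhi", declared="shift_jis")` returns `"utf-8"`, since the BOM takes precedence over the declaration.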