Character Encoding Detection

From WHATWG Wiki

This document is obsolete.

For the current specification, see: http://encoding.spec.whatwg.org/

This page documents how browsers handle character encoding detection.

Mozilla Observations

  • When there is a BOM, it parses the first 2048 bytes.
    • If a complete meta element is found, the encoding it declares is used.
    • If no meta element is found, UTF-8 is used.
  • When there is no BOM, it parses the document.
    • Upon encountering a meta element,
      • if the declared encoding is not compatible with the default (i.e. it is anything but US-ASCII, ISO-8859-1 or Windows-1252),
      • and non-US-ASCII characters have been detected,
      • and the declared encoding is compatible with what has already been seen,
        • the document is re-parsed using the declared encoding.
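
The observations above can be sketched roughly as follows. This is only an illustration of the listed rules, not Mozilla's actual code: the function names, the regular expression used to find a meta element, and the reading of "compatible with what's already been seen" as "the bytes seen so far decode without error" are all assumptions made for the example.

```python
import codecs
import re

# Codec names (as Python reports them) for the encodings the observations
# call compatible with the default: US-ASCII, ISO-8859-1, Windows-1252.
DEFAULT_COMPATIBLE = {"ascii", "iso8859-1", "cp1252"}

# Hypothetical, simplified pattern for a complete meta element that
# declares a charset. A real parser tokenizes the markup instead.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)[^>]*>",
    re.IGNORECASE,
)


def encoding_with_bom(data):
    """BOM branch: look for a meta element in the first 2048 bytes,
    otherwise fall back to UTF-8."""
    match = META_CHARSET.search(data[:2048])
    return match.group(1).decode("ascii") if match else "utf-8"


def should_reparse(declared_label, bytes_seen):
    """No-BOM branch: decide whether a meta declaration triggers a re-parse."""
    try:
        name = codecs.lookup(declared_label.strip()).name
    except LookupError:
        return False  # unknown label: nothing to switch to
    if name in DEFAULT_COMPATIBLE:
        return False  # declared encoding is compatible with the default
    if all(byte < 0x80 for byte in bytes_seen):
        return False  # no non-US-ASCII bytes detected so far
    try:
        # Assumption: "compatible" means the bytes seen so far decode cleanly.
        bytes_seen.decode(name)
    except (UnicodeDecodeError, LookupError):
        return False
    return True
```

For instance, under these assumptions `should_reparse("utf-8", "café".encode("utf-8"))` requests a re-parse, while the same call with pure ASCII input, or with a Windows-1252 declaration, does not.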

IE Observations

  • The BOM is authoritative.
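
A minimal sketch of what "authoritative" could mean in practice: if the byte stream starts with a BOM, the encoding it implies wins over any other declaration. The set of BOMs checked, the `pick_encoding` name, and the Windows-1252 fallback are assumptions for illustration; the page does not say which BOMs IE recognizes or what it falls back to.

```python
import codecs

# BOMs checked here are an assumption; the page does not list them.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def pick_encoding(data, declared=None, fallback="windows-1252"):
    """Return the BOM's encoding if a BOM is present; otherwise use the
    declared encoding, or a hypothetical fallback when none is declared."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name  # the BOM overrides any declared encoding
    return declared or fallback
```

With this sketch, a document that begins with the UTF-8 BOM is treated as UTF-8 even when called with `declared="iso-8859-1"`.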