Character Encoding Detection

This document is obsolete.

For the current specification, see: http://encoding.spec.whatwg.org/

This page documents how browsers handle character encoding detection.

Mozilla Observations

  • When there is a BOM, it scans the first 2048 bytes for a meta element.
    • If a complete meta element is found, the encoding it declares is used.
    • If no meta element is found, UTF-8 is used.
  • When there is no BOM, it parses the document.
    • Upon encountering a meta element, the document is re-parsed using the declared encoding if all of the following hold:
      • The declared encoding is not compatible with the default (i.e. it is anything other than US-ASCII, ISO-8859-1, or Windows-1252).
      • Non-US-ASCII characters have already been detected.
      • The declared encoding is compatible with what has already been seen.
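
The Mozilla observations can be modelled roughly as follows. This is a minimal sketch of the observed behaviour, not Gecko's actual code: the names DEFAULT_COMPATIBLE, META_RE, find_meta_charset, charset_with_bom and should_reparse are hypothetical; the regex is a crude stand-in for a real meta prescan; "complete" is assumed to mean a closing ">" follows the charset; and "compatible with what has already been seen" is assumed to mean the bytes seen so far decode without error in the declared encoding.

```python
import re
from typing import Optional

# Encodings the observations treat as compatible with the default.
DEFAULT_COMPATIBLE = {"us-ascii", "iso-8859-1", "windows-1252"}

# Hypothetical, simplified matcher for a charset declared in a <meta> element.
META_RE = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def find_meta_charset(data: bytes) -> Optional[str]:
    """Return the lowercase charset from the first complete meta element, or None."""
    match = META_RE.search(data)
    if match is None:
        return None
    # Assumption: the element counts as "complete" once a closing '>' follows it.
    if data.find(b">", match.end()) == -1:
        return None
    return match.group(1).decode("ascii").lower()


def charset_with_bom(data: bytes) -> str:
    """BOM case: only the first 2048 bytes are scanned; UTF-8 if no meta is found."""
    declared = find_meta_charset(data[:2048])
    return declared if declared is not None else "utf-8"


def should_reparse(seen: bytes, declared: str) -> bool:
    """No-BOM case: decide whether a meta element triggers a re-parse."""
    # 1. The declared encoding must not be one of the default-compatible ones.
    if declared.lower() in DEFAULT_COMPATIBLE:
        return False
    # 2. Non-US-ASCII bytes must already have been seen.
    if seen.isascii():
        return False
    # 3. Assumption: "compatible" means the bytes seen so far decode cleanly.
    try:
        seen.decode(declared)
    except (UnicodeDecodeError, LookupError):
        return False
    return True
```

For example, should_reparse(b"\xc3\xa9", "utf-8") returns True, while the same bytes with a declared "windows-1252" do not trigger a re-parse because that encoding is in the default-compatible set.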

IE Observations

  • The BOM is authoritative: when present, it determines the encoding regardless of any meta declaration.
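
A BOM-first decision of this kind might look like the sketch below. It is illustrative only, not IE's implementation: the names BOMS, sniff_bom and choose_encoding are hypothetical, UTF-32 marks are deliberately left out, and the no-BOM fallback is a placeholder, since the observation above says nothing about what IE does when there is no BOM.

```python
from typing import Optional

# Byte order marks and the encodings they signal. UTF-32 marks are left out
# as a simplification (the UTF-32LE mark begins with the UTF-16LE one).
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding signalled by a leading BOM, or None if there is none."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return None


def choose_encoding(data: bytes, declared: Optional[str], fallback: str = "windows-1252") -> str:
    """A BOM, when present, wins over any declared encoding."""
    bom_encoding = sniff_bom(data)
    if bom_encoding is not None:
        return bom_encoding
    # Placeholder: the no-BOM case is not covered by the observation.
    return declared if declared is not None else fallback
```

Here a UTF-8 BOM followed by a meta element declaring ISO-8859-1 still yields "utf-8", because the declared encoding is only consulted when no BOM is present.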