Character Encoding Detection

This document is obsolete.

For the current specification, see: http://encoding.spec.whatwg.org/

This page documents how browsers handle character encoding detection.

Mozilla Observations

When there is a BOM, Mozilla parses the first 2048 bytes.
- If a complete meta element is found, that encoding is used.
- If no meta element is found, UTF-8 is used.
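
The BOM case above can be sketched roughly as follows. This is only an illustration of the observed behaviour, not Mozilla's code: the function name encoding_with_bom, the constant PRESCAN_LIMIT and the simplified regular expression are all assumptions made for the example, and a real prescan handles attributes and quoting far more carefully.

```python
import re

PRESCAN_LIMIT = 2048  # number of leading bytes examined, per the observation

# Hypothetical, simplified pattern. It matches <meta charset="...">
# and <meta http-equiv=... content="...; charset=...">, and requires
# the closing '>' so that only a complete meta element counts.
META_CHARSET = re.compile(
    rb"<meta\b[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)[^>]*>",
    re.IGNORECASE,
)


def encoding_with_bom(data: bytes) -> str:
    """Return the encoding chosen for a document that starts with a BOM."""
    match = META_CHARSET.search(data[:PRESCAN_LIMIT])
    if match:
        # A complete meta element was found: its encoding is used.
        return match.group(1).decode("ascii").lower()
    # No meta element within the first 2048 bytes: UTF-8 is used.
    return "utf-8"
```

A meta element that starts after the first 2048 bytes, or that is cut off before its closing '>', is treated here as not found.
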
When there is no BOM, Mozilla parses the document.
- Upon encountering a meta element,
  - If the declared encoding is not compatible with the default (i.e. it is anything but US-ASCII, ISO-8859-1 or Windows-1252)
  - And if non-US-ASCII characters have been detected
  - And the declared encoding is compatible with what has already been seen
    - The document is re-parsed using the declared encoding.
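
The three conditions for re-parsing can be written as a single predicate. The sketch below is a guess at the logic, not Mozilla's implementation. The name should_reparse is hypothetical. "Compatible with what has already been seen" is interpreted, as an assumption, to mean that the bytes read before the meta element decode without error in the declared encoding. Keeping the current parse for unknown encoding labels is also an assumption.

```python
import codecs

# Encodings the observation treats as compatible with the default.
# codecs.lookup() normalises aliases to Python's canonical codec names.
DEFAULT_COMPATIBLE = {
    codecs.lookup(label).name
    for label in ("us-ascii", "iso-8859-1", "windows-1252")
}


def should_reparse(declared: str, seen: bytes) -> bool:
    """Decide whether to re-parse using the encoding declared in a meta element.

    `seen` holds the bytes consumed before the meta element was reached.
    """
    try:
        name = codecs.lookup(declared).name
    except LookupError:
        return False  # assumption: an unknown label never triggers a re-parse
    if name in DEFAULT_COMPATIBLE:
        return False  # the declared encoding is compatible with the default
    if all(byte < 0x80 for byte in seen):
        return False  # only US-ASCII characters have been seen so far
    try:
        seen.decode(name)  # assumption: "compatible" means "decodes cleanly"
    except UnicodeDecodeError:
        return False
    return True
```

For example, should_reparse("utf-8", b"caf\xc3\xa9") is true, while the same call with b"caf\xe9" is false because a lone 0xE9 byte is not valid UTF-8. A multi-byte sequence that is split at the end of `seen` would also be rejected by this simple check.
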