Character Encoding Detection
From WHATWG Wiki
This page documents how browsers detect the character encoding of a document.
Mozilla Observations
- When there is a BOM, Mozilla parses the first 2048 bytes.
  - If a complete meta element is found, the encoding it declares is used.
  - If no meta element is found, UTF-8 is used.
- When there is no BOM, Mozilla parses the document.
  - Upon encountering a meta element,
    - If the declared encoding is not compatible with the default (i.e. it is anything but US-ASCII, ISO-8859-1 or Windows-1252)
    - And if non-US-ASCII characters have been detected
    - And the declared encoding is compatible with what has already been seen
    - The document is re-parsed using the declared encoding.
  - Upon encountering a meta element,
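
The Mozilla observations above can be sketched in Python. This is only an illustration of the rules as listed, not Mozilla's actual code. The function names (find_meta_charset, encoding_with_bom, reparse_encoding_without_bom) are hypothetical, the regular expression is a simplified stand-in for a real meta prescan, and "compatible with what has already been seen" is assumed here to mean that the bytes before the meta element decode without error in the declared encoding. The sketch also assumes that "non-US-ASCII characters have been detected" refers to bytes before the meta element.

```python
import re

# Encodings the page lists as compatible with the default.
DEFAULT_COMPATIBLE = {"us-ascii", "iso-8859-1", "windows-1252"}

# Simplified pattern for a complete meta element carrying a charset.
# Assumption: a real browser uses a proper prescan, not a regular expression.
META_CHARSET = re.compile(
    rb"""<meta\b[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)[^>]*>""",
    re.IGNORECASE,
)


def find_meta_charset(data):
    """Return (declared_encoding, offset) of the first complete meta, or None."""
    match = META_CHARSET.search(data)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower(), match.start()


def encoding_with_bom(data):
    """BOM case: look for a meta in the first 2048 bytes, else use UTF-8."""
    found = find_meta_charset(data[:2048])
    return found[0] if found else "utf-8"


def reparse_encoding_without_bom(data):
    """No-BOM case: return the encoding that triggers a re-parse, or None.

    None means only that the listed conditions for a re-parse were not met;
    the page does not say what happens in those cases.
    """
    found = find_meta_charset(data)
    if found is None:
        return None
    declared, offset = found
    # Condition 1: the declared encoding is not compatible with the default.
    if declared in DEFAULT_COMPATIBLE:
        return None
    # Condition 2: non-US-ASCII bytes were seen before the meta (assumption).
    seen = data[:offset]
    if not any(byte > 0x7F for byte in seen):
        return None
    # Condition 3: what was seen is valid in the declared encoding (assumption).
    try:
        seen.decode(declared)
    except (LookupError, UnicodeDecodeError):
        return None
    return declared
```

For example, reparse_encoding_without_bom(b'<title>caf\xc3\xa9</title><meta charset="utf-8">') returns "utf-8", while the same document with a plain ASCII title returns None because no non-US-ASCII bytes precede the meta element. The sketch compares encoding labels literally, so aliases such as "latin1" are not recognised.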
IE Observations
- The BOM is authoritative.
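
One way to read "authoritative" is that a BOM, when present, overrides any other declared encoding. The Python sketch below illustrates that reading only. The function name bom_wins and the table of BOMs are assumptions, since the page does not say which BOMs IE recognises.

```python
import codecs

# Assumed BOM table; the page does not list the BOMs IE honours.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def bom_wins(data, declared=None):
    """Return the BOM's encoding if a BOM is present, else the declared one."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return declared
```

Under these assumptions, bom_wins(codecs.BOM_UTF8 + b"x", "windows-1252") returns "utf-8", and a document without a BOM keeps its declared encoding.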
