Character Encoding Detection: Difference between revisions

Latest revision as of 13:49, 9 July 2013

This document is obsolete.

For the current specification, see: http://encoding.spec.whatwg.org/

This page is for documenting the way browsers handle character encoding detection.

When there is a BOM, the browser parses the first 2048 bytes.
- If a complete meta element is found, that encoding is used.
- If no meta element is found, UTF-8 is used.
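
The BOM rule above can be sketched roughly as follows. This is only an illustration of the observed behaviour, not any browser's actual code. The text does not say which BOM is meant, so the sketch assumes the UTF-8 BOM; the function name `encoding_when_bom_present` and the simplified `META_CHARSET` pattern are hypothetical.

```python
import re
from typing import Optional

PRESCAN_LIMIT = 2048        # leading bytes examined, per the description above
UTF8_BOM = b"\xef\xbb\xbf"  # assumption: the text does not say which BOM is meant

# Hypothetical, simplified matcher for a complete <meta ... charset=...> element.
# It requires the closing ">" so that a truncated element does not count.
META_CHARSET = re.compile(
    rb"<meta\b[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)[^>]*>",
    re.IGNORECASE,
)


def encoding_when_bom_present(data: bytes) -> Optional[str]:
    """Return the encoding chosen by the BOM rule, or None if there is no BOM."""
    if not data.startswith(UTF8_BOM):
        return None  # the no-BOM case is handled separately
    head = data[:PRESCAN_LIMIT]
    match = META_CHARSET.search(head)
    if match:
        # A complete meta element was found: its declared encoding is used.
        return match.group(1).decode("ascii")
    # No meta element within the first 2048 bytes: UTF-8 is used.
    return "UTF-8"
```

A real HTML prescan is more involved than a single regular expression (comments, attribute quoting, `http-equiv` handling), so the pattern here should be read as a placeholder for that step.
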
When there is no BOM, the browser parses the document.
- Upon encountering a meta element,
  - If the encoding declared is not compatible with the default (i.e. anything but US-ASCII, ISO-8859-1 or Windows-1252)
  - And if non-US-ASCII characters have been detected
  - And the declared encoding is compatible with what's already been seen
    - The document is re-parsed using the declared encoding.
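
The three conditions for re-parsing can be sketched as a single predicate. This is a minimal sketch under stated assumptions, not a description of any browser's implementation: the name `should_reparse` is hypothetical, and "compatible with what's already been seen" is interpreted here as "the bytes read so far decode without error in the declared encoding", which the text does not confirm.

```python
import codecs

# Encodings the text treats as compatible with the default.
DEFAULT_COMPATIBLE = {"us-ascii", "iso-8859-1", "windows-1252"}


def should_reparse(declared: str, seen: bytes) -> bool:
    """Decide whether a meta declaration triggers a re-parse (no-BOM case)."""
    # Condition 1: the declared encoding differs from the default-compatible set.
    if declared.strip().lower() in DEFAULT_COMPATIBLE:
        return False
    # Condition 2: non-US-ASCII bytes have already been encountered.
    if all(byte < 0x80 for byte in seen):
        return False
    # Condition 3 (assumed meaning): the bytes seen so far are valid in the
    # declared encoding. An incremental decoder is used so that a multi-byte
    # sequence cut off at the end of the buffer is not treated as an error.
    try:
        decoder = codecs.getincrementaldecoder(declared)(errors="strict")
        decoder.decode(seen, final=False)
    except (LookupError, UnicodeDecodeError):
        return False
    return True
```

For example, `should_reparse("utf-8", "café".encode("utf-8"))` is true, while a Windows-1252 declaration or a buffer containing only ASCII bytes never triggers a re-parse under this reading.
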

Revision as of 10:38, 5 December 2006 by Lachlan Hunt (Added some Mozilla and IE observations)
Latest revision as of 13:49, 9 July 2013 by GPHemsley (+{{obsolete|spec=http://encoding.spec.whatwg.org/}})