Encoding: Difference between revisions

Revision as of 20:55, 21 May 2012

This page tracks notes related to the Encoding Standard. See Web Encodings for some historical data with respect to encodings and their labels.

Implementations

http://code.google.com/p/stringencoding/ implements the standard in JavaScript
http://darcsden.com/and/flex-encoding is an experimental (f)lex implementation of the decoding algorithms

Legacy implementations

Gecko: http://mxr.mozilla.org/mozilla-central/source/intl/uconv/
Chromium: http://src.chromium.org/svn/trunk/deps/third_party/icu46/README.chromium; http://src.chromium.org/svn/trunk/deps/third_party/icu46/source/data/mappings/convrtrs.txt; http://src.chromium.org/svn/trunk/deps/third_party/icu46/source/data/mappings/ucmlocal.mk

Japanese encodings

Got these links after the standard was written:

Sniffing

http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
http://mxr.mozilla.org/mozilla-central/source/extensions/universalchardet/src/base/ (Gecko; tests are a few directories up)
http://code.google.com/p/juniversalchardet/ (Gecko's detector in Java)
https://bugzilla.mozilla.org/show_bug.cgi?id=631751 (UTF-16 sniffing in Gecko)
http://trac.webkit.org/browser/trunk/Source/WebCore/loader/TextResourceDecoder.cpp (WebKit; is this all?)

Gecko notes

Thai detection was not enabled before June 2009
Encodings not in the spec: iso-2022-cn, euc-tw
GB2312 is supported, but its supersets gbk and gb18030 are not
UTF-16 sniffing is HTML-specific

Misc

XSS vulnerabilities with unusual character encodings
Test multiple BOMs
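
A minimal sketch of what a "multiple BOMs" test might look like. The helper names (`sniff_bom`, `decode_with_bom`) and the UTF-8 fallback are assumptions made for illustration. They are not taken from the standard or from any implementation listed on this page. The sketch consumes at most one leading byte order mark, so a second BOM ends up in the decoded output as U+FEFF.

```python
# Byte order marks checked in this sketch: UTF-8 first, then the two UTF-16 forms.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0) if there is none."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data, consuming at most one leading BOM (hypothetical helper)."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or fallback)


# Two UTF-8 BOMs in a row: the first is consumed, the second survives as U+FEFF.
doubled = b"\xef\xbb\xbf" * 2 + b"hi"
print(repr(decode_with_bom(doubled)))
```

Whether a real decoder should keep, drop, or replace the second BOM is exactly what such a test would have to establish. The sketch only shows one possible behavior.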

Labels

Labels in Opera that are not in the spec: 'ansi_x3.4-1986', 'cn-gb', 'cp367', 'cp50220', 'cp51932', 'cp932', 'cp936', 'csascii', 'cscp50220', 'cscp51932', 'csinvariant', 'csiso646basic1983', 'csunicode', 'csunicode11', 'csunicode11utf7', 'csunicodeascii', 'csunicodejapanese', 'csunicodelatin1', 'csviscii', 'cswindows31j', 'euc-cn', 'euc-tw', 'extended_unix_code_packed_format_for_japanese', 'ibm367', 'invariant', 'iso-10646', 'iso-10646-j-1', 'iso-10646-ucs-2', 'iso-10646-ucs-basic', 'iso-10646-unicode-latin1', 'iso-2022-cn', 'iso-2022-jp-1', 'iso-celtic', 'iso-ir-199', 'iso-ir-226', 'iso-ir-6', 'iso646-us', 'iso8859-16', 'iso885916', 'iso_646.basic:1983', 'iso_646.irv:1991', 'iso_8859-10:1992', 'iso_8859-14', 'iso_8859-14:1998', 'iso_8859-16', 'iso_8859-16:2001', 'iso_8859-6-e', 'iso_8859-6-i', 'iso_8859-8-e', 'iso_8859-8-i', 'l10', 'l8', 'latin-9', 'latin10', 'latin8', 'microsoft-cp1250', 'microsoft-cp1251', 'microsoft-cp1252', 'microsoft-cp1253', 'microsoft-cp1254', 'microsoft-cp1255', 'microsoft-cp1256', 'microsoft-cp1257', 'microsoft-cp1258', 'ms932', 'ms936', 'ref', 'tis-620-2533', 'unicode-1-1', 'unicode-1-1-utf-7', 'us', 'utf-7', 'viscii', 'windows-936', 'x-mac-ce', 'x-mac-greek', 'x-mac-turkish', 'x-user-defined'
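
One way a list like the one above could be produced is to normalize each label an implementation accepts and subtract the labels the spec defines. The sketch below assumes that approach. `SPEC_LABELS` is a deliberately tiny, illustrative subset rather than the spec's full table, and `sample_labels` is a hypothetical input, not Opera's actual label list.

```python
# Illustrative subset only; the real spec table is much larger.
SPEC_LABELS = frozenset(
    {"utf-8", "utf8", "latin1", "ascii", "us-ascii", "windows-1252"}
)

ASCII_WHITESPACE = "\t\n\x0c\r "


def normalize_label(label: str) -> str:
    """Strip leading/trailing ASCII whitespace and lowercase ASCII letters only."""
    stripped = label.strip(ASCII_WHITESPACE)
    return "".join(c.lower() if c.isascii() else c for c in stripped)


def labels_not_in_spec(labels, spec_labels=SPEC_LABELS):
    """Return the sorted, normalized labels that are absent from spec_labels."""
    return sorted({normalize_label(label) for label in labels} - spec_labels)


# Hypothetical input mixing labels from the list above with spec labels.
sample_labels = ["cp367", "ms932", " UTF-8 ", "utf-7", "latin1"]
print(labels_not_in_spec(sample_labels))
```

Lowercasing only ASCII letters keeps non-ASCII lookalikes (for example U+212A KELVIN SIGN) from collapsing into an ASCII label by accident.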

@@ Line 4: / Line 4: @@
 * http://code.google.com/p/stringencoding/ implements the standard in JavaScript
-* http://patch-tag.com/r/and/flex-encoding/home is an experimental (f)lex implementation of the decoding algorithms
+* http://darcsden.com/and/flex-encoding is an experimental (f)lex implementation of the decoding algorithms
 ==Legacy implementations==
