Encoding
This page tracks notes related to the Encoding Standard. See Web Encodings for some historical data with respect to encodings and their labels.
Implementations
- http://code.google.com/p/stringencoding/ (implements the standard in JavaScript)
Legacy implementations
- Gecko
  - http://mxr.mozilla.org/mozilla-central/source/intl/uconv/
- Chromium
  - http://src.chromium.org/svn/trunk/deps/third_party/icu46/README.chromium
  - http://src.chromium.org/svn/trunk/deps/third_party/icu46/source/data/mappings/convrtrs.txt
  - http://src.chromium.org/svn/trunk/deps/third_party/icu46/source/data/mappings/ucmlocal.mk
Japanese encodings
These links were found after the standard was written:
- http://www8.plala.or.jp/tkubota1/unicode-symbols.html
- http://www8.plala.or.jp/tkubota1/unicode-symbols-map2.html
Sniffing
- http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
- http://mxr.mozilla.org/mozilla-central/source/extensions/universalchardet/src/base/ (Gecko; tests are a few directories up)
- http://code.google.com/p/juniversalchardet/ (Gecko's detector in Java)
- https://bugzilla.mozilla.org/show_bug.cgi?id=631751 (UTF-16 sniffing)
- http://trac.webkit.org/browser/trunk/Source/WebCore/loader/TextResourceDecoder.cpp (WebKit; is this all?)
Misc
- XSS vulnerabilities with unusual character encodings
- Test multiple BOMs
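As an illustration of what a multiple-BOM test input might look like, here is a minimal Python sketch. It assumes a decoder that removes at most one leading byte order mark; the helper name `strip_one_bom` and the sample bytes are hypothetical, and this is not the standard's decode algorithm or any browser's behavior.

```python
import codecs

# BOM byte sequences paired with the encoding each one signals.
# None of these sequences is a prefix of another, so order does not matter.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
]


def strip_one_bom(data: bytes):
    """Return (encoding, rest) with at most one leading BOM removed.

    Hypothetical helper for building test cases; encoding is None when
    the input does not start with a recognized BOM.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


# Test input with two UTF-8 BOMs in a row.
double_bom = codecs.BOM_UTF8 * 2 + b"abc"
encoding, rest = strip_one_bom(double_bom)

# Only the first BOM was consumed, so the second one decodes to U+FEFF.
text = rest.decode(encoding)
```

Under that assumption, `text` starts with a literal U+FEFF character, which is the kind of difference a multiple-BOM test would be looking for across implementations.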
Labels
Labels in Opera that are not in the spec: '866', 'ansi_x3.4-1986', 'asmo-708', 'cn-gb', 'cp1250', 'cp1251', 'cp1252', 'cp1254', 'cp1257', 'cp367', 'cp50220', 'cp51932', 'cp819', 'cp932', 'cp936', 'csascii', 'cscp50220', 'cscp51932', 'csibm866', 'csinvariant', 'csiso646basic1983', 'csiso88596e', 'csiso88596i', 'iso-8859-6-i', 'csiso88598e', 'csiso88598i', 'cskoi8r', 'csunicode', 'csunicode11', 'csunicode11utf7', 'utf-7', 'csunicodeascii', 'csunicodejapanese', 'csunicodelatin1', 'csviscii', 'viscii', 'cswindows31j', 'euc-cn', 'euc-tw', 'extended_unix_code_packed_format_for_japanese', 'ibm367', 'ibm819', 'invariant', 'iso-10646', 'iso-10646-j-1', 'iso-10646-ucs-2', 'iso-10646-ucs-basic', 'iso-10646-unicode-latin1', 'iso-2022-cn', 'iso-2022-jp-1', 'iso-8859-6-e', 'iso-8859-8-e', 'iso-celtic', 'iso-ir-100', 'iso-ir-199', 'iso-ir-226', 'iso-ir-6', 'iso646-us', 'iso8859-11', 'iso8859-12', 'iso-8859-12', 'iso8859-13', 'iso8859-15', 'iso8859-16', 'iso8859-3', 'iso8859-4', 'iso8859-5', 'iso8859-6', 'iso8859-7', 'iso8859-8', 'iso8859-9', 'iso88591', 'iso885910', 'iso885911', 'iso885912', 'iso885913', 'iso885914', 'iso885915', 'iso885916', 'iso88592', 'iso88593', 'iso88594', 'iso88595', 'iso88596', 'iso88597', 'iso88598', 'iso88599', 'iso_646.basic:1983', 'iso_646.irv:1991', 'iso_8859-10:1992', 'iso_8859-14', 'iso_8859-14:1998', 'iso_8859-16', 'iso_8859-16:2001', 'iso_8859-1:1987', 'iso_8859-2:1987', 'iso_8859-3:1988', 'iso_8859-4:1988', 'iso_8859-5:1988', 'iso_8859-6-e', 'iso_8859-6-i', 'iso_8859-6:1987', 'iso_8859-7:1987', 'iso_8859-8-e', 'iso_8859-8-i', 'iso_8859-8:1988', 'iso_8859-9', 'iso_8859-9:1989', 'l10', 'l8', 'latin-9', 'latin10', 'latin8', 'microsoft-cp1250', 'microsoft-cp1251', 'microsoft-cp1252', 'microsoft-cp1253', 'microsoft-cp1254', 'microsoft-cp1255', 'microsoft-cp1256', 'microsoft-cp1257', 'microsoft-cp1258', 'ms932', 'ms936', 'ref', 'tis-620-2533', 'unicode-1-1', 'unicode-1-1-utf-7', 'us', 'windows-936', 'x-cp1252', 'x-cp1253', 'x-cp1254', 'x-cp1255', 'x-cp1256', 'x-cp1257', 'x-cp1258', 'x-mac-ce', 'x-mac-greek', 'x-mac-turkish'
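A list like the one above can be produced as a set difference between a browser's label table and the spec's. The following Python sketch shows one way to do that; it assumes labels are compared after trimming ASCII whitespace and lowercasing ASCII letters, and the function names and the tiny input lists are hypothetical, not real Opera or spec data.

```python
import string

# Translation table that lowercases ASCII letters only.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# ASCII whitespace: tab, line feed, form feed, carriage return, space.
_ASCII_WHITESPACE = "\t\n\f\r "


def normalize(label: str) -> str:
    """Trim ASCII whitespace and lowercase ASCII letters (assumed matching rule)."""
    return label.strip(_ASCII_WHITESPACE).translate(_ASCII_LOWER)


def labels_not_in_spec(browser_labels, spec_labels):
    """Return the sorted, de-duplicated browser labels missing from spec_labels."""
    spec = {normalize(label) for label in spec_labels}
    browser = {normalize(label) for label in browser_labels}
    return sorted(browser - spec)


# Hypothetical inputs for illustration only.
browser_labels = ["UTF-8", " utf-7 ", "viscii", "viscii", "utf-7"]
spec_labels = ["utf-8", "utf8"]

extra = labels_not_in_spec(browser_labels, spec_labels)
# extra == ['utf-7', 'viscii']
```

Working on sets means a label that the source table lists more than once appears only once in the result.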