Difference between revisions of "Web Encodings"

From WHATWG Wiki
(→‎Firefox: Added table (no mappings between encodings))
Line 321:
=== Firefox ===


FIXME
{|border=1 cellpadding=4 cellspacing=0
!| Encoding
!| Aliases
!| Decoded As
!| Notes
|-
| armscii-8
| armscii-8
|
|
|-
| Big5
| big5, csbig5, x-x-big5, zh_tw-big5
|
|
|-
| Big5-HKSCS
| big5-hkscs
|
|
|-
| EUC-JP
| cseucjpkdfmtjapanese, euc-jp, x-euc-jp
|
|
|-
| EUC-KR
| 5601, csksc56011987, csueckr, euc-kr, iso-ir-149, korean, ks_c_5601-1989, ksc5601, ksc_5601
|
|
|-
| gb18030
| gb18030
|
|
|-
| GB2312
| chinese, csgb2312, csiso58gb231280, gb2312, gb_2312, gb_2312-80, iso-ir-58, zh_cn.euc
|
|
|-
| GEOSTD8
| geostd8
|
|
|-
| HZ-GB-2312
| hz-gb-2312
|
|
|-
| IBM850
| 850, cp850, csIBM850, ibm850
|
|
|-
| IBM852
| 852, cp852, csIBM852, ibm852
|
|
|-
| IBM855
| 855, cp855, csIBM855, ibm855
|
|
|-
| IBM857
| 857, cp857, csIBM857, ibm857
|
|
|-
| IBM862
| 862, cp862, csIBM862, ibm862
|
|
|-
| IBM864
| 864, cp864, csIBM864, ibm-864, ibm864
|
|
|-
| IBM864i
| 864i, cp864i, csibm864i, ibm-864i, ibm864i
|
|
|-
| IBM866
| 866, cp-866, cp866, csIBM866, ibm866
|
|
|-
| ISO-2022-CN
| iso-2022-cn, iso-2022-cn-ext
|
|
|-
| ISO-2022-JP
| csiso2022jp, csiso2022jp2, iso-2022-jp, iso-2022-jp-2
|
|
|-
| ISO-2022-KR
| csiso2022kr, iso-2022-kr
|
|
|-
| ISO-8859-1
| cp819, csisolatin1, ibm819, iso-8859-1, iso-ir-100, iso8859-1, iso88591, iso_8859-1, l1, latin1
|
|
|-
| ISO-8859-10
| csisolatin6, iso-8859-10, iso-ir-157, iso8859-10, iso885910, l6, latin6
|
|
|-
| ISO-8859-11
| iso-8859-11, iso8859-11, iso885911
|
|
|-
| ISO-8859-12
| iso885912
|
|
|-
| ISO-8859-13
| iso-8859-13, iso8859-13, iso885913
|
|
|-
| ISO-8859-14
| iso-8859-14, iso8859-14, iso885914
|
|
|-
| ISO-8859-15
| iso-8859-15, iso8859-15, iso885915, iso_8859-15
|
|
|-
| ISO-8859-16
| iso-8859-16
|
|
|-
| ISO-8859-2
| csisolatin2, iso-8859-2, iso-ir-101, iso8859-2, iso88592, iso_8859-2, l2, latin2
|
|
|-
| ISO-8859-3
| csisolatin3, iso-8859-3, iso-ir-109, iso8859-3, iso88593, iso_8859-3, l3, latin3
|
|
|-
| ISO-8859-4
| csisolatin4, iso-8859-4, iso-ir-110, iso8859-4, iso88594, iso_8859-4, l4, latin4
|
|
|-
| ISO-8859-5
| csisolatincyrillic, cyrillic, iso-8859-5, iso-ir-144, iso8859-5, iso88595, iso_8859-5
|
|
|-
| ISO-8859-6
| arabic, asmo-708, csisolatinarabic, ecma-114, iso-8859-6, iso-ir-127, iso8859-6, iso88596, iso_8859-6
|
|
|-
| ISO-8859-6-E
| csiso88596e, iso-8859-6-e
|
|
|-
| ISO-8859-6-I
| csiso88596i, iso-8859-6-i
|
|
|-
| ISO-8859-7
| csisolatingreek, ecma-118, elot_928, greek, greek8, iso-8859-7, iso-ir-126, iso8859-7, iso88597, iso_8859-7, sun_eu_greek
|
|
|-
| ISO-8859-8
| csisolatinhebrew, hebrew, iso-8859-8, iso-ir-138, iso8859-8, iso88598, iso_8859-8, visual
|
|
|-
| ISO-8859-8-E
| csiso88598e, iso-8859-8-e
|
|
|-
| ISO-8859-8-I
| csiso88598i, iso-8859-8-i, iso-8859-8i
|
|
|-
| ISO-8859-9
| csisolatin5, iso-8859-9, iso-ir-148, iso8859-9, iso88599, iso_8859-9, l5, latin5
|
|
|-
| ISO-IR-111
| csiso111ecmacyrillic, ecma-cyrillic, iso-ir-111
|
|
|-
| KOI8-R
| koi8-r
|
|
|-
| KOI8-U
| koi8-u
|
|
|-
| Shift_JIS
| csshiftjis, ms_kanji, shift-jis, shift_jis, windows-31j, x-sjis
|
|
|-
| T.61-8bit
| csiso103t618bit, iso-ir-103, t.61, t.61-8bit
|
|
|-
| TIS-620
| tis-620, tis620
|
|
|-
| us-ascii
| 646, ansi_x3.4-1968, ascii, us-ascii
|
|
|-
| UTF-16
| utf-16
|
|
|-
| UTF-16BE
| csunicode, csunicode11, csunicodeascii, csunicodelatin1, iso-10646, iso-10646-j-1, iso-10646-ucs-2, iso-10646-ucs-basic, iso-10646-unicode-latin1, utf-16be, x-iso-10646-ucs-2-be
|
|
|-
| UTF-16LE
| utf-16le, x-iso-10646-ucs-2-le
|
|
|-
| UTF-32BE
| iso-10646-ucs-4, utf-32be, x-iso-10646-ucs-4-be
|
|
|-
| UTF-32LE
| utf-32le, x-iso-10646-ucs-4-le
|
|
|-
| UTF-7
| csunicode11utf7, unicode-1-1-utf-7, unicode-2-0-utf-7, utf-7, x-unicode-2-0-utf-7
|
|
|-
| UTF-8
| unicode-1-1-utf-8, utf-8, utf8
|
|
|-
| VIQR
| csviqr
|
|
|-
| VISCII
| csviscii, viscii
|
|
|-
| windows-1250
| cp1250, windows-1250, x-cp1250
|
|
|-
| windows-1251
| ansi-1251, cp1251, windows-1251, x-cp1251
|
|
|-
| windows-1252
| cp1252, windows-1252, x-cp1252
|
|
|-
| windows-1253
| cp1253, windows-1253, x-cp1253
|
|
|-
| windows-1254
| cp1254, windows-1254, x-cp1254
|
|
|-
| windows-1255
| cp1255, windows-1255, x-cp1255
|
|
|-
| windows-1256
| cp1256, windows-1256, x-cp1256
|
|
|-
| windows-1257
| cp1257, windows-1257, x-cp1257
|
|
|-
| windows-1258
| cp1258, windows-1258, x-cp1258
|
|
|-
| windows-874
| ibm874, windows-874
|
|
|-
| windows-936
| windows-936
|
|
|-
| x-euc-tw
| cns11643, x-euc-tw, zh_tw-euc
|
|
|-
| x-gbk
| gbk, x-gbk
|
|
|-
| x-imap4-modified-utf7
| x-imap4-modified-utf7
|
|
|-
| x-johab
| x-johab
|
|
|-
| x-mac-arabic
| x-mac-arabic
|
|
|-
| x-mac-ce
| x-mac-ce
|
|
|-
| x-mac-croatian
| x-mac-croatian
|
|
|-
| x-mac-cyrillic
| x-mac-cyrillic
|
|
|-
| x-mac-devanagari
| x-mac-devanagari
|
|
|-
| x-mac-farsi
| x-mac-farsi
|
|
|-
| x-mac-greek
| x-mac-greek
|
|
|-
| x-mac-gujarati
| x-mac-gujarati
|
|
|-
| x-mac-gurmukhi
| x-mac-gurmukhi
|
|
|-
| x-mac-hebrew
| x-mac-hebrew
|
|
|-
| x-mac-icelandic
| x-mac-icelandic
|
|
|-
| x-mac-roman
| csMacintosh, mac, macintosh, x-mac-roman
|
|
|-
| x-mac-romanian
| x-mac-romanian
|
|
|-
| x-mac-turkish
| x-mac-turkish
|
|
|-
| x-mac-ukrainian
| x-mac-ukrainian
|
|
|-
| x-obsoleted-EUC-JP
| x-obsoleted-euc-jp
|
|
|-
| x-obsoleted-ISO-2022-JP
| x-obsoleted-iso-2022-jp
|
|
|-
| x-obsoleted-Shift_JIS
| x-obsoleted-shift_jis
|
|
|-
| x-user-defined
| x-user-defined
|
|
|-
| x-viet-tcvn5712
| x-viet-tcvn5712
|
|
|-
| x-viet-vni
| x-viet-vni
|
|
|-
| x-viet-vps
| x-viet-vps
|
|
|-
| x-windows-949
| ks_c_5601-1987, x-windows-949
|
|
|}
 
Table generated from <http://mxr.mozilla.org/firefox/source/intl/uconv/src/charsetalias.properties>.


=== Chrome ===

Revision as of 21:44, 21 August 2009

My scratchpad for encoding-related notes.

Goals

  • Document existing practices for
    • Supported encodings
    • Supported aliases
    • Supported matching algorithm
  • Converge the various algorithms in use
  • Get the new rules implemented

Current Implementations

Does this differ per platform? Opera might differ a bit on Mac.

Data

Integrate this awesome data somehow:

Opera

Matching

UTS #22 charset alias matching and ASCII lowercasing.
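
As a rough illustration of the matching note above, the following Python sketch normalizes a label along the lines of the UTS #22 charset alias matching rule (drop everything except ASCII letters and digits, lowercase, drop zeros that do not follow a digit) and then looks it up in a small alias map. The names `normalize_label`, `resolve` and `SAMPLE_ALIASES` are hypothetical, the aliases are a handful copied from the Opera table on this page, and the zero-stripping step is one reading of UTS #22. None of this is confirmed to be what Opera actually runs.

```python
import re
from typing import Optional

# Hypothetical subset of the Opera alias table (normalized alias -> encoding).
SAMPLE_ALIASES = {
    "utf8": "utf-8",
    "latin1": "iso-8859-1",
    "iso88591": "iso-8859-1",
    "isoir100": "iso-8859-1",
    "cp1252": "windows-1252",
    "windows1252": "windows-1252",
}


def normalize_label(label: str) -> str:
    """Normalize a charset label, following one reading of UTS #22 alias matching."""
    # Keep ASCII letters and digits only; punctuation and whitespace are dropped.
    kept = re.sub(r"[^A-Za-z0-9]", "", label)
    # Only ASCII characters remain, so lower() is plain ASCII lowercasing here.
    lowered = kept.lower()
    # Drop runs of zeros that do not follow a digit ("utf008" -> "utf8").
    return re.sub(r"(?<![0-9])0+", "", lowered)


def resolve(label: str) -> Optional[str]:
    """Return the encoding name for a label, or None if the alias is unknown."""
    return SAMPLE_ALIASES.get(normalize_label(label))
```

With this sketch, `resolve("ISO_8859-1")` and `resolve("Latin1")` both return `"iso-8859-1"`, while `resolve("utf-80")` returns `None` because the trailing zero follows a digit and is kept.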

Encodings

{|border=1 cellpadding=4 cellspacing=0
! Encoding !! Aliases !! Decoded As !! Notes
|-
| big5 || big5, cnbig5, csbig5 || ||
|-
| big5-hkscs || big5hkscs || ||
|-
| euc-jp || cseucpkdfmtjapanese, eucjp, extendedunixcodepackedformatforjapanese || ||
|-
| euc-kr || cseuckr, csksc56011987, euckr, isoir149, korean, ksc5601, ksc56011987, ksc56011989, windows949 || ||
|-
| euc-tw || euctw || ||
|-
| gb18030 || gb18030 || ||
|-
| gbk || chinese, cngb, cp936, csgb2312, csiso58gb231280, euccn, gb2312, gb231280, gbk, isoir58, ms936, windows936 || ||
|-
| hz-gb-2312 || hzgb2312 || ||
|-
| ibm866 || 866, cp866, csibm866, ibm866 || ||
|-
| iso-2022-cn || iso2022cn || ||
|-
| iso-2022-jp || csiso2022jp, iso2022jp || ||
|-
| iso-2022-jp-1 || iso2022jp1 || ||
|-
| iso-2022-kr || csiso2022kr, iso2022kr || ||
|-
| iso-8859-1 || cp819, csisolatin1, ibm819, iso88591, iso885911987, isoir100, l1, latin1 || windows-1252 ||
|-
| iso-8859-2 || csisolatin2, iso88592, iso885921987, isoir101, l2, latin2 || ||
|-
| iso-8859-3 || csisolatin3, iso88593, iso885931988, isoir109, l3, latin3 || ||
|-
| iso-8859-4 || csisolatin4, iso88594, iso885941988, isoir110, l4, latin4 || ||
|-
| iso-8859-5 || csisolatincyrillic, cyrillic, iso88595, iso885951988, isoir144 || ||
|-
| iso-8859-6 || arabic, asmo708, csiso88596e, csisolatinarabic, ecma114, iso88596, iso885961987, iso88596e, isoir127 || ||
|-
| iso-8859-6-i || csiso88596i, iso88596i || ||
|-
| iso-8859-7 || csisolatingreek, ecma118, elot928, greek, greek8, iso88597, iso885971987, isoir126 || ||
|-
| iso-8859-8 || csiso88598e, csisolatinhebrew, hebrew, iso88598, iso885981988, iso88598e, isoir138, visual || ||
|-
| iso-8859-8-i || csiso88598i, iso88598i || ||
|-
| iso-8859-9 || csisolatin5, iso88599, iso885991989, isoir148, l5, latin5 || ||
|-
| iso-8859-10 || csisolatin6, iso885910, iso8859101992, isoir157, l6, latin6 || ||
|-
| iso-8859-11 || iso885911, tis620, tis6202533, windows874 || ||
|-
| iso-8859-13 || iso885913 || ||
|-
| iso-8859-14 || iso885914, iso8859141998, isoceltic, isoir199, l8, latin8 || ||
|-
| iso-8859-15 || iso885915, latin9 || ||
|-
| iso-8859-16 || iso885916, iso8859162001, isoir226, l10, latin10 || ||
|-
| koi8-r || cskoi8r, koi8r || ||
|-
| koi8-u || koi8u || ||
|-
| macintosh || csmacintosh, mac, macintosh, macroman || || Likely disabled.
|-
| shift_jis || cp932, csshiftjis, cswindows31j, ms932, mskanji, shiftjis, sjis, windows31j || ||
|-
| tcvn || tcvn, viettcvn || ||
|-
| us-ascii || ansix341968, ansix341986, ascii, cp367, csascii, csinvariant, csiso646basic1983, ibm367, invariant, iso646basic1983, iso646irv1991, iso646us, isoir6, ref, us, usascii || windows-1252 ||
|-
| utf-16 || csunicode, csunicode11, csunicodeascii, iso10646j1, iso10646ucs2, iso10646ucsbasic, utf16 || ||
|-
| utf-16be || utf16be || ||
|-
| utf-16le || utf16le || ||
|-
| utf-8 || utf8 || ||
|-
| viscii || csviscii, viscii || ||
|-
| windows-1250 || cp1250, microsoftcp1250, windows1250 || ||
|-
| windows-1251 || cp1251, microsoftcp1251, windows1251 || ||
|-
| windows-1252 || cp1252, microsoftcp1252, windows1252 || ||
|-
| windows-1253 || cp1253, microsoftcp1253, windows1253 || ||
|-
| windows-1254 || cp1254, microsoftcp1254, windows1254 || ||
|-
| windows-1255 || cp1255, microsoftcp1255, windows1255 || ||
|-
| windows-1256 || cp1256, microsoftcp1256, windows1256 || ||
|-
| windows-1257 || cp1257, microsoftcp1257, windows1257 || ||
|-
| windows-1258 || cp1258, microsoftcp1258, windows1258 || ||
|-
| windows-sami-2 || samiws2, windowssami2, ws2 || ||
|-
| x-mac-ce || macce || || Likely disabled.
|-
| x-mac-cyrillic || maccyrillic || || Likely disabled.
|-
| x-mac-greek || macgreek || || Likely disabled.
|-
| x-mac-turkish || macturkish || || Likely disabled.
|-
| x-vps || vps || ||
|}
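
The "Decoded As" column records cases where a label is accepted under one encoding name but decoded with another, such as iso-8859-1 and us-ascii being decoded as windows-1252 in the Opera table. The short Python sketch below uses Python's built-in codecs (not any browser code) to show why the distinction matters: the two encodings differ only in the 0x80 to 0x9F range, where windows-1252 assigns printable characters and ISO-8859-1 assigns C1 control characters. The sample bytes and variable names are made up for illustration.

```python
# Byte 0x93 is a curly quote in windows-1252 but a C1 control in ISO-8859-1.
data = b"\x93quoted\x94"

as_latin1 = data.decode("iso-8859-1")      # C1 controls U+0093 and U+0094
as_cp1252 = data.decode("windows-1252")    # curly quotes U+201C and U+201D

# Bytes outside 0x80-0x9F decode identically in both encodings.
sample = b"caf\xe9"
same_outside_c1 = sample.decode("iso-8859-1") == sample.decode("windows-1252")
```

A page labelled iso-8859-1 that was really authored as windows-1252 only renders its curly quotes correctly under the second decoding, which is presumably the reason for the remapping.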

Firefox

{|border=1 cellpadding=4 cellspacing=0
! Encoding !! Aliases !! Decoded As !! Notes
|-
| armscii-8 || armscii-8 || ||
|-
| Big5 || big5, csbig5, x-x-big5, zh_tw-big5 || ||
|-
| Big5-HKSCS || big5-hkscs || ||
|-
| EUC-JP || cseucjpkdfmtjapanese, euc-jp, x-euc-jp || ||
|-
| EUC-KR || 5601, csksc56011987, csueckr, euc-kr, iso-ir-149, korean, ks_c_5601-1989, ksc5601, ksc_5601 || ||
|-
| gb18030 || gb18030 || ||
|-
| GB2312 || chinese, csgb2312, csiso58gb231280, gb2312, gb_2312, gb_2312-80, iso-ir-58, zh_cn.euc || ||
|-
| GEOSTD8 || geostd8 || ||
|-
| HZ-GB-2312 || hz-gb-2312 || ||
|-
| IBM850 || 850, cp850, csIBM850, ibm850 || ||
|-
| IBM852 || 852, cp852, csIBM852, ibm852 || ||
|-
| IBM855 || 855, cp855, csIBM855, ibm855 || ||
|-
| IBM857 || 857, cp857, csIBM857, ibm857 || ||
|-
| IBM862 || 862, cp862, csIBM862, ibm862 || ||
|-
| IBM864 || 864, cp864, csIBM864, ibm-864, ibm864 || ||
|-
| IBM864i || 864i, cp864i, csibm864i, ibm-864i, ibm864i || ||
|-
| IBM866 || 866, cp-866, cp866, csIBM866, ibm866 || ||
|-
| ISO-2022-CN || iso-2022-cn, iso-2022-cn-ext || ||
|-
| ISO-2022-JP || csiso2022jp, csiso2022jp2, iso-2022-jp, iso-2022-jp-2 || ||
|-
| ISO-2022-KR || csiso2022kr, iso-2022-kr || ||
|-
| ISO-8859-1 || cp819, csisolatin1, ibm819, iso-8859-1, iso-ir-100, iso8859-1, iso88591, iso_8859-1, l1, latin1 || ||
|-
| ISO-8859-10 || csisolatin6, iso-8859-10, iso-ir-157, iso8859-10, iso885910, l6, latin6 || ||
|-
| ISO-8859-11 || iso-8859-11, iso8859-11, iso885911 || ||
|-
| ISO-8859-12 || iso885912 || ||
|-
| ISO-8859-13 || iso-8859-13, iso8859-13, iso885913 || ||
|-
| ISO-8859-14 || iso-8859-14, iso8859-14, iso885914 || ||
|-
| ISO-8859-15 || iso-8859-15, iso8859-15, iso885915, iso_8859-15 || ||
|-
| ISO-8859-16 || iso-8859-16 || ||
|-
| ISO-8859-2 || csisolatin2, iso-8859-2, iso-ir-101, iso8859-2, iso88592, iso_8859-2, l2, latin2 || ||
|-
| ISO-8859-3 || csisolatin3, iso-8859-3, iso-ir-109, iso8859-3, iso88593, iso_8859-3, l3, latin3 || ||
|-
| ISO-8859-4 || csisolatin4, iso-8859-4, iso-ir-110, iso8859-4, iso88594, iso_8859-4, l4, latin4 || ||
|-
| ISO-8859-5 || csisolatincyrillic, cyrillic, iso-8859-5, iso-ir-144, iso8859-5, iso88595, iso_8859-5 || ||
|-
| ISO-8859-6 || arabic, asmo-708, csisolatinarabic, ecma-114, iso-8859-6, iso-ir-127, iso8859-6, iso88596, iso_8859-6 || ||
|-
| ISO-8859-6-E || csiso88596e, iso-8859-6-e || ||
|-
| ISO-8859-6-I || csiso88596i, iso-8859-6-i || ||
|-
| ISO-8859-7 || csisolatingreek, ecma-118, elot_928, greek, greek8, iso-8859-7, iso-ir-126, iso8859-7, iso88597, iso_8859-7, sun_eu_greek || ||
|-
| ISO-8859-8 || csisolatinhebrew, hebrew, iso-8859-8, iso-ir-138, iso8859-8, iso88598, iso_8859-8, visual || ||
|-
| ISO-8859-8-E || csiso88598e, iso-8859-8-e || ||
|-
| ISO-8859-8-I || csiso88598i, iso-8859-8-i, iso-8859-8i || ||
|-
| ISO-8859-9 || csisolatin5, iso-8859-9, iso-ir-148, iso8859-9, iso88599, iso_8859-9, l5, latin5 || ||
|-
| ISO-IR-111 || csiso111ecmacyrillic, ecma-cyrillic, iso-ir-111 || ||
|-
| KOI8-R || koi8-r || ||
|-
| KOI8-U || koi8-u || ||
|-
| Shift_JIS || csshiftjis, ms_kanji, shift-jis, shift_jis, windows-31j, x-sjis || ||
|-
| T.61-8bit || csiso103t618bit, iso-ir-103, t.61, t.61-8bit || ||
|-
| TIS-620 || tis-620, tis620 || ||
|-
| us-ascii || 646, ansi_x3.4-1968, ascii, us-ascii || ||
|-
| UTF-16 || utf-16 || ||
|-
| UTF-16BE || csunicode, csunicode11, csunicodeascii, csunicodelatin1, iso-10646, iso-10646-j-1, iso-10646-ucs-2, iso-10646-ucs-basic, iso-10646-unicode-latin1, utf-16be, x-iso-10646-ucs-2-be || ||
|-
| UTF-16LE || utf-16le, x-iso-10646-ucs-2-le || ||
|-
| UTF-32BE || iso-10646-ucs-4, utf-32be, x-iso-10646-ucs-4-be || ||
|-
| UTF-32LE || utf-32le, x-iso-10646-ucs-4-le || ||
|-
| UTF-7 || csunicode11utf7, unicode-1-1-utf-7, unicode-2-0-utf-7, utf-7, x-unicode-2-0-utf-7 || ||
|-
| UTF-8 || unicode-1-1-utf-8, utf-8, utf8 || ||
|-
| VIQR || csviqr || ||
|-
| VISCII || csviscii, viscii || ||
|-
| windows-1250 || cp1250, windows-1250, x-cp1250 || ||
|-
| windows-1251 || ansi-1251, cp1251, windows-1251, x-cp1251 || ||
|-
| windows-1252 || cp1252, windows-1252, x-cp1252 || ||
|-
| windows-1253 || cp1253, windows-1253, x-cp1253 || ||
|-
| windows-1254 || cp1254, windows-1254, x-cp1254 || ||
|-
| windows-1255 || cp1255, windows-1255, x-cp1255 || ||
|-
| windows-1256 || cp1256, windows-1256, x-cp1256 || ||
|-
| windows-1257 || cp1257, windows-1257, x-cp1257 || ||
|-
| windows-1258 || cp1258, windows-1258, x-cp1258 || ||
|-
| windows-874 || ibm874, windows-874 || ||
|-
| windows-936 || windows-936 || ||
|-
| x-euc-tw || cns11643, x-euc-tw, zh_tw-euc || ||
|-
| x-gbk || gbk, x-gbk || ||
|-
| x-imap4-modified-utf7 || x-imap4-modified-utf7 || ||
|-
| x-johab || x-johab || ||
|-
| x-mac-arabic || x-mac-arabic || ||
|-
| x-mac-ce || x-mac-ce || ||
|-
| x-mac-croatian || x-mac-croatian || ||
|-
| x-mac-cyrillic || x-mac-cyrillic || ||
|-
| x-mac-devanagari || x-mac-devanagari || ||
|-
| x-mac-farsi || x-mac-farsi || ||
|-
| x-mac-greek || x-mac-greek || ||
|-
| x-mac-gujarati || x-mac-gujarati || ||
|-
| x-mac-gurmukhi || x-mac-gurmukhi || ||
|-
| x-mac-hebrew || x-mac-hebrew || ||
|-
| x-mac-icelandic || x-mac-icelandic || ||
|-
| x-mac-roman || csMacintosh, mac, macintosh, x-mac-roman || ||
|-
| x-mac-romanian || x-mac-romanian || ||
|-
| x-mac-turkish || x-mac-turkish || ||
|-
| x-mac-ukrainian || x-mac-ukrainian || ||
|-
| x-obsoleted-EUC-JP || x-obsoleted-euc-jp || ||
|-
| x-obsoleted-ISO-2022-JP || x-obsoleted-iso-2022-jp || ||
|-
| x-obsoleted-Shift_JIS || x-obsoleted-shift_jis || ||
|-
| x-user-defined || x-user-defined || ||
|-
| x-viet-tcvn5712 || x-viet-tcvn5712 || ||
|-
| x-viet-vni || x-viet-vni || ||
|-
| x-viet-vps || x-viet-vps || ||
|-
| x-windows-949 || ks_c_5601-1987, x-windows-949 || ||
|}

Table generated from <http://mxr.mozilla.org/firefox/source/intl/uconv/src/charsetalias.properties>.
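
For anyone wanting to regenerate a table like this, here is a minimal Python sketch. It assumes the properties file consists of `alias=Canonical` lines with `#` comments, which has not been checked against the file linked above. The function names `group_aliases` and `to_wikitext` and the sample input are hypothetical, and this is not the script that produced the table. It groups aliases by canonical name and emits one wikitext row per encoding with the "Decoded As" and "Notes" cells left empty.

```python
from collections import defaultdict
from typing import Dict, List


def group_aliases(properties_text: str) -> Dict[str, List[str]]:
    """Group alias=Canonical lines by canonical encoding name."""
    groups = defaultdict(list)
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        alias, sep, canonical = line.partition("=")
        if not sep:
            continue  # not an alias=Canonical line
        groups[canonical.strip()].append(alias.strip())
    # Sort encodings case-insensitively and aliases alphabetically.
    return {name: sorted(groups[name]) for name in sorted(groups, key=str.lower)}


def to_wikitext(groups: Dict[str, List[str]]) -> str:
    """Render the grouped aliases as a four-column wikitext table."""
    lines = [
        "{|border=1 cellpadding=4 cellspacing=0",
        "!| Encoding",
        "!| Aliases",
        "!| Decoded As",
        "!| Notes",
    ]
    for name, aliases in groups.items():
        # Decoded As and Notes are left empty, to be filled in by hand.
        lines += ["|-", "| " + name, "| " + ", ".join(aliases), "|", "|"]
    lines.append("|}")
    return "\n".join(lines)


# Made-up sample input using three aliases that appear in the table above.
SAMPLE = "# sample\nlatin1=ISO-8859-1\nl1 = ISO-8859-1\n\nutf8=UTF-8\n"
sample_groups = group_aliases(SAMPLE)
sample_table = to_wikitext(sample_groups)
```

Sorting the encoding names case-insensitively is a guess made to mirror the ordering in the table above (for example `gb18030` before `GB2312`).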

Chrome

FIXME

Internet Explorer

Needs sorting out:

Safari

FIXME