
Re: Registering a charset alias



On Thu, Aug 13, 2009 at 12:59 PM, Anne van Kesteren <annevk@opera.com> wrote:
> On Thu, 13 Aug 2009 20:54:04 +0200, Erik van der Poel <erikv@google.com> wrote:
>> I agree that it would be great to record the "de facto" standard
>> charset names somewhere. Preferably at IANA, where the "de jure"
>> charset names are registered.
>>
>> Normally, IANA does not register charset names that start with "x-",
>> but since they are accepted by major implementations, it would be nice
>> if they were recorded somewhere. The "x-" names should probably only
>> be recommended for input, however. For output, we should strongly
>> recommend the names without "x-", preferably a single name for each
>> character encoding.
>
> Yeah, I believe HTML5 currently encourages authors to use the preferred IANA name of the encoding.

The IANA preferred names are all fine, but the following names should
probably also be preferred (since they occur more frequently on the
Web than their aliases):

GBK
ISO-8859-15
macintosh
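A minimal Python sketch of the policy above: accept many labels on input, including "x-" names, and emit a single preferred name on output. The table PREFERRED_NAME and the function preferred_name() are hypothetical. Its entries are illustrative labels seen in practice, not a complete or registered list.

```python
# Hypothetical alias table: every label a decoder might accept on input
# (including "x-" labels) maps to the one preferred name used for output.
# Entries are illustrative only, not a registered or complete list.
PREFERRED_NAME = {
    "gbk": "GBK",
    "x-gbk": "GBK",
    "cp936": "GBK",
    "iso-8859-15": "ISO-8859-15",
    "latin-9": "ISO-8859-15",
    "macintosh": "macintosh",
    "mac": "macintosh",
    "x-mac-roman": "macintosh",
}


def preferred_name(label: str) -> str:
    """Return the single preferred output name for an input charset label.

    Labels are matched case-insensitively after trimming whitespace.
    Raises KeyError for labels missing from the (hypothetical) table.
    """
    return PREFERRED_NAME[label.strip().lower()]


print(preferred_name("X-Mac-Roman"))  # prints: macintosh
```

Keeping input matching lenient while the output side has exactly one name per encoding is what keeps "x-" labels from spreading.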

>> It would also be great if we could record the actual superset
>> relationships that major implementations use. For example, when a
>> document is labelled with the charset gb2312, major implementations
>> use the "superset" gbk instead.
>
> Just like US-ASCII or ISO-8859-1 is treated as windows-1252?

Yes.

> HTML5 includes these mappings too, currently in section 2.7

Here is the current table, from HTML 5:

Input encoding   Replacement encoding   References
EUC-KR           windows-949            [EUCKR] [WIN949]
GB2312           GBK                    [RFC1345] [GBK]
GB_2312-80       GBK                    [RFC1345] [GBK]
ISO-8859-1       windows-1252           [RFC1345] [WIN1252]
ISO-8859-9       windows-1254           [RFC1345] [WIN1254]
ISO-8859-11      windows-874            [ISO885911] [WIN874]
KS_C_5601-1987   windows-949            [RFC1345] [WIN949]
Shift_JIS        windows-31J            [SHIFTJIS] [WIN31J]
TIS-620          windows-874            [TIS620] [WIN874]
US-ASCII         windows-1252           [RFC1345] [WIN1252]

Google Web Search agrees with all of these, except for gb2312 and gbk,
which are treated as gb18030.
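The HTML 5 table can be read as a lookup applied once the label has been matched. Here is a minimal Python sketch, assuming case-insensitive matching. HTML5_OVERRIDES, GOOGLE_OVERRIDES and decoding_charset() are hypothetical names. The Google variant changes only gb2312 and gbk, the two labels mentioned above, and leaves every other row as in the HTML 5 table; that last part is an assumption.

```python
# Superset overrides from the HTML 5 table: a document labelled with the
# key is decoded with the value instead. Keys are lowercased for matching.
HTML5_OVERRIDES = {
    "euc-kr": "windows-949",
    "gb2312": "GBK",
    "gb_2312-80": "GBK",
    "iso-8859-1": "windows-1252",
    "iso-8859-9": "windows-1254",
    "iso-8859-11": "windows-874",
    "ks_c_5601-1987": "windows-949",
    "shift_jis": "windows-31J",
    "tis-620": "windows-874",
    "us-ascii": "windows-1252",
}

# Variant for the Google Web Search behaviour: gb2312 and gbk both become
# gb18030. Other rows are assumed unchanged from the HTML 5 table.
GOOGLE_OVERRIDES = {**HTML5_OVERRIDES, "gb2312": "gb18030", "gbk": "gb18030"}


def decoding_charset(label: str, overrides: dict) -> str:
    """Return the charset actually used to decode a document with this label.

    Labels without an override are returned unchanged (whitespace trimmed).
    """
    key = label.strip().lower()
    return overrides.get(key, label.strip())


print(decoding_charset("GB2312", HTML5_OVERRIDES))   # prints: GBK
print(decoding_charset("GB2312", GOOGLE_OVERRIDES))  # prints: gb18030
```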

> but we
> were not sure whether IANA would formally register them. If that can
> be done I think that would be awesome.
>
> If it can only be done for a few of the encodings, but not all, because
> e.g. we think that keeping ISO-8859-1 and windows-1252 distinct for
> some usage is desirable, that would be OK too. The more is dealt
> with at the registry level the better in my opinion.

If IANA were to register these subset/superset relationships, I think
they should be registered as supersets rather than aliases.
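To illustrate why a superset is not the same thing as an alias, here is a short check using Python's built-in codecs. It assumes "latin-1", "cp1252", "gb2312" and "gbk" are adequate stand-ins for the registered charsets. The helpers decode_or_none() and same_decoding() are hypothetical. ASCII bytes and 0xA0-0xFF decode identically under ISO-8859-1 and windows-1252. Every byte in 0x80-0x9F decodes differently, or is undefined in windows-1252, so the two labels do not name one encoding.

```python
def decode_or_none(data: bytes, codec: str):
    """Decode data with codec, returning None if the bytes are undefined there."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


def same_decoding(byte_values, subset: str, superset: str) -> bool:
    """True if every single byte in byte_values decodes identically in both codecs."""
    return all(
        decode_or_none(bytes([b]), subset) == decode_or_none(bytes([b]), superset)
        for b in byte_values
    )


# ASCII range and 0xA0-0xFF: windows-1252 agrees with ISO-8859-1.
ascii_agrees = same_decoding(range(0x00, 0x80), "latin-1", "cp1252")
high_agrees = same_decoding(range(0xA0, 0x100), "latin-1", "cp1252")

# 0x80-0x9F: C1 controls in ISO-8859-1, but printable characters or
# undefined bytes in windows-1252, so none of these bytes agree.
differing = [
    b for b in range(0x80, 0xA0)
    if decode_or_none(bytes([b]), "latin-1") != decode_or_none(bytes([b]), "cp1252")
]

# Text encoded as GB2312 decodes unchanged as GBK.
sample = "中文字符集"
gbk_agrees = sample.encode("gb2312").decode("gbk") == sample

print(ascii_agrees, high_agrees, len(differing), gbk_agrees)  # prints: True True 32 True
```

A registry entry recording "windows-1252 is a superset of ISO-8859-1" captures the first two results. Recording them as aliases would wrongly imply that the 0x80-0x9F range also decodes the same.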

Erik