Re: Proposal for additional Aliases to IANA registry of character sets
begin quotation by Markus Scherer on 2002/8/6 8:07 -0700:
> Chris Newman wrote:
>> However, I do object to the following two aliases:
>>> ISO_8859-1:1987 IBM819 IBM-819
>>> ANSI_X3.4-1968 IBM367 IBM-367
>
> Well, fact is that
> a) There are already a number of aliases for these charsets
> that implementations have to deal with.
Yes, and that's extremely unfortunate and I would have objected had I been
engaged in standards when the registry was created, but alas I've only been
involved in the IETF for 11 years.
> b) There is a large installed base of software for (and of) IBM
> operating systems and middleware that use the numeric IDs
> 819 and 367 and tend to prepend "IBM-" to all IDs
> for interoperation with open-standards systems.
And if that software emits the "IBM-#" aliases on the open Internet, it is
non-compliant and needs to be fixed. What makes more sense: forcing all
standards-compliant software to change to use a new alias, or forcing just
the limited set of broken IBM software to use the correct standard names
for interoperable charsets?
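
To make the second option concrete, here is a minimal sketch of what such a fix could look like: the software keeps whatever names it likes internally and maps them to the standard MIME names only at the point where a charset label is emitted. The table and the function name `charset_label_for_output` are hypothetical and cover only the aliases named in this thread; this is not a description of any IBM product.

```python
# Hypothetical alias table, limited to the names discussed in this thread.
# Keys are lowercased so the lookup is case-insensitive.
PREFERRED_NAME = {
    "iso_8859-1:1987": "ISO-8859-1",
    "ibm819": "ISO-8859-1",
    "ibm-819": "ISO-8859-1",
    "ansi_x3.4-1968": "US-ASCII",
    "ibm367": "US-ASCII",
    "ibm-367": "US-ASCII",
}


def charset_label_for_output(name: str) -> str:
    """Return the label to emit on the wire for a locally used charset name.

    Names in the table are mapped to the standard MIME name; anything
    else is passed through unchanged (with surrounding whitespace removed).
    """
    cleaned = name.strip()
    return PREFERRED_NAME.get(cleaned.lower(), cleaned)
```

With this in place, `charset_label_for_output("IBM-819")` yields `ISO-8859-1`, and a name that is not in the table, such as `UTF-8`, is emitted as given.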
> Although these charsets are among the most important and most widely
> used, it seems artificial to limit the use of aliases only for these two.
Aliases impede interoperability by creating a cross-product of cases to
test interoperability (number of aliases * number of products). I can't
make a strong objection to the addition of aliases to limited-use charsets
because they're already limited use -- meaning interoperability is neither
important nor expected. But for those charsets which are already widely
used, the addition of aliases breaks the existing interoperable installed
base of standards-compliant software. That is the point where any good
engineer should stand up and object. And it's not just _two_ character
sets. Here's a partial list of interoperable character sets:
MIME
MIME charset    Text  MIBenum  Alternate          References
--------------  ----  -------  -----------------  ----------
ISO-8859-1      Yes   4        UTF-8              [RFC2046,ISO-8859]
ISO-8859-2      Yes   5        UTF-8              [RFC2046,ISO-8859]
ISO-8859-3      Yes   6        UTF-8              [RFC2046,ISO-8859]
ISO-8859-4      Yes   7        UTF-8              [RFC2046,ISO-8859]
ISO-8859-5      Yes   8        KOI8-R,UTF-8       [RFC2046,ISO-8859]
ISO-8859-6      Yes   9        UTF-8              [RFC2046,ISO-8859]
ISO-8859-7      Yes   10       UTF-8              [RFC1947,2046,ISO-8859]
ISO-8859-8      Yes   11       UTF-8              [RFC1555,2046,ISO-8859]
ISO-8859-9      Yes   12       UTF-8              [RFC2046,ISO-8859]
ISO-8859-10     Yes   13       UTF-8              [RFC2046,ISO-8859]
US-ASCII        Yes   3        N/A                [RFC2046]
UTF-8           Yes   106      N/A                [RFC2279]
ISO-8859-6-E    Yes   81       UTF-8              [RFC1556]
ISO-8859-6-I    Yes   82       UTF-8              [RFC1556]
ISO-8859-8-E    Yes   84       UTF-8              [RFC1556]
ISO-8859-8-I    Yes   85       UTF-8              [RFC1556]
KOI8-R          Yes   2084     ISO-8859-5,UTF-8   [RFC1489]
KOI8-U          Yes   2088     UTF-8              [RFC2319]
ISO-2022-KR     Yes   37       EUC-KR,UTF-8       [RFC1557,KS_C_5601-1987]
EUC-KR          Yes   38       ISO-2022-KR,UTF-8  [RFC1557,KS_C_5601-1987]
ISO-2022-JP     Yes   39       UTF-8              [RFC1468]
ISO-2022-CN     Yes   104*A    UTF-8              [RFC1922]
CN-GB           Yes   N/A      UTF-8              [RFC1922]
CN-Big5         Yes   N/A      UTF-8              [RFC1922]
HZ-GB-2312      Yes   2085     UTF-8              [RFC1842,1843]
VISCII          Yes   2082     UTF-8              [RFC1456]
VIQR            Yes   2083     VISCII,UTF-8       [RFC1456]
GB2312          Yes?  2025     UTF-8              [RFC1922]
Big5            Yes?  2026     UTF-8              [RFC1922]
EUC-JP          Yes   18       ISO-2022-JP,UTF-8  [JIS X0212-1990]
Shift_JIS       Yes   17       ISO-2022-JP,UTF-8  [JIS X0212-1990]
I will object to the addition of aliases to any of these.
- Chris