
Re: Proposal for additional Aliases to IANA registry of character sets



begin quotation by Markus Scherer on 2002/8/6 8:07 -0700:
> Chris Newman wrote:
>> However, I do object to the following two aliases:
>>> ISO_8859-1:1987   IBM819                  IBM-819
>>> ANSI_X3.4-1968    IBM367                  IBM-367
>
> Well, fact is that
> a) There are already a number of aliases for these charsets
>     that implementations have to deal with.

Yes, and that's extremely unfortunate, and I would have objected had I been
engaged in standards work when the registry was created, but alas I've only been
involved in the IETF for 11 years.

> b) There is a large installed base of software for (and of) IBM
>     operating systems and middleware that use the numeric IDs
>     819 and 367 and tend to prepend "IBM-" to all IDs
>     for interoperation with open-standards systems.

And if that software emits the "IBM-#" aliases on the open Internet, it is 
non-compliant and needs to be fixed.  What makes more sense: forcing all 
standards-compliant software to change to use a new alias, or forcing just 
the limited set of broken IBM software to use the correct standard names 
for interoperable charsets?
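A minimal Python sketch of the receiving side of this argument. It assumes a receiver that recognizes only standard charset names. `KNOWN_CHARSETS` and `resolve_charset` are hypothetical names, and the table deliberately holds just three preferred MIME names with their IANA MIBenum values. The lookup is case-insensitive because MIME charset names are (RFC 2046).

```python
from typing import Optional

# Hypothetical receiver table: preferred MIME charset names (lower-cased)
# mapped to their IANA MIBenum values. Deliberately contains no aliases.
KNOWN_CHARSETS = {
    "us-ascii": 3,
    "iso-8859-1": 4,
    "utf-8": 106,
}


def resolve_charset(label: str) -> Optional[int]:
    """Return the MIBenum for a charset label, or None if it is unrecognized.

    MIME charset names are case-insensitive, so the label is lower-cased
    (after trimming surrounding whitespace) before the lookup.
    """
    return KNOWN_CHARSETS.get(label.strip().lower())


print(resolve_charset("ISO-8859-1"))  # 4
print(resolve_charset("iso-8859-1"))  # 4
print(resolve_charset("IBM-819"))     # None
```

Such a receiver already interoperates with every sender that labels its text ISO-8859-1. A newly registered alias like IBM-819 fails against it until every receiver is updated, whereas fixing the one sender that emits the alias solves the problem in one place.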

> Although these charsets are among the most important and most widely
> used, it seems artificial to limit the use of aliases only for these two.

Aliases impede interoperability by creating a cross-product of cases that
must be tested (number of aliases * number of products).  I can't
make a strong objection to the addition of aliases to limited-use charsets 
because they're already limited use -- meaning interoperability is neither 
important nor expected.  But for those charsets which are already widely 
used, the addition of aliases breaks the existing interoperable installed 
base of standards-compliant software.  That is the point where any good 
engineer should stand up and object.  And it's not just _two_ character 
sets.  Here's a partial list of interoperable character sets:

                  MIME
MIME charset      Text MIBenum  Alternate         References
--------------    ---- -------  ---------         ----------
ISO-8859-1        Yes  4        UTF-8             [RFC2046,ISO-8859]
ISO-8859-2        Yes  5        UTF-8             [RFC2046,ISO-8859]
ISO-8859-3        Yes  6        UTF-8             [RFC2046,ISO-8859]
ISO-8859-4        Yes  7        UTF-8             [RFC2046,ISO-8859]
ISO-8859-5        Yes  8        KOI8-R,UTF-8      [RFC2046,ISO-8859]
ISO-8859-6        Yes  9        UTF-8             [RFC2046,ISO-8859]
ISO-8859-7        Yes  10       UTF-8             [RFC1947,2046,ISO-8859]
ISO-8859-8        Yes  11       UTF-8             [RFC1555,2046,ISO-8859]
ISO-8859-9        Yes  12       UTF-8             [RFC2046,ISO-8859]
ISO-8859-10       Yes  13       UTF-8             [RFC2046,ISO-8859]
US-ASCII          Yes  3        N/A               [RFC2046]
UTF-8             Yes  106      N/A               [RFC2279]
ISO-8859-6-E      Yes  81       UTF-8             [RFC1556]
ISO-8859-6-I      Yes  82       UTF-8             [RFC1556]
ISO-8859-8-E      Yes  84       UTF-8             [RFC1556]
ISO-8859-8-I      Yes  85       UTF-8             [RFC1556]
KOI8-R            Yes  2084     ISO-8859-5,UTF-8  [RFC1489]
KOI8-U            Yes  2088     UTF-8             [RFC2319]
ISO-2022-KR       Yes  37       EUC-KR,UTF-8      [RFC1557,KS_C_5601-1987]
EUC-KR            Yes  38       ISO-2022-KR,UTF-8 [RFC1557,KS_C_5601-1987]
ISO-2022-JP       Yes  39       UTF-8             [RFC1468]
ISO-2022-CN       Yes  104*A    UTF-8             [RFC1922]
CN-GB             Yes  N/A      UTF-8             [RFC1922]
CN-Big5           Yes  N/A      UTF-8             [RFC1922]
HZ-GB-2312        Yes  2085     UTF-8             [RFC1842,1843]
VISCII            Yes  2082     UTF-8             [RFC1456]
VIQR              Yes  2083     VISCII,UTF-8      [RFC1456]
GB2312            Yes? 2025     UTF-8             [RFC1922]
Big5              Yes? 2026     UTF-8             [RFC1922]
EUC-JP            Yes  18       ISO-2022-JP,UTF-8 [JIS X0212-1990]
Shift_JIS         Yes  17       ISO-2022-JP,UTF-8 [JIS X0212-1990]

I will object to the addition of aliases to any of these.
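The cross-product argument (number of aliases * number of products) can be sketched in Python as well. This is only an illustration, not a real test plan. The product names are hypothetical, and `interop_cases` is a helper introduced here just to enumerate the (label, product) pairs that each need to be tested.

```python
from itertools import product


def interop_cases(labels, products):
    """Enumerate (label, product) pairs that each need an interoperability test.

    Every label a charset may appear under must be recognized by every
    receiving product, so the test matrix is the Cartesian product.
    """
    return list(product(labels, products))


# Hypothetical example: one charset, three receiving products.
products = ["mailer-a", "mailer-b", "browser-c"]
standard_only = ["ISO-8859-1"]
with_new_alias = ["ISO-8859-1", "IBM-819"]

print(len(interop_cases(standard_only, products)))   # 3
print(len(interop_cases(with_new_alias, products)))  # 6
```

In this example one extra alias doubles the matrix for that charset, and the same growth applies separately to each charset in the list above.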

                - Chris