Re: windows-1252



Erik van der Poel wrote:

> What discrepancies do you claim to exist, exactly?

>> Either 0x81, 0x8D, 0x8F, 0x90, and 0x9D are mapped one to
>> one, u+0081 etc., or they are not.

> RFC 2978 does not require a Unicode mapping. It says that
> there "SHOULD" be a 10646 mapping, but it does not use the
> word "MUST".

You need a good excuse to ignore a SHOULD; a typical example
is an old implementation (here: an old charset registration).

It also says "MUST be stable"; that's why we got so many newly
registered charsets that add the "Euro", like 858 instead of
850.
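
To illustrate the 850 vs. 858 point, here is a minimal sketch using the
codecs that ship with Python (an assumption on my part: that Python's
"cp850" and "cp858" tables match the registered charsets). The helper
name decode_octet is made up for this example. It decodes the single
octet 0xD5 under both charsets:

```python
# Octet that differs between code page 850 and code page 858.
EURO_OCTET = 0xD5

def decode_octet(octet, codec):
    """Decode one octet with the named Python codec (hypothetical helper)."""
    return bytes([octet]).decode(codec)

for codec in ("cp850", "cp858"):
    ch = decode_octet(EURO_OCTET, codec)
    # Print the resulting code point in U+XXXX notation.
    print(codec, f"U+{ord(ch):04X}")
```

In Python's tables 850 yields the dotless i and 858 yields the Euro
sign, so the old registration could stay stable while the new one got
the new character.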

> are unassigned codepoints not allowed to exist in
> IANA-registered charsets

Not that I'm aware of, unassigned code points are fine.  In
the case of 1252 all it takes is to explain what the five
interesting octets are supposed to be:  Maybe "cp-1252" and
windows-1252 are two different charsets, the former with one
to one mappings, the latter with five unassigned code points.

But that's a rather important difference for implementations.
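
A rough sketch of why this matters, assuming Python's built-in "cp1252"
codec as one example implementation (it is not necessarily what IANA or
any vendor table says). That codec treats the five octets as undefined.
The function names below are hypothetical, and decode_one_to_one only
shows what the "one to one" reading would look like:

```python
# The five octets discussed in this thread.
CONTESTED = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

def undefined_octets(codec="cp1252"):
    """Return the octets that the given codec refuses to decode."""
    missing = []
    for octet in range(256):
        try:
            bytes([octet]).decode(codec)
        except UnicodeDecodeError:
            missing.append(octet)
    return missing

def decode_one_to_one(data, codec="cp1252"):
    """Decode data, mapping undefined octets to the code point of equal
    value (0x81 -> U+0081 etc.), i.e. the "one to one" interpretation."""
    out = []
    for octet in data:
        try:
            out.append(bytes([octet]).decode(codec))
        except UnicodeDecodeError:
            # Assumption: fall back to the identity mapping.
            out.append(chr(octet))
    return "".join(out)

print([hex(o) for o in undefined_octets()])
print(ascii(decode_one_to_one(b"\x80\x81")))
```

With the strict reading the same input raises an error, with the one to
one reading it silently yields a C1 control, so two implementations can
disagree on identical data.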

From my POV windows-1252 is one of the most important charsets,
in practice more relevant than, say, Latin-9.  While I now know
that what my OS considers "1004" is in fact cp-1252 (after
an embarrassing episode with ICU when I didn't know this), I'm
still interested in whether that's "windows-1252" or "cp-1252",
or both if they are identical.
                            Bye, Frank