
Re: My draft for windows 1252



An example of the problem is where system A thinks that cp936 means one thing, and system B thinks it means another. That can lead to customer data corruption. A takes some data in cp936 out of a database and converts it to Unicode, then sends it (via perhaps very circuitous routes) to B, which converts it back to cp936 and stuffs it into another database. However, because the two conversion tables disagree on one of the characters, that character doesn't round-trip: it gets trashed, and the data is corrupted.
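
To make that concrete, here is a minimal sketch of the round trip. The two tables are hypothetical toy data, not real cp936 tables; they only stand in for two implementations that agree on the charset name but disagree on one byte. The helper names (decode, encode) and the '?' replacement policy are likewise assumptions for illustration.

```python
# Hypothetical tables: two implementations' idea of the "same" charset.
# They agree on every byte except 0x80.
A_TO_UNICODE = {0x41: "A", 0x42: "B", 0x80: "\u20ac"}  # A: 0x80 is EURO SIGN
B_TO_UNICODE = {0x41: "A", 0x42: "B", 0x80: "\u0080"}  # B: 0x80 is a C1 control


def decode(data: bytes, table: dict) -> str:
    """Convert bytes to Unicode using one implementation's table."""
    return "".join(table[b] for b in data)


def encode(text: str, table: dict, replacement: int = 0x3F) -> bytes:
    """Convert Unicode back to bytes; unmappable characters become '?'."""
    reverse = {ch: b for b, ch in table.items()}
    return bytes(reverse.get(ch, replacement) for ch in text)


original = bytes([0x41, 0x80, 0x42])      # what sits in A's database
text = decode(original, A_TO_UNICODE)     # A converts to Unicode
round_trip = encode(text, B_TO_UNICODE)   # B converts back with its own table

print(original, round_trip)
```

With these toy tables, the middle byte comes back as 0x3F ('?') on B's side, while encoding with A's own table would have restored the original bytes. Neither system reports an error under this replacement policy, which is what makes the corruption hard to spot.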

The way the IANA registry has grown up, this indeterminacy in the meaning of IANA charset names and aliases causes no end of problems. In an ideal world, each IANA charset name would be associated with one and only one mapping to Unicode/10646. If there were two different mappings, no matter how subtle the difference, they would perforce have two different IANA names. (Of course, in an 'idealer' world, we'd have already transitioned to Unicode for interchange, and this wouldn't matter ;-)
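
As a rough sketch of that "one name, one mapping" rule, a registry could simply refuse a second, different table under a name it already knows. Everything here is hypothetical (the CharsetRegistry class, the vendor-qualified names, the toy tables); it is not how the IANA registry actually works, only an illustration of the invariant.

```python
class CharsetRegistry:
    """Toy registry enforcing: one charset name, exactly one mapping."""

    def __init__(self):
        self._tables = {}

    def register(self, name: str, table: dict) -> None:
        key = name.lower()                      # charset names are case-insensitive
        frozen = tuple(sorted(table.items()))   # comparable snapshot of the mapping
        existing = self._tables.get(key)
        if existing is not None and existing != frozen:
            raise ValueError(
                f"{name!r} is already registered with a different mapping; "
                "a different mapping needs a different name"
            )
        self._tables[key] = frozen


registry = CharsetRegistry()
vendor_a = {0x41: "A", 0x80: "\u20ac"}   # hypothetical table
vendor_b = {0x41: "A", 0x80: "\u0080"}   # differs from vendor_a at 0x80 only

registry.register("cp936", vendor_a)
registry.register("CP936", vendor_a)     # same mapping again: accepted

try:
    registry.register("cp936", vendor_b)  # subtle difference: rejected
    conflict_detected = False
except ValueError:
    conflict_detected = True

# The differing table can still be registered under its own (made-up) name.
registry.register("x-vendor-b-cp936", vendor_b)
```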

Mark

On 11/15/06, Shawn Steele <Shawn.Steele@microsoft.com> wrote:
AFAIK the overlap between "Microsoft" and "IBM" numbers is more like
variations of the same language than completely different code pages
( http://www.unicode.org/Public/MAPPINGS/VENDORS/IBM/readme.txt has a
list of differences).

I've seen similar variations of other code pages that respond to the
same alias(es) with subtly different results between vendors, so I'm not
sure that the variation in implementation invalidates the alias.

In this case it seems like cp1252 is sometimes used to describe
windows-1252 (and maybe also the IBM version), so that fits my
expectation of an alias, even if the exact target is ambiguous.
Similarly 1252 is often processed internally as an integer, however it
can also appear in text (although I wouldn't expect it in MIME or http
content-types.)

So are the IETF charset assignments only listing those aliases used with
Internet protocols?  It appears that some software uses cp1252 and 1252
as aliases, but no cases have been mentioned where they are used with
an Internet protocol.

Personally I don't care much either way, but it seems safer to me to err
on the side of including aliases if we think that they might be used,
which seems the opposite of Markus's position :)

- Shawn