
Re: My draft for windows 1252



Apologies if this was already widely discussed in the previous thread -

On names like "cp1252": These are ambiguous, and I recommend against
adding them as aliases unless they are commonly used for a particular
charset. They are ambiguous because IBM started using 16-bit integers
for "code pages" a long time ago, and Microsoft adopted this practice
and a set of such integers for DOS and Windows. As a result, "cp1252"
is widely used to mean either IBM's or Microsoft's idea of that code
page, and the two usually differ. One could argue that, for each
integer, the company that "invented" the code page is the one that is
"right" (e.g., Microsoft for 1252, IBM for 850), but I think that may
just increase confusion.

On 11/8/06, Frank Ellermann <nobody@xyzzy.claranet.de> wrote:
> For cp1252, are you sure that it's actually used anywhere ?

I believe that Java uses "cp1252" and similar names. I am not sure
whether it uses the IBM or the Microsoft interpretation of such code
pages, and I believe that this even differs in at least some cases
between Sun's JDK and IBM's JDK. I also don't know whether Java
applications commonly use such aliases in protocols.

> Not counting file names of ICU mappings or similar cases, is there a real
> application talking with other applications about "cp1252" ?

As for ICU, once we discovered the ambiguous use of "cp" prefixes, we
became more consistent in how we identify mapping tables internally:
"ibm-" prefixes for IBM CCSID integers and "windows-" prefixes for
Microsoft code page integers. (ICU uses strings while IBM and
Microsoft use integers. This was our way of bridging that gap.)
In ICU, "cp" names are used only where someone else (like Java)
recognizes them.
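
To illustrate the prefix convention, here is a minimal sketch in Python.
It is not ICU's API: the helper name mapping_name and the PREFIXES table
are made up for this example. The point is only that the owner of the
integer is carried in the name, so "1252" alone never has to be guessed.

```python
# Hypothetical helper, NOT part of ICU: build an unambiguous internal
# name from the owner of a code page integer and the integer itself.
PREFIXES = {"ibm": "ibm-", "microsoft": "windows-"}


def mapping_name(owner: str, number: int) -> str:
    """Return e.g. 'windows-1252' or 'ibm-850' for (owner, integer)."""
    try:
        prefix = PREFIXES[owner.lower()]
    except KeyError:
        # A bare "cp" prefix says nothing about the owner, so refuse it.
        raise ValueError(f"unknown owner: {owner!r}") from None
    return f"{prefix}{number}"


print(mapping_name("Microsoft", 1252))  # windows-1252
print(mapping_name("IBM", 850))         # ibm-850
```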

> Sorry for the stupid question, but on my box I'd use a command like
> "chcp 1004" or the system function below it, without any "cp" prefix.
>
> Based on that we might need an alias "1252", not "cp1252".

I disagree. On a DOS or Windows system, the 1004 in "chcp 1004" gets
parsed into an integer, which is a different beast from what the IANA
charsets list deals with. I don't think that decimal-digit-string
representations of such integers should be added as aliases unless
they are otherwise in common use as strings. I have not seen that as
widespread practice.
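
A rough sketch of the distinction, in Python. Both functions and the
one-entry ALIASES table are hypothetical stand-ins (the table is not the
actual IANA list); they only contrast parsing an integer with matching
a name string.

```python
from typing import Optional

# Illustrative alias table, a stand-in for a charset registry.
# It is NOT the real IANA charset list.
ALIASES = {"windows-1252": "windows-1252"}


def parse_code_page(arg: str) -> int:
    """What a chcp-style command does: turn the digits into an integer."""
    return int(arg)


def lookup_charset(label: str) -> Optional[str]:
    """What a charset registry does: match a name string, ignoring case."""
    return ALIASES.get(label.strip().lower())


print(parse_code_page("1252"))         # 1252
print(parse_code_page("01252"))        # 1252, same integer
print(lookup_charset("Windows-1252"))  # windows-1252
print(lookup_charset("1252"))          # None, no such string alias here
```

The integer parser accepts any spelling of the same number, while the
name lookup only knows the strings that were actually registered.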

> Or we just
> ignore "1252" / "cp1252" because it's obvious.  I certainly don't need
> a "1004" or "cp1004" alias.

I don't think "obvious" is an argument one way or another. In my
opinion, the listed aliases should reflect common industry practice.
If anything, _remove_ any aliases that are controversial or ambiguous
or otherwise undesirable. Please don't _add_ aliases that are not
already in common use.

Best regards,
markus
-- 
Opinions expressed here may not reflect my company's positions unless
otherwise noted.