
Re: windows-1252

To: ietf-charsets@mail.apps.ietf.org
Subject: Re: windows-1252
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Sun, 15 Jan 2006 08:22:34 +0100
List-Id: <ietf-charsets@mail.apps.ietf.org>
Organization: <URL:http://purl.net/xyzzy>
References: <DIELIIKLNICEDFPABGKOGEPLDNAA.plugwash@p10link.net><43C97091.9070908@vanderpoel.org><6bb028490601141353r2f80fed4v4d2bdd0bd13c9d59@mail.gmail.com><43C9979A.80004@vanderpoel.org> <43C9CBD6.545D@xyzzy.claranet.de><43C9E61E.5010804@vanderpoel.org>

Erik van der Poel wrote:

> What discrepancies do you claim to exist, exactly?

>> Either 0x81, 0x8D, 0x8F, 0x90, and 0x9D are mapped one to
>> one, u+0081 etc., or they are not.

> RFC 2978 does not require a Unicode mapping. It says that
> there "SHOULD" be a 10646 mapping, but it does not use the
> word "MUST".

You need a good excuse to ignore a SHOULD; a typical example
is old implementations (here: old charset registrations).

It also says "MUST be stable"; that's why we got tons of newly
registered charsets that do something for the "Euro", like 858
instead of 850.
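
To make the 850 vs. 858 point concrete, here is a small sketch
using the "cp850" and "cp858" codecs that ship with Python.  The
helper name differing_octets is made up for this illustration, and
Python's tables are only one implementation's view of the two
charsets, not the registrations themselves:

```python
def differing_octets(codec_a, codec_b):
    """List the octets that the two single-byte codecs decode differently."""
    return [
        octet
        for octet in range(256)
        if bytes([octet]).decode(codec_a) != bytes([octet]).decode(codec_b)
    ]


# Compare the old registration with its "Euro" sibling.
diff = differing_octets("cp850", "cp858")
for octet in diff:
    old = bytes([octet]).decode("cp850")
    new = bytes([octet]).decode("cp858")
    print(f"0x{octet:02X}: cp850 -> U+{ord(old):04X}, cp858 -> U+{ord(new):04X}")
```

The stability rule shows up here as a new name for a changed
table: the old charset keeps its meaning, the changed one gets
its own registration.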

> are unassigned codepoints not allowed to exist in
> IANA-registered charsets

Not that I'm aware of; unassigned code points are fine.  In
the case of 1252 all it takes is to explain what the five
interesting octets are supposed to be.  Maybe "cp-1252" and
windows-1252 are two different charsets, the former with one
to one mappings, the latter with five unassigned code points.

But that's a rather important difference for implementations.
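
A rough sketch of why this matters, using Python's "cp1252"
codec as one example implementation (it does not settle what
the registration means).  The names undefined_octets and
decode_one_to_one are hypothetical helpers for this illustration:
the first finds the octets in 0x80..0x9F that the codec leaves
unassigned, the second shows the other reading, where those
octets map one to one to U+0081 etc.

```python
def undefined_octets(codec="cp1252"):
    """Return the octets in 0x80..0x9F that the codec refuses to decode."""
    missing = []
    for octet in range(0x80, 0xA0):
        try:
            bytes([octet]).decode(codec)
        except UnicodeDecodeError:
            missing.append(octet)
    return missing


def decode_one_to_one(data, codec="cp1252"):
    """Decode like the codec, but map its unassigned octets to the
    code point with the same number (0x81 -> U+0081, and so on)."""
    gaps = set(undefined_octets(codec))
    return "".join(
        chr(octet) if octet in gaps else bytes([octet]).decode(codec)
        for octet in data
    )


print([f"0x{octet:02X}" for octet in undefined_octets()])
print(ascii(decode_one_to_one(b"\x80\x81")))
```

With the "unassigned" reading a decoder has to reject or replace
these octets; with the one to one reading the same input decodes
without error, so the two readings are not interchangeable.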

From my POV windows-1252 is one of the most important charsets,
in practice more relevant than, say, Latin-9.  I now know that
what my OS calls "1004" is in fact cp-1252 (after an
embarrassing episode with ICU when I didn't know this), but I'd
still like to know whether that's "windows-1252" or "cp-1252",
or both if they are identical.
                            Bye, Frank

Follow-Ups:
- Re: windows-1252
  - From: Erik van der Poel <erik@vanderpoel.org>

References:
- windows-874
  - From: peter green <plugwash@P10Link.net>
- Re: windows-874
  - From: Erik van der Poel <erik@vanderpoel.org>
- Re: windows-874
  - From: Markus Scherer <markus.icu@gmail.com>
- Re: windows-874
  - From: Erik van der Poel <erik@vanderpoel.org>
- Re: windows-874
  - From: Frank Ellermann <nobody@xyzzy.claranet.de>
- windows-1252
  - From: Erik van der Poel <erik@vanderpoel.org>
