Re: Best fit (was: Update of charset windows-1252)
I don't know who created the tables, but they were submitted by an
individual from Microsoft.
The windows-1252 IANA charset update does offer a contact (Mike Ksar).
The "CPINFO 1 0x3F 0x003F" line simply indicates how Microsoft's
implementation maps characters that are not in the destination charset
(to the byte 0x3F, when converting from Unicode) or that are illegally
encoded (to U+003F, when converting to Unicode). See the readme. ICU
may have chosen 0x1A, but that was their own decision. There is no
interoperability problem here because the legal characters are fully
specified.
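To make the two default characters concrete, here is a rough sketch
using Python's built-in cp1252 codec. It is not Microsoft's converter;
the function names are mine, and I am assuming that the bytes Python's
codec leaves undefined are a fair stand-in for "illegally encoded"
input.

```python
# Illustration only: Python's cp1252 codec, not Microsoft's implementation.
FROM_UNICODE_DEFAULT = 0x3F    # byte used for characters missing from the charset
TO_UNICODE_DEFAULT = 0x003F    # code point used for bytes that cannot be decoded


def to_cp1252(text: str) -> bytes:
    """Encode text, substituting the default byte for unmappable characters."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            out.append(FROM_UNICODE_DEFAULT)
    return bytes(out)


def from_cp1252(data: bytes) -> str:
    """Decode bytes, substituting U+003F for bytes the codec rejects."""
    out = []
    for b in data:
        try:
            out.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            out.append(chr(TO_UNICODE_DEFAULT))
    return "".join(out)
```

With this sketch, U+2102 (not in the charset) comes out as the byte
0x3F, while the euro sign still maps to 0x80, so the legal characters
are unaffected by whichever default a converter picks.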
The 698 WCTABLE mappings are from Microsoft's implementation. Many of
them are "best fit" mappings. I have confirmed that their
implementation does return these. They do have an option to turn off
the "best fit" mappings.
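As a hedged illustration of what turning best fit on or off means,
the sketch below parses a few WCTABLE-style rows (the six quoted in
Frank's message) and encodes with or without the lossy entries. The
names parse_wctable, is_exact and encode are hypothetical, and I
decide which rows are "exact" by round-tripping through Python's
cp1252 codec, which is my assumption rather than anything the table
itself states.

```python
# Sample rows copied from the excerpt quoted below; not the full 698 entries.
SAMPLE_WCTABLE = """\
0x2089 0x39 ;Subscript Nine
0x20ac 0x80 ;Euro Sign
0x20a1 0xa2 ;Colon Sign
0x20a4 0xa3 ;Lira Sign
0x20a7 0x50 ;Peseta Sign
0x2102 0x43 ;Double-Struck Capital C
"""


def parse_wctable(text: str) -> dict[int, int]:
    """Map Unicode code point -> byte, ignoring ';' comments and blank lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split(";", 1)[0].split()
        if len(fields) == 2:
            table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def is_exact(code_point: int, byte: int) -> bool:
    """Assumption: a row is exact if the byte decodes back to the same character."""
    return bytes([byte]).decode("cp1252", errors="replace") == chr(code_point)


def encode(text: str, table: dict[int, int], best_fit: bool = True) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)          # ASCII passes through (not in the sample rows)
        elif cp in table and (best_fit or is_exact(cp, table[cp])):
            out.append(table[cp])
        else:
            out.append(0x3F)        # default character from the CPINFO line
    return bytes(out)
```

With best fit enabled, U+2102 becomes "C"; with it disabled, only the
euro sign row survives and U+2102 falls back to 0x3F. Because the rows
are loaded into a dictionary, their order in the file does not affect
lookups.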
The mappings are sorted in a strange way. Maybe they will fix that,
but it shouldn't prevent this charset from being updated at IANA.
Regarding Kent's comment saying that best fit mappings should not be
part of an IANA registration: First, Martin says there's not enough
info, now you say there's too much! :-)
Should we strip the best fit mappings from the table and post it somewhere?
Erik
On 10/21/06, Frank Ellermann <nobody@xyzzy.claranet.de> wrote:
> Erik van der Poel wrote:
>
> > http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
>
> Who created these new "best fit" tables ? The old table...
>
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
>
> ...offers a Contact: cpxlate AT microsoft.com
>
> The reason I ask is the line "CPINFO 1 0x3F 0x003F" in this table. It's
> how I implemented it for "codepage 1004" (an OS/2 alias for windows-1252).
>
> <http://purl.net/net/cp/1252> (ICU) proposes 0x1A for windows-1252.
>
> What's the source of the 698 WCTABLE mappings ? The sorting of 0x20ac
> could be some artefact of 0x20a0 (I've no idea what u+20A0 really is):
>
> [...]
> 0x2089 0x39 ;Subscript Nine
> 0x20ac 0x80 ;Euro Sign
> 0x20a1 0xa2 ;Colon Sign
> 0x20a4 0xa3 ;Lira Sign
> 0x20a7 0x50 ;Peseta Sign
> 0x2102 0x43 ;Double-Struck Capital C
> [...]
>
> Frank