
Re: Best fit (was: Update of charset windows-1252)



Regarding the 'best fit' mappings, I think it would be good
to add a pointer to these to the "Additional Information"
section, but they should not be part of the definition of
the charset proper.

First, a charset is defined as a way to get from bytes to characters,
and for that direction "best fit" mappings are only marginally relevant.
No mapping of an illegal byte (or byte sequence) should be called "best fit".
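
To illustrate the bytes-to-characters direction, here is a minimal sketch
in Python. It assumes Python's built-in windows-1252 codec, which is one
implementation among several and not the Microsoft tables discussed in
this thread; the helper name decode_strict is made up for the example.
In that codec byte 0x81 is unassigned, so strict decoding rejects it
rather than mapping it to anything.

```python
from typing import Optional

# Legal bytes decode to exactly one character each.
legal = bytes([0x80, 0x41, 0xE9])
text = legal.decode("windows-1252")  # Euro sign, 'A', e with acute


def decode_strict(raw: bytes) -> Optional[str]:
    """Hypothetical helper: return the decoded text, or None if any byte
    is not assigned in Python's windows-1252 codec."""
    try:
        return raw.decode("windows-1252")
    except UnicodeDecodeError:
        # An illegal byte is an error here, not a candidate for "best fit".
        return None
```

Whatever an implementation does with such a byte (reject it, or
substitute U+FFFD or 0x003F) is error handling, which is separate from
the definition of the charset.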

Second, Microsoft provides a way to switch these off. A MIME
application (such as an MUA or a Web page editing application)
really SHOULD NOT use the "best fit" mappings; in the Web case, it
should use NCRs (numeric character references), and in the mail
case, it should automatically choose another encoding or
ask the user to choose another one.
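
The two fallbacks can be sketched roughly as follows, again in Python and
again only as an illustration. The function names and the candidate list
are assumptions made for the example; the "xmlcharrefreplace" error
handler is part of Python's standard codec machinery, and Python's
windows-1252 codec does not apply best-fit mappings.

```python
def encode_for_web(text: str) -> bytes:
    """Hypothetical helper: encode as windows-1252, writing any character
    outside the charset as a decimal NCR instead of a look-alike."""
    return text.encode("windows-1252", errors="xmlcharrefreplace")


def pick_mail_charset(text: str,
                      candidates=("us-ascii", "windows-1252", "utf-8")) -> str:
    """Hypothetical helper: return the first candidate charset that can
    represent the text exactly. The candidate order is an assumption."""
    for name in candidates:
        try:
            text.encode(name)  # strict: raises if a character is missing
            return name
        except UnicodeEncodeError:
            continue
    raise ValueError("no candidate charset can represent the text")


# U+2102 (Double-Struck Capital C) is not in windows-1252. A best-fit
# table would turn it into a plain "C"; here it survives as an NCR.
web_bytes = encode_for_web("C = \u2102")
mail_charset = pick_mail_charset("C = \u2102")
```

In both cases the receiver can recover the original character, which is
not true once U+2102 has silently become 0x43.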

In summary, charsets are first and foremost a means to correctly
exchange correct data, and only secondarily a means to label
processing behavior, including error treatment and fallbacks.

Regards,    Martin.

At 00:30 06/10/22, Erik van der Poel wrote:
>I don't know who created the tables, but they were submitted by an
>individual from Microsoft.
>
>The windows-1252 iana charset update does offer a contact (Mike Ksar).
>
>The "CPINFO 1 0x3F 0x003F" simply indicates how Microsoft's
>implementation maps characters that are not in the destination charset
>(0x3F) or illegally encoded (0x003F), depending on the direction (from
>Unicode or to Unicode). See the readme. ICU may have chosen 0x1A, but
>that was their own decision. There is no interoperability problem here
>because the legal characters are fully specified.
>
>The 698 WCTABLE mappings are from Microsoft's implementation. Many of
>them are "best fit" mappings. I have confirmed that their
>implementation does return these. They do have an option to turn off
>the "best fit" mappings.
>
>The mappings are sorted in a strange way. Maybe they will fix that,
>but it shouldn't prevent this charset from being updated at IANA.
>
>Regarding Kent's comment saying that best fit mappings should not be
>part of an IANA registration: First, Martin says there's not enough
>info, now you say there's too much! :-)
>
>Should we strip the best fit mappings from the table and post it somewhere?
>
>Erik
>
>On 10/21/06, Frank Ellermann <nobody@xyzzy.claranet.de> wrote:
>> Erik van der Poel wrote:
>>
>> > http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
>>
>> Who created these new "best fit" tables ?  The old table...
>>
>> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
>>
>> ...offers a Contact: cpxlate AT microsoft.com
>>
>> The reason I ask is the line "CPINFO 1 0x3F 0x003F" in this table.  It's
>> how I implemented it for "codepage 1004" (an OS/2 alias for windows-1252).
>>
>> <http://purl.net/net/cp/1252> (ICU) proposes 0x1A for windows-1252.
>>
>> What's the source of the 698 WCTABLE mappings ?   The sorting of 0x20ac
>> could be some artefact of 0x20a0 (I've no idea what u+20A0 really is):
>>
>> [...]
>> 0x2089  0x39    ;Subscript Nine
>> 0x20ac  0x80    ;Euro Sign
>> 0x20a1  0xa2    ;Colon Sign
>> 0x20a4  0xa3    ;Lira Sign
>> 0x20a7  0x50    ;Peseta Sign
>> 0x2102  0x43    ;Double-Struck Capital C
>> [...]
>>
>> Frank
>>
>>
>>


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp