RE: Best fit (was: Update of charset windows-1252)




Erik van der Poel wrote:

> The "CPINFO 1 0x3F 0x003F" simply indicates how Microsoft's
> implementation maps characters that are not in the destination charset
> (0x3F) or illegally encoded (0x003F), depending on the direction (from
> Unicode or to Unicode). See the readme. ICU may have chosen 0x1A, but

Not quite. The ICU API allows for quite a lot of "user" (i.e. user of the
API) control over how to do error substitutions.
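
To illustrate what caller-controlled substitution means, here is a minimal
sketch in Python. It is an analogy only: it uses Python's own codec error
handlers, not the ICU API, and the handler name "sub1a" is made up for
this example. The built-in "replace" handler substitutes 0x3F ('?') for
characters missing from the destination charset, while the registered
handler substitutes 0x1A (SUB) instead.

```python
import codecs


def sub_handler(exc):
    """Substitute 0x1A (SUB) for each character that cannot be encoded.

    Hypothetical handler for illustration; only encoding errors are handled.
    """
    if isinstance(exc, UnicodeEncodeError):
        # One SUB byte per unencodable character, then resume after them.
        return (b"\x1a" * (exc.end - exc.start), exc.end)
    raise exc


# "sub1a" is an arbitrary name chosen for this sketch.
codecs.register_error("sub1a", sub_handler)

text = "caf\u00e9 \u4e2d"  # U+4E2D has no mapping in windows-1252

default = text.encode("cp1252", errors="replace")  # substitutes 0x3F
custom = text.encode("cp1252", errors="sub1a")     # substitutes 0x1A

print(default)
print(custom)
```

The converter itself is the same in both calls; only the substitution
policy chosen by the caller differs.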

[...]
> Regarding Kent's comment saying that best fit mappings should not be
> part of an IANA registration: First, Martin says there's not enough
> info, now you say there's too much! :-)
> 
> Should we strip the best fit mappings from the table and post 
> it somewhere?

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

will do just fine, and is actually more up to date than the "bestfit" file
w.r.t. character names ("ligature ae" vs. "letter ae"; this was changed a
decade ago, and is THE infamous change that led to the decision never to
change character names again). But both are in error w.r.t. some control
character names ("horizontal tabulation" -> "character tabulation",
"vertical tabulation" -> "line tabulation", and the "*separator" ones).

		/kent k