This is regarding the recent
threads about the windows-1252 code page. Our purpose in providing the best fit tables to Unicode was
to resolve any uncertainty about what our best-fit behavior was. These
code page tables aren’t intended to replace the existing windows-1252, etc.
tables. We certainly do not expect or want these to be registered as a
separate code page. The best fit tables are merely a superset of the
existing tables on the Unicode site. For the IETF’s purposes those
existing tables are preferred.

Regarding the form of the tables: the original Windows
tables on the Unicode site were apparently massaged into a normal form, which
also removed the ability to preserve the best fit behavior. Additionally
the most convenient and error-free method of creating the files was just to
copy them from the Windows Vista source tree, so these are basically our source
tables. The line endings probably got cleaned up in the copying, but
basically it’s just a raw copy.

As pointed out, some of the comments (character names, etc.)
aren’t accurate or use older versions. Additionally the tables appear to
have been originally created with the comments in the code page they describe,
so some of the double byte code pages that include character examples
look pretty strange when opened with a different code page. Personally
I’d ignore the comments and look at just the mappings.

I’d also like to point out that the best-fit behavior itself
is pretty inconsistent, random, and sometimes funny. Mapping Infinity to
8 is particularly odd. We haven’t updated the best-fit tables, and don’t
intend to, so many logical mappings of new characters aren’t included.
These tables are also pretty old, so “new characters” in this context could be
pretty old as well. Additionally the mappings are error-prone and could
have missed obvious look-alikes or made unexpected mappings based on an
individual whim. Of course, as always, we prefer that applications use
Unicode to persist data, and we consider the best fit behavior to be an old
idea that hopefully people won’t use any more. For those that do need
this information we hope that these tables might assist them.

I’ve blogged about best fit at http://blogs.msdn.com/shawnste/archive/2006/01/19/515047.aspx
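
For anyone unsure what "best fit" means in practice, here is a rough Python sketch of the idea. It is not the Windows implementation: the two-entry table is a hypothetical, hand-picked excerpt (the Infinity entry is the one mentioned above, the A-with-macron entry is assumed for illustration), and `encode_best_fit` is a made-up helper name. Exact windows-1252 mappings are tried first, and only characters without one fall back to the look-alike table.

```python
# Hypothetical excerpt of a best-fit table; the real tables are far larger.
BEST_FIT_1252 = {
    "\u221e": b"8",  # INFINITY -> digit 8 (the odd mapping mentioned above)
    "\u0100": b"A",  # LATIN CAPITAL LETTER A WITH MACRON -> A (assumed entry)
}


def encode_best_fit(text, table=BEST_FIT_1252, replacement=b"?"):
    """Encode text to windows-1252, using look-alikes where no exact mapping exists."""
    out = bytearray()
    for ch in text:
        try:
            # An exact code page mapping always wins.
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # No exact mapping: use the best-fit entry, else a replacement byte.
            out += table.get(ch, replacement)
    return bytes(out)


encoded = encode_best_fit("x = \u221e")
# The conversion is lossy: decoding gives back a digit, not the original character.
decoded = encoded.decode("cp1252")
```

Note that the original character cannot be recovered after the round trip, which is one reason to persist data as Unicode instead.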

FWIW: Microsoft also has no intention of updating the
Windows code pages. Changing them breaks people, as we discovered when adding the
Euro, and we don’t want to do that again. For new locales and users not
supported by the existing code pages we recommend using Unicode.

- Shawn

Shawn Steele
shawnste@microsoft.com
Windows International
Microsoft