Windows-1252 Best Fit tables.

This is regarding the recent threads about the windows-1252 code page.

Our purpose in providing the best fit tables to Unicode was to resolve any uncertainty about what our best-fit behavior was. These code page tables aren’t intended to replace the existing windows-1252, etc. tables. We certainly don’t expect or want these to be registered as a separate code page. The best fit tables are merely a superset of the existing tables on the Unicode site. For the IETF’s purposes those existing tables are preferred.

Regarding the form of the tables: the original Windows tables on the Unicode site were apparently massaged into a normal form, which also removed the ability to preserve the best fit behavior. Additionally, the most convenient and error-free method of creating the files was just to copy them from the Windows Vista source tree, so these are basically our source tables. The line endings probably got cleaned up in the copying, but basically it’s just a raw copy.

As pointed out, some of the comments (character names, etc.) aren’t accurate or use older versions. Additionally, the tables appear to have been originally created with the comments in the code page they describe, so some of the double-byte code pages that include character examples look pretty strange when opened with a different code page. Personally I’d ignore the comments and look at just the mappings.
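
For anyone who wants to read only the mappings, here is a minimal sketch in Python. It assumes each mapping line holds two hexadecimal fields (Unicode code point, then code page byte) followed by an optional comment introduced by ';'. That layout, the sample lines, and the name parse_mapping_line are assumptions for illustration, so check them against the actual files before relying on this.

```python
def parse_mapping_line(line):
    """Return (unicode_code_point, code_page_byte), or None for any other line."""
    # Assumption: everything from the first ';' onward is a comment; discard it.
    fields = line.split(";", 1)[0].split()
    if len(fields) != 2:
        return None
    try:
        # int(..., 16) accepts hex digits with or without a "0x" prefix.
        return int(fields[0], 16), int(fields[1], 16)
    except ValueError:
        # Not two hex numbers (for example a header line), so skip it.
        return None


# Hypothetical sample lines in the assumed layout.
sample = [
    "0x221e\t0x38\t;Infinity",
    "; a comment-only line",
    "",
]
mappings = [m for m in map(parse_mapping_line, sample) if m is not None]
```

Because the comment text is dropped before anything is parsed, it doesn’t matter which code page the comments were written in.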

I’d also like to point out that the best-fit behavior itself is pretty inconsistent, random, and sometimes funny. Mapping Infinity to 8 is particularly odd. We haven’t updated the best-fit tables, and don’t intend to, so many logical mappings of new characters aren’t included. These tables are also pretty old, so “new characters” in this context could be pretty old as well. Additionally the mappings are error-prone and could have missed obvious look-alikes or made unexpected mappings based on an individual whim.
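
To make the behavior concrete, here is a rough Python sketch of what a best-fit conversion does. It is not the Windows implementation: the function name encode_best_fit, the one-entry table, and the '?' default are assumptions for illustration. The only mapping in it is the Infinity-to-8 case mentioned above.

```python
# Illustrative best-fit table: U+221E INFINITY maps to the byte for '8'.
BEST_FIT = {0x221E: 0x38}


def encode_best_fit(text, best_fit=BEST_FIT, default=b"?"):
    """Encode text as windows-1252, using best_fit for unmappable characters."""
    out = bytearray()
    for ch in text:
        try:
            # Characters that windows-1252 really contains encode exactly.
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Otherwise try the best-fit table, then the default byte.
            byte = best_fit.get(ord(ch))
            out += bytes([byte]) if byte is not None else default
    return bytes(out)


encoded = encode_best_fit("a\u221e")   # 'a' followed by INFINITY
decoded = encoded.decode("cp1252")     # comes back as "a8", not the original
```

The conversion is one-way: once Infinity has become 8, nothing in the bytes says it was ever anything else.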

Of course, as always, we prefer that applications use Unicode to persist data, and we consider the best fit behavior to be an old idea that hopefully people won’t use any more. For those that do need this information we hope that these tables might assist them.
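
As a small illustration of why, the Python sketch below persists a string as UTF-8 and reads it back unchanged, while a strict windows-1252 encode of the same string fails. The sample text and variable names are just for illustration.

```python
text = "limit \u2192 \u221e"  # contains RIGHTWARDS ARROW and INFINITY

# UTF-8 can represent every Unicode character, so the round trip is exact.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")

# A strict windows-1252 encode has no byte for these characters.
try:
    text.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
```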

I’ve blogged about best fit at http://blogs.msdn.com/shawnste/archive/2006/01/19/515047.aspx

FWIW: Microsoft also has no intention of updating the Windows code pages. Changing them breaks people, as we discovered when adding the Euro, and we don’t want to do that again. For new locales and users not supported by the existing code pages we recommend using Unicode.

- Shawn

Shawn Steele

shawnste@microsoft.com

Windows International

Microsoft