
RE: Registration of some code pages



> In my understanding, Windows-31J refers to Microsoft Windows Codepage 932.
> So it should be helpful to describe Microsoft's behavior.

One would think so :)  I have no clue what the history is, but it seems like it could be cleaned up a bit.

> Proposed revision of "Shift_JIS" and "Windows-31J"
> http://www2.xml.gr.jp/log.html?MLID=xmlmoji&N=142

Unfortunately that link doesn't work for me :(

>> Unfortunately the names we use are already assigned :(  Obviously existing behavior isn't going to change.

> What are "the names"?

"shift_jis".  If you ask Windows/MLang/.NET for "shift_jis", you get Windows-31J behavior.  Ditto for "iso-2022-jp", which gets code page 50220 behavior.  Also, if you ask MLang/.NET what the "name" is for those code pages, it returns "shift_jis" and "iso-2022-jp", not some Windows-specific name.

> If I have a portable software, it should work on Unix as the same as
> it does on Windows.
> So the expectation that "shift_jis" on Windows means "Windows-31J" seems wrong.

That's the fundamental problem.  If you have portable software and run it on Unix and on Windows, and save your file using "shift_jis", you're going to have some odd discrepancies.  Obviously that's not good, but it's pretty entrenched.  Clearly we cannot expect Unix boxes to pretend shift_jis is Windows-31J (though some apps do).  However, it's also a tad unreasonable to expect Windows boxes to suddenly be very strict when they encounter "shift_jis", as that would break a very large number of documents that currently "work."
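
To make the discrepancy concrete, here is a small Python sketch.  It assumes Python's built-in "shift_jis" codec stands in for a strict reading of the name and its "cp932" codec stands in for Windows-31J; the byte values are just two well-known trouble spots, and the helper name decodes() is made up for illustration.

```python
# Two byte sequences a Windows application might write under the label "shift_jis".
wave = b"\x81\x60"     # present in both tables, but mapped to different characters
circled = b"\x87\x40"  # NEC row 13 extension, defined only in Windows-31J (cp932)

# The same two bytes become different Unicode characters depending on the table.
strict_wave = wave.decode("shift_jis")  # U+301C WAVE DASH
windows_wave = wave.decode("cp932")     # U+FF5E FULLWIDTH TILDE


def decodes(data: bytes, codec: str) -> bool:
    """Return True if `data` decodes cleanly with `codec` (hypothetical helper)."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


print(hex(ord(strict_wave)), hex(ord(windows_wave)))
print(decodes(circled, "shift_jis"), decodes(circled, "cp932"))
```

So a file saved on Windows as "shift_jis" can either fail to decode or silently come out with different characters when a stricter decoder reads it elsewhere.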

My feeling is that this is a fairly annoying pain.  I could probably invent a number of transition schemes that might get some sort of reasonable parity and migrate documents over a decade or two.  However, I think that would still be a painful process, and that everyone's energy would be better spent encouraging use of a more consistent encoding, such as UTF-8, that avoids most of the problems with code pages evolving in different directions.

> Such automatic overrides are needed for documents that declare their
> encoding in the document or related metadata.

My suggestion is not to make such a replacement "automatic", but rather to note "somewhere" (like the standards or registry) that this name misrepresentation sometimes happens.  Then app developers can figure out what to do for their app and user base.
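
As a rough sketch of what that per-app decision could look like (Python again; the table KNOWN_MISLABELS, the function pick_codec() and its flag are all hypothetical names, not part of any registry or library):

```python
# Hypothetical note an app might keep: labels whose declared name may not match
# what the producing software actually wrote.  Only the shift_jis case from this
# thread is listed; the value is Python's codec name for Windows-31J.
KNOWN_MISLABELS = {"shift_jis": "cp932"}


def pick_codec(label: str, assume_windows_producer: bool) -> str:
    """Choose a decoder for a declared label.

    The override is opt-in: the app decides, based on what it knows about its
    own documents and users, whether to apply the noted substitution.
    """
    key = label.strip().lower()
    if assume_windows_producer and key in KNOWN_MISLABELS:
        return KNOWN_MISLABELS[key]
    return key


# An app that mostly sees Windows-produced files might opt in:
text = b"\x87\x40".decode(pick_codec("Shift_JIS", assume_windows_producer=True))
```

An app with a different user base would simply leave the flag off and get the strict behavior.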

-Shawn