
RE: shift_jis / windows-31J



The “bigger” problem isn’t finding a name that we recognize, although that’s big too, but rather what happens if I run:

 

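using System;
using System.Text;

class Example
{
    static void Main()
    {
        // Prints "shift_jis" for code page 932.
        Console.WriteLine(Encoding.GetEncoding(932).WebName);

        // This alias is recognized.
        Encoding.GetEncoding("csWindows-31J");

        // This name is not recognized, so this call throws.
        Encoding.GetEncoding("Windows-31J");
    }
}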

 

Then I’ll get “shift_jis” as the encoding name.  (WebName is effectively as close as .Net gets to the IANA charset names.)  That cannot change without breaking tons of stuff.  .Net does happen to recognize csWindows-31J, but the next line, GetEncoding("Windows-31J"), will throw an exception.  I’d have to dig more to see whether MLang recognizes csWindows-31J, but that wouldn’t really solve the problem.

 

So, IANA could decide that Microsoft’s variant should have its own name (say xxxx, or maybe Windows-31J, or the csWindows-31J that we almost know about, though not all products do).  However, we’d still return “shift_jis” when you asked for the name.  We pretty much can’t change that: if you tag your .Net-generated documents with Encoding.WebName (as an ASP.NET server might), and you upgrade, then I won’t be able to read them if I haven’t upgraded.  Certainly that’d be a huge migration pain, and we’d much, much, much rather people migrate to UTF-8 or UTF-16 than spend any more time in old encodings.

 

Our partners and competitors would like to interoperate with our encodings, but the shift_jis name is a bit misleading since ours is a variant.  “Everyone” knows that (or quickly discovers it), but it would be nice if that were a bit better documented in the registry.
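
To make the "ours is a variant" point concrete, here is a minimal sketch in Python rather than C#, chosen only because Python's standard library happens to ship the two tables under separate names ("shift_jis" and "cp932"). The helper names and the sample byte sequences are mine, picked for illustration; this is not a description of how .Net or MLang behave internally.

```python
# Sketch: decode the same bytes with Python's "shift_jis" codec and with its
# "cp932" codec (the Microsoft variant) and compare the results.
# SAMPLES, decode_or_none and compare are illustrative names, not a real API.

SAMPLES = {
    "hiragana a": b"\x82\xa0",                   # same in both tables
    "wave dash / fullwidth tilde": b"\x81\x60",  # mapped to different code points
    "NEC circled digit one": b"\x87\x40",        # vendor extension, cp932 only
}


def decode_or_none(data, codec):
    """Decode data with the named codec; return None if the bytes are invalid."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


def compare(samples=SAMPLES):
    """Map each label to a (shift_jis result, cp932 result) pair."""
    return {
        label: (decode_or_none(raw, "shift_jis"), decode_or_none(raw, "cp932"))
        for label, raw in samples.items()
    }


if __name__ == "__main__":
    for label, (standard, microsoft) in compare().items():
        print(f"{label}: shift_jis={standard!r} cp932={microsoft!r}")
```

Any label whose two results differ is a byte sequence where a document tagged "shift_jis" gets read differently depending on which table the reader actually applies, which is the interoperability problem in a nutshell.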

 

-Shawn

 

From: Markus Scherer [mailto:markus.icu@gmail.com]
Sent: Thursday, November 11, 2010 10:11 AM
To: Shawn Steele
Cc: "Martin J. Dürst"; NARUSE, Yui; ietf-charsets@mail.apps.ietf.org
Subject: Re: shift_jis / windows-31J

 

2010/11/11 Shawn Steele <Shawn.Steele@microsoft.com>

> Moreover XML doesn't allow "+" for EncName.

I picked the syntax based on a previous thread a couple of years ago; I didn't realize this was a problem.

My more general question is "how do I say 'shift_jis points to windows-31J' on some systems, despite what the charset registry has said for years?"

 

The ICU converter alias list has the following aliases tagged with "WINDOWS": Shift_JIS, MS_Kanji, csShiftJIS, csWindows31J, cp932, windows-932. I don't know which of these names Windows actually recognizes. If there is at least one name that Windows recognizes (MS_Kanji??) and that does not collide with an IANA standard-Shift-JIS alias, you could use that.

 

markus