RE: shift_jis / windows-31J
The 2-week review period included a US holiday, so I'm just pinging. Barring objections, I'll submit these updates on Friday.
-Shawn
-----Original Message-----
From: Shawn Steele
Sent: Thursday, November 18, 2010 1:54 PM
To: Shawn Steele; MURATA Makoto; Anne van Kesteren
Cc: NARUSE, Yui; Martin J. Durst; ietf-charsets@mail.apps.ietf.org; Chris Rae; Peter Constable; "Martin J. Dürst"
Subject: RE: shift_jis / windows-31J
There were some comments, but they didn't seem to impact the idea here very much.
I've updated the Windows-31J with a note about the 0x5c behavior.
I also added MacJapanese and Java SJIS to the variations comment for shift_jis, but they aren't registered.
Any further comments? I'd like to submit this for the 2-week review and then submit the registrations. I am not ignoring Anne's request to "solve the problem here", but I think that's a more complex issue, and the changes proposed below would more easily address some of the common confusion in this area. I'm not opposed to someone following up with a better or more complete fix, but I think it'll be hard to get consensus.
-Shawn
--------------------------------------------------------------------------------
Charset name: Windows-31J
Charset aliases: csWindows31J
MIBenum: 2024
Suitability for use in MIME text:
Yes, Windows-31J is suitable for use with subtypes of the "text" Content-Type. Note that Windows-31J is an 8-bit charset. Care should be taken to choose an appropriate Content-Transfer-Encoding.
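Because Windows-31J bodies usually contain bytes of 0x80 and above, a sender has to pick an 8-bit-safe Content-Transfer-Encoding. The sketch below is one hypothetical way to make that choice. It assumes Python's cp932 codec stands in for Windows-31J (code page 932), and it only looks at the 8-bit property; it does not enforce the full RFC 2045 "7bit" rules.

```python
def choose_cte(body: bytes) -> str:
    """Pick a Content-Transfer-Encoding for an already encoded text body.

    Hypothetical helper: it only checks for bytes >= 0x80 (and NUL);
    a full "7bit" check would also enforce RFC 2045 line-length and
    CR/LF rules.
    """
    if all(0 < b < 0x80 for b in body):
        return "7bit"
    # Every Windows-31J double-byte character starts with a lead byte of
    # 0x81 or above, so Japanese text needs an 8-bit-safe encoding.
    return "base64"


print(choose_cte("Hello".encode("cp932")))   # 7bit
print(choose_cte("日本語".encode("cp932")))  # base64
```

Base64 is chosen over quoted-printable here because every lead byte of Japanese text would need escaping under quoted-printable, so the output would grow considerably.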
Published specification(s):
http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
ISO 10646 equivalency table:
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
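The unicode.org vendor mapping files list one mapping per line as hexadecimal columns, with "#" starting a comment. A minimal parser is sketched below, assuming the layout "0xBYTES<TAB>0xUNICODE<TAB># NAME" and that unmapped codes appear with no Unicode column; the function name and sample text are hypothetical.

```python
def parse_mapping(text: str) -> dict[int, int]:
    """Parse a unicode.org mapping table into {byte_code: code_point}."""
    table: dict[int, int] = {}
    for line in text.splitlines():
        # Drop the trailing comment, then split the remaining columns.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment line, or code with no mapping
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


sample = (
    "# illustrative lines in the same format\n"
    "0x5C\t0x005C\t#REVERSE SOLIDUS\n"
    "0x80\t\t#UNDEFINED\n"
    "0x8140\t0x3000\t#IDEOGRAPHIC SPACE\n"
)
print(parse_mapping(sample))  # {92: 92, 33088: 12288}
```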
Additional information:
Windows Japanese. A variant of Shift_JIS that includes NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119). The CCSs are JIS X0201:1997, JIS X0208:1997, and these extensions. Windows-31J text is commonly declared with the shift_jis name of the parent charset, and the Windows-31J name may not be recognized.
In practice, 0x5C in Windows-31J is mapped to U+005C in Unicode, but usually displayed as a yen sign glyph.
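Both points can be checked with Python's codecs, assuming its cp932 codec matches Windows-31J and its shift_jis codec covers only JIS X 0201 and JIS X 0208. The sketch shows 0x5C decoding to U+005C, a Row 13 NEC character that only the Windows variant can encode, and 0x5C also appearing as a trail byte.

```python
import unicodedata

# 0x5C decodes to U+005C; showing it as a yen sign is a font matter.
decoded = b"\x5c".decode("cp932")
print(hex(ord(decoded)), unicodedata.name(decoded))  # 0x5c REVERSE SOLIDUS

# NEC special characters (Row 13) exist in Windows-31J but not in
# JIS X 0208, so the stricter shift_jis codec rejects them.
circled_one = "\u2460"  # CIRCLED DIGIT ONE
print(circled_one.encode("cp932"))  # b'\x87@'
try:
    circled_one.encode("shift_jis")
except UnicodeEncodeError:
    print("not representable in shift_jis")

# 0x5C also occurs as the trail byte of some double-byte characters,
# e.g. KATAKANA LETTER SO, so a naive byte search finds a "backslash".
so = "\u30bd".encode("cp932")
print(b"\x5c" in so)  # True
```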
Person & email address to contact for further information:
Shawn Steele
Email: Shawn.Steele@microsoft.com
Microsoft Corporation
One Microsoft Way,
Redmond, WA 98052
U.S.A.
Intended usage: LIMITED USE
--------------------------------------------------------------------------------
Charset name: Shift_JIS
MIBenum: 17
Charset aliases: MS_Kanji and csShiftJIS
Suitability for use in MIME text:
This charset can be used for the top-level media type "text".
Published specification(s): Appendix 1 of JIS X0208:1997.
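The shift coding maps each JIS X 0208 row/cell (kuten) pair to two bytes. The sketch below uses the widely implemented arithmetic rather than text quoted from the appendix; the function name is hypothetical, and the results are cross-checked against Python's shift_jis codec.

```python
def kuten_to_sjis(row: int, cell: int) -> bytes:
    """Map a JIS X 0208 row/cell pair (each 1..94) to two Shift_JIS bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    # Each lead byte covers two rows. Rows 1-62 use 0x81-0x9F and rows
    # 63-94 use 0xE0-0xEF, skipping 0xA0-0xDF (halfwidth katakana).
    lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2 == 1:
        # Odd rows use trail bytes 0x40-0x9E, skipping 0x7F (DEL).
        trail = cell + (0x3F if cell <= 63 else 0x40)
    else:
        # Even rows use trail bytes 0x9F-0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])


print(kuten_to_sjis(1, 1) == "\u3000".encode("shift_jis"))  # True (IDEOGRAPHIC SPACE)
print(kuten_to_sjis(4, 2) == "\u3042".encode("shift_jis"))  # True (HIRAGANA A)
```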
ISO 10646 equivalency table:
The correspondence is defined in JIS X0208:1997; the Kanji mapping is described in Appendix 6. Column 1 of Table 2 of Appendix 5 lists some variations of punctuation, and the names given in Appendix 5 are preferred to those in Appendix 4, when available.
In computer-readable formats, several variations exist. An obsolete variation is available at:
http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT
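To see where two computer-readable variations disagree, the tables can be parsed and compared code by code. In this sketch the helper names are hypothetical and the two short samples were written for illustration, mirroring the 0x5C difference described in these registrations; to compare the real tables, load the full files instead.

```python
def parse_mapping(text: str) -> dict[int, int]:
    """Parse a unicode.org mapping table into {byte_code: code_point}."""
    table: dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def diff_mappings(a: dict[int, int], b: dict[int, int]) -> dict[int, tuple[int, int]]:
    """Return {byte_code: (a_value, b_value)} where both tables differ."""
    return {c: (a[c], b[c]) for c in sorted(a.keys() & b.keys()) if a[c] != b[c]}


older_sample = "0x5C\t0x00A5\t# YEN SIGN\n0x8140\t0x3000\t# IDEOGRAPHIC SPACE\n"
cp932_sample = "0x5C\t0x005C\t# REVERSE SOLIDUS\n0x8140\t0x3000\t# IDEOGRAPHIC SPACE\n"
for code, (old, new) in diff_mappings(parse_mapping(older_sample),
                                      parse_mapping(cp932_sample)).items():
    print(f"0x{code:02X}: U+{old:04X} vs U+{new:04X}")  # 0x5C: U+00A5 vs U+005C
```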
Additional information:
This charset extends csHalfWidthKatakana by adding the graphic characters of JIS X 0208. The CCS's are JIS X0201:1997 and JIS X0208:1997.
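The layering shows up in the byte ranges: single bytes cover JIS X 0201 (0x00-0x7F Roman, 0xA1-0xDF halfwidth katakana), and lead bytes 0x81-0x9F and 0xE0-0xEF introduce a JIS X 0208 double-byte character. The hypothetical splitter below follows plain Shift_JIS; Windows-31J uses further lead bytes (for example 0xFA-0xFC for the IBM extensions), which this sketch rejects.

```python
def split_shift_jis(data: bytes) -> list[bytes]:
    """Split Shift_JIS bytes into one bytes object per character."""
    chars: list[bytes] = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:
            # Lead byte of a JIS X 0208 character; the trail byte may be
            # 0x5C, which must not be read as a separate character.
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte character")
            chars.append(data[i:i + 2])
            i += 2
        elif b <= 0x7F or 0xA1 <= b <= 0xDF:
            # JIS X 0201 Roman or halfwidth katakana: one byte.
            chars.append(data[i:i + 1])
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is not valid in plain Shift_JIS")
    return chars


print(split_shift_jis("A\uff71\u3042".encode("shift_jis")))  # [b'A', b'\xb1', b'\x82\xa0']
```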
Several vendor-specific charsets derived from shift_jis often use the shift_jis name instead of a more specific vendor charset name. Windows-31J is one example; MacJapanese and Java SJIS are others. A common variation is to convert shift_jis 0x5C to U+005C in Unicode, but display it as the yen sign; Windows-31J behaves this way.
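Because a shift_jis label may hide a vendor variant, a receiver might fall back to a superset when strict decoding fails. This is a hypothetical heuristic, not something the registration prescribes; it assumes Python's shift_jis codec is the strict parent charset and its cp932 codec is Windows-31J.

```python
def decode_declared_shift_jis(data: bytes) -> tuple[str, str]:
    """Decode bytes labelled shift_jis; return (text, codec actually used)."""
    try:
        return data.decode("shift_jis"), "shift_jis"
    except UnicodeDecodeError:
        # Sequences such as 0x8740 (NEC Row 13) are valid Windows-31J
        # but not JIS X 0208, which suggests the sender meant Windows-31J.
        return data.decode("cp932"), "cp932"


print(decode_declared_shift_jis(b"\x82\xa0"))  # ('あ', 'shift_jis')
print(decode_declared_shift_jis(b"\x87\x40"))  # ('①', 'cp932')
```

The fallback only triggers on bytes the strict codec rejects, so it cannot detect text where both charsets decode the same bytes differently.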
Person & email address to contact for further information:
Japanese Industrial Standards Committee
http://www.jisc.go.jp/eng/index.html
Intended usage: LIMITED USE