RE: shift_jis / windows-31J
Yup. And that probably solves the concerns for HTML.
I'm being asked to document things like a one-byte character set selector attribute. It has entries like:
0x80 Specifies the JIS character set. (IANA name shift_jis)
We all know that "Microsoft's" shift_jis is really Windows-31J, but the seemingly reasonable request to replace shift_jis here with Windows-31J would mean specifying an identifier that our software doesn't recognize. That doesn't solve the problem. Even where we do recognize windows-31J, we'd report the name back as shift_jis (round-tripping).
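A minimal sketch of the round-tripping problem, assuming a hypothetical one-byte selector table in which 0x80 is the only Japanese entry. The names `SELECTOR_NAMES`, `ACCEPTED_LABELS`, `selector_for_name`, and `name_for_selector` are illustrative, not taken from any real product. Because both labels collapse onto the same byte, reading the byte back can only ever produce shift_jis:

```python
# Hypothetical selector-byte table; 0x80 is the entry quoted above.
SELECTOR_NAMES = {0x80: "shift_jis"}

# Hypothetical input labels (compared case-insensitively). Both map to 0x80,
# because the software treats windows-31J as what it calls shift_jis.
ACCEPTED_LABELS = {"shift_jis": 0x80, "windows-31j": 0x80}


def selector_for_name(label: str) -> int:
    """Map a charset label to the selector byte, or raise KeyError."""
    return ACCEPTED_LABELS[label.lower()]


def name_for_selector(selector: int) -> str:
    """Map a selector byte back to the single name the software reports."""
    return SELECTOR_NAMES[selector]


if __name__ == "__main__":
    byte = selector_for_name("Windows-31J")
    print(hex(byte), name_for_selector(byte))  # prints "0x80 shift_jis"
```

Whichever label goes in, the name that comes out is shift_jis, which is why simply renaming the documented entry doesn't help.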
This kind of documentation shows up "everywhere", so it would be nice if people who looked up shift_jis in the registry saw "gee, Microsoft uses a variation".
At this point it's rather a mess, and the behavior is pretty much stuck. If the registry should point people in the right direction, then doing something like what HTML did, but at the registry level, would be most helpful.
The Shift_JIS entry currently says:
Name: Shift_JIS (preferred MIME name)
MIBenum: 17
Source: This charset is an extension of csHalfWidthKatakana by
adding graphic characters in JIS X 0208. The CCS's are
JIS X0201:1997 and JIS X0208:1997. The
complete definition is shown in Appendix 1 of JIS
X0208:1997.
This charset can be used for the top-level media type "text".
Alias: MS_Kanji
Alias: csShiftJIS
I'd be happy with some sort of "Microsoft has a variant" note, for example by adding a sentence at the end:
Name: Shift_JIS (preferred MIME name)
MIBenum: 17
Source: This charset is an extension of csHalfWidthKatakana by
adding graphic characters in JIS X 0208. The CCS's are
JIS X0201:1997 and JIS X0208:1997. The
complete definition is shown in Appendix 1 of JIS
X0208:1997.
This charset can be used for the top-level media type "text".
Microsoft products often use the shift_jis name to describe
the Windows-31J variant.
Alias: MS_Kanji
Alias: csShiftJIS
Maybe the converse in the Windows-31J entry: "Microsoft products often use shift_jis to describe this variant."
-Shawn
-----Original Message-----
From: NARUSE, Yui [mailto:naruse@airemix.jp]
Sent: Thursday, November 11, 2010 10:11 AM
To: Shawn Steele
Cc: "Martin J. Dürst"; ietf-charsets@mail.apps.ietf.org
Subject: Re: shift_jis / windows-31J
(2010/11/12 2:31), Shawn Steele wrote:
>> Moreover, XML doesn't allow "+" in EncName.
>> http://www.w3.org/TR/REC-xml/#NT-EncName
>
> I picked the syntax based on a previous thread a couple of years ago;
> I didn't realize this was a problem.
>
> My more general question is "how do I say 'shift_jis points to
> windows-31J' on some systems, despite what the charset registry has
> said for years?"
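For reference, XML 1.0 defines `EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*`, so an encoding name must start with a letter and may then contain only letters, digits, '.', '_' and '-'. A small Python check of that production (the helper name `is_xml_encname` and the sample labels are illustrative):

```python
import re

# XML 1.0 production: EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENCNAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_xml_encname(label: str) -> bool:
    """Return True if label matches the XML EncName production exactly."""
    return ENCNAME.fullmatch(label) is not None


if __name__ == "__main__":
    # "a+b" is a made-up label, used only to show that '+' is rejected.
    for label in ("Shift_JIS", "windows-31J", "31J", "a+b"):
        print(label, is_xml_encname(label))
```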
In HTML5's case, they introduced "Character encoding overrides".
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#character-encodings-0
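An override of this kind amounts to a label-to-decoder lookup that is consulted before the registry meaning of the label. A minimal Python sketch, assuming Python's `cp932` codec as a stand-in for Windows-31J and showing only the Shift_JIS entry discussed in this thread (`OVERRIDES` and `decode_with_override` are hypothetical names, not part of the HTML5 text). The byte pair 0x87 0x40 is an NEC row-13 character (CIRCLED DIGIT ONE) that exists in Windows-31J but not in JIS X 0208, so it decodes only when the override is applied:

```python
# Hypothetical override table: declared label -> Python codec actually used.
# Only the Shift_JIS -> Windows-31J entry from this thread is shown;
# Python's "cp932" codec stands in for Windows-31J.
OVERRIDES = {"shift_jis": "cp932"}


def decode_with_override(data: bytes, label: str) -> str:
    """Decode data, substituting the override codec for the declared label."""
    key = label.lower()
    codec = OVERRIDES.get(key, key)  # labels without an override pass through
    return data.decode(codec)


if __name__ == "__main__":
    sample = b"\x87\x40"  # CIRCLED DIGIT ONE in Windows-31J
    print(decode_with_override(sample, "Shift_JIS"))
    try:
        sample.decode("shift_jis")  # Python's JIS X 0208-based codec
    except UnicodeDecodeError:
        print("not valid in plain shift_jis")
```

Keeping the label untouched and changing only the decoder is what lets documents keep saying shift_jis while being read as Windows-31J.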
--
NARUSE, Yui <naruse@airemix.jp>