
RE: shift_jis / windows-31J



Yup.  And that probably solves the concerns for HTML.

I'm being asked to document things like a character set selector attribute whose value is a single byte.  It has entries like:
	0x80 Specifies the JIS character set. (IANA name shift_jis)

We all know that "Microsoft's" shift_jis is really Windows-31J, but the request to replace this shift_jis with Windows-31J, reasonable as it looks on the surface, would mean documenting an identifier that our software doesn't recognize.  That doesn't solve the problem.  Even where we do recognize windows-31J, we'd report the name back as shift_jis (for round-tripping).
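To make the round-tripping point concrete, here is a minimal sketch assuming a hypothetical label table (CANONICAL, SELECTOR_BYTES and resolve are made-up names, not any product's real API) in which windows-31J is accepted on input but the reported name is always shift_jis:

```python
# Hypothetical label table: every accepted spelling maps to the one name the
# software reports back. windows-31j is accepted on input, but the answer is
# still "shift_jis", which is the round-tripping behavior described above.
CANONICAL = {
    "shift_jis": "shift_jis",
    "ms_kanji": "shift_jis",     # registry alias
    "csshiftjis": "shift_jis",   # registry alias
    "windows-31j": "shift_jis",  # recognized, but never reported
}

# Hypothetical selector-byte table of the kind being documented.
SELECTOR_BYTES = {0x80: "shift_jis"}


def resolve(label: str) -> str:
    """Return the charset name the software reports for an input label."""
    key = label.strip().lower()
    if key not in CANONICAL:
        raise ValueError(f"unrecognized charset label: {label!r}")
    return CANONICAL[key]


if __name__ == "__main__":
    print(resolve("Windows-31J"))  # shift_jis
    print(SELECTOR_BYTES[0x80])    # shift_jis
```

So documentation generated from such a table can only ever say shift_jis, whichever name went in.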

This kind of documentation shows up "everywhere", so it'd be nice if people who looked up shift_jis in the registry saw "gee, Microsoft uses a variation".
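For anyone wondering what "a variation" means in practice, a small illustration using Python's standard codecs, assuming (as I understand it) that Python's "cp932" codec tracks Windows-31J while its "shift_jis" codec follows JIS X 0208.  The same two bytes decode to different characters, and NEC extension characters such as circled digits exist only in the Microsoft variant.  decode_both and encodable are illustrative helpers, not part of any library.

```python
def decode_both(data: bytes) -> tuple[str, str]:
    """Decode the same bytes as Shift_JIS and as Windows-31J (Python's cp932)."""
    return data.decode("shift_jis"), data.decode("cp932")


def encodable(text: str, codec: str) -> bool:
    """Report whether every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # 0x81 0x60: WAVE DASH in JIS X 0208, FULLWIDTH TILDE in Microsoft's table.
    sjis, win31j = decode_both(b"\x81\x60")
    print(f"U+{ord(sjis):04X} vs U+{ord(win31j):04X}")  # U+301C vs U+FF5E

    # CIRCLED DIGIT ONE is an NEC extension, present only in the Microsoft variant.
    print(encodable("\u2460", "shift_jis"), encodable("\u2460", "cp932"))  # False True
```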

At this point it's rather a mess, and the behavior's pretty stuck.  If it is desirable for the registry to point people in the right direction, then doing something like what HTML did, at the registry level, would be most helpful.

The Shift_JIS entry currently says:

Name: Shift_JIS  (preferred MIME name)
MIBenum: 17
Source: This charset is an extension of csHalfWidthKatakana by
        adding graphic characters in JIS X 0208.  The CCS's are
        JIS X0201:1997 and JIS X0208:1997.  The
        complete definition is shown in Appendix 1 of JIS
        X0208:1997.
        This charset can be used for the top-level media type "text".
Alias: MS_Kanji 
Alias: csShiftJIS

I'd be happy with some sort of "Microsoft has a variant" note, for example by adding a sentence at the end:

Name: Shift_JIS  (preferred MIME name)
MIBenum: 17
Source: This charset is an extension of csHalfWidthKatakana by
        adding graphic characters in JIS X 0208.  The CCS's are
        JIS X0201:1997 and JIS X0208:1997.  The
        complete definition is shown in Appendix 1 of JIS
        X0208:1997.
        This charset can be used for the top-level media type "text".
        Microsoft products often use the shift_jis name to describe
        the Windows-31J variant.
Alias: MS_Kanji 
Alias: csShiftJIS

And maybe the converse in the Windows-31J entry: "Microsoft products often use shift_jis to describe this variant."

-Shawn

-----Original Message-----
From: NARUSE, Yui [mailto:naruse@airemix.jp] 
Sent: Thursday, November 11, 2010 10:11 AM
To: Shawn Steele
Cc: "Martin J. Dürst"; ietf-charsets@mail.apps.ietf.org
Subject: Re: shift_jis / windows-31J

(2010/11/12 2:31), Shawn Steele wrote:
>> Moreover XML doesn't allow "+" for EncName.
>> http://www.w3.org/TR/REC-xml/#NT-EncName
>
> I picked the syntax based on a previous thread a couple years ago, I 
> didn't realize this was a problem.
>
> My more general question is "how do I say 'shift_jis points to 
> windows-31J' on some systems, despite what the charset registry has 
> said for years?"

In HTML5's case, they introduced "Character encoding overrides".
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#character-encodings-0
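A minimal sketch of what such an override amounts to, assuming an override table with only the Shift_JIS to Windows-31J row (as I read the draft, its table covers other labels too) and mapping Windows-31J onto Python's cp932 codec.  OVERRIDES, PYTHON_CODECS, effective_encoding and decode_declared are hypothetical names for illustration.

```python
# Hypothetical override table: a declared label is replaced by the encoding
# that is actually used. Only the Shift_JIS row is reproduced here.
OVERRIDES = {"shift_jis": "windows-31j"}

# Python spells Windows-31J as "cp932"; this explicit map avoids relying on
# any codec alias for the name "windows-31j".
PYTHON_CODECS = {"windows-31j": "cp932"}


def effective_encoding(declared: str) -> str:
    """Apply the override table to a declared charset label."""
    key = declared.strip().lower()
    return OVERRIDES.get(key, key)


def decode_declared(data: bytes, declared: str) -> str:
    """Decode data with the overridden encoding rather than the declared one."""
    enc = effective_encoding(declared)
    return data.decode(PYTHON_CODECS.get(enc, enc))


if __name__ == "__main__":
    print(effective_encoding("Shift_JIS"))                      # windows-31j
    print(hex(ord(decode_declared(b"\x81\x60", "Shift_JIS"))))  # 0xff5e
```

The declared name stays shift_jis in the document; only the decoder changes, which is roughly the effect a registry-level note would describe.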

--
NARUSE, Yui  <naruse@airemix.jp>