
Re: shift_jis / windows-31J



Hello Makoto,

Many thanks for your help. Some comments below:

On 2010/11/12 8:57, MURATA Makoto wrote:
>
>> I'm being asked to document things like a character set selector attribute that's a byte.  It has entries like:
>> 	0x80 Specifies the JIS character set. (IANA name shift_jis)
>>
>> We all know that "Microsoft's" shift_jis is really Windows-31J, but the on-the-surface reasonable
>> request to replace this shift_jis with Windows-31J would mean that we'd
>> be specifying an identifier that our software didn't recognize.  That
>> doesn't help solve the problem.  Even when we do recognize windows-31J,
>> we'd tell you that the name was shift_jis (round tripping.)
>
> I do not think so, since you specify 0x80 in data rather than
> "windows-31J" or "shift_jis" in this particular case.
>
>> This kind of documentation shows up "everywhere", so it'd be nice if people got to shift_jis in the
>> registry and saw "gee, Microsoft uses a variation".
>>
>> At this point it's rather a mess, and the behavior's pretty stuck.  If it is desirable for the registry
>> to point people in the right direction, then doing something like what
>> HTML did, at the registry level, would be most helpful.
>
> I agree with this idea, except for the addition of a new alias.
>
> I reformulated your proposal using the latest registration template.
> Here goes.
>
> --------------------------------------------------------------------------------
>
>
> Charset name: Windows-31J
> Charset aliases: csWindows31J
> MIBenum: 2024
>
> Suitability for use in MIME text:
>
> This charset can be used for the top-level media type "text".

The recent windows-874 registration used the following:

 >>>>
Suitability for use in MIME text:

Yes, windows-874 is suitable for use with subtypes of the "text"
Content-Type. Note that windows-874 is an 8-bit charset. Care should
be taken to choose an appropriate Content-Transfer-Encoding.
 >>>>

I suggest you use something similar, in particular also mentioning that 
this is an 8-bit charset. I'm sure Ned would insist on that if he had time.
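
To illustrate why the 8-bit note matters, here is a minimal sketch. It assumes Python's "cp932" codec as a stand-in for Windows-31J (that codec name is Python's, not something from the registration), and the sample string is only an illustration. It checks whether the encoded bytes have the high bit set and shows base64 as one possible Content-Transfer-Encoding for a 7-bit transport.

```python
import base64

# Hypothetical sample text; any Japanese string would do.
text = "日本語"

# Assumption: Python's "cp932" codec stands in for Windows-31J here.
raw = text.encode("cp932")

# An 8-bit charset produces bytes >= 0x80, so "7bit" is not a safe
# Content-Transfer-Encoding for this body.
has_high_bit = any(b >= 0x80 for b in raw)

# base64 is one transfer encoding that keeps the body 7-bit clean.
encoded_body = base64.b64encode(raw).decode("ascii")

print(has_high_bit)
print(encoded_body)
```

Quoted-printable or 8bit would be equally valid choices depending on the transport; base64 is used above only because it is the simplest to show.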

> Published specification(s):
>
> http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
>
> ISO 10646 equivalency table:
>
> http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
>
> Additional information:
>
> Windows Japanese.  A variant of Shift_JIS to include NEC special
> characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92),
> and IBM extensions (Rows 115 to 119).  The CCS's are JIS X0201:1997,
> JIS X0208:1997, and these extensions.  Windows-31J text is commonly
> declared with the shift_jis name of the parent charset.

I would probably change this to "On Windows systems, Windows-31J text is 
commonly declared...."
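
As a rough illustration of what the mislabeling means in practice, the following sketch decodes one of the NEC special characters (Row 13) under both names. It assumes Python's built-in "cp932" and "shift_jis" codecs as approximations of Windows-31J and the parent charset; other implementations may be more lenient.

```python
# 0x8740 is Row 13, cell 1: one of the NEC special characters that
# Windows-31J adds and that JIS X 0208 itself leaves undefined.
nec_special = b"\x87\x40"

# Assumption: Python's "cp932" codec approximates Windows-31J.
as_windows_31j = nec_special.decode("cp932")

# Assumption: Python's "shift_jis" codec approximates the parent charset.
try:
    as_shift_jis = nec_special.decode("shift_jis")
except UnicodeDecodeError:
    # A strict decoder cannot handle the vendor extension.
    as_shift_jis = None

print(repr(as_windows_31j), as_shift_jis)
```

A receiver that takes a shift_jis label literally would therefore fail on such text, which is the case the wording change tries to flag.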

> Person & email address to contact for further information: ??
>
> Intended usage: LIMITED USE
>
> --------------------------------------------------------------------------------
>
> Charset name: Shift_JIS
>
> MIBenum: 17
>
> Charset aliases: MS_Kanji and csShiftJIS

The "MS_Kanji" alias is really quite unfortunate, but I don't think we 
can remove it.

> Suitability for use in MIME text:
> This charset can be used for the top-level media type "text".
>
> Published specification(s): Appendix 1 of JIS X0208:1997.
>
> ISO 10646 equivalency table:
>
> There are no authoritative definitions and several variations
> exist.  An obsolete variation is available at:
>
> http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT

This is, in my opinion, somewhat too pessimistic. Essentially, the 
correspondence is defined in JIS X0208:1997. All the Kanji are mapped as 
described in Appendix 6. There is some variation for some of the 
punctuation listed in the first column of Table 2 of Appendix 5; 
otherwise, the names given in Appendix 5 are used in preference to those 
given in Appendix 4, where available.

If the above is true, it might be better to write it in that way, rather 
than just to imply that anything goes.
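
To make the "some variation for some of the punctuation" point concrete, here is a small sketch comparing two mapping tables for a Kanji and for one punctuation character. It assumes Python's "shift_jis" and "cp932" codecs as two of the mappings in circulation; neither is claimed to be the authoritative one.

```python
# 0x889F is the first Kanji of JIS X 0208 (Row 16, cell 1).
kanji = b"\x88\x9f"
# 0x8160 is the wave dash (Row 1, cell 33), a known point of variation.
wave_dash = b"\x81\x60"

# Assumption: these two Python codecs represent two mappings in use.
kanji_sjis = kanji.decode("shift_jis")
kanji_cp932 = kanji.decode("cp932")
wave_sjis = wave_dash.decode("shift_jis")
wave_cp932 = wave_dash.decode("cp932")

# The Kanji maps the same way in both tables.
print(kanji_sjis == kanji_cp932)

# The punctuation differs: WAVE DASH versus FULLWIDTH TILDE.
print(f"U+{ord(wave_sjis):04X}", f"U+{ord(wave_cp932):04X}")
```

The variation is confined to a handful of punctuation code points like this one, which is why describing the mapping as mostly defined seems more accurate than implying that anything goes.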

Regards,    Martin.

> Additional information:
>
> This charset is an extension of csHalfWidthKatakana by adding graphic
> characters in JIS X 0208.  The CCS's are JIS X0201:1997 and JIS
> X0208:1997.
>
> Several vendor specific charsets that derive from shift_jis often use
> the shift_jis name instead of a more specific vendor charset name.
>
> Person & email address to contact for further information: ?
>
> Intended usage: LIMITED USE
>
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp