
Clarification of existing charsets



1. Summary

I propose to clarify the registration of "Shift_JIS" and
"Windows-31J".  The coded character sets of "Shift_JIS" should
be JIS X 0201:1997 and JIS X 0208:1997, and appendix 1 of JIS
X0208:1997 should be explicitly referenced.  The coded
character sets of "Windows-31J" further contain NEC special
characters (Row 13), NEC selection of IBM extensions (Rows 89
to 92), and IBM extensions (Rows 115 to 119).  Code Page 932
should be explicitly referenced.

Charset name(s):

	Shift_JIS (MS_Kanji,csShiftJIS)

	Windows-31J (csWindows31J)

Published specification(s):

	Shift_JIS:	JIS X0208:1997

	Windows-31J:	Microsoft Code Page 932
	(ftp://ftp.unicode.org/Public/MAPPINGS/
		VENDORS/MICSFT/WINDOWS/CP932.TXT)

Person & email address to contact for further information:

	MURATA Makoto (murata@fxis.fujixerox.co.jp)


2. Background

In 1982, a Japanese company, "ASCII", invented Shift JIS.  It
was first used for MBASICplus, which was a variation of
MS-BASIC.  The software platform of MBASICplus was CP/M-86
and the hardware platform was MULTI-16 of Mitsubishi.  The
coded character set of Shift JIS was JIS X0201 + JIS
X0208:1978 (formerly called JIS C6226:1978).  In 1983, ASCII,
Mitsubishi, Japan IBM, and Microsoft agreed to use Shift JIS
for internal representation of Japanese text on top of
personal computers.  Later, many companies (e.g., NEC, Apple,
DEC, and IBM) adopted Shift JIS as a basis, but developed
their own variations by introducing additional characters.
In 1997, JIS X 0208 standardized Shift JIS in
its appendix 1, where it is clearly stated that the coded
character set is JIS X0201 + JIS X0208:1997.

The Unicode Consortium publishes a mapping table between
Shift JIS and Unicode 1.1.  The URL is:
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/JIS/SHIFTJIS.TXT.
Again, the coded character set of Shift JIS in this mapping
table is JIS X0201 + JIS X0208.

Meanwhile, Microsoft developed a variation of Shift JIS (CP932).  
This variation contains NEC special characters (Row 13), NEC selection 
of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119).  
A mapping table between CP932 and Unicode is also available from the 
Unicode Consortium.  The URL is:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
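
The practical effect of these extensions can be sketched with
Python's standard codecs.  This is only an illustration under the
assumption that Python's "shift_jis" codec follows JIS X0201 + JIS
X0208 and that its "cp932" codec follows the CP932.TXT mapping above;
the helper name try_decode and the two sample byte strings are
chosen here for the example.

```python
from typing import Optional

# Row 13, cell 1 (an NEC special character) in Shift JIS form.
NEC_ROW13_SAMPLE = b"\x87\x40"
# Row 16, cell 1 of JIS X 0208 (the first kanji) in Shift JIS form.
JISX0208_SAMPLE = b"\x88\x9f"


def try_decode(data: bytes, codec: str) -> Optional[str]:
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for codec in ("shift_jis", "cp932"):
        print(codec,
              repr(try_decode(JISX0208_SAMPLE, codec)),
              repr(try_decode(NEC_ROW13_SAMPLE, codec)))
```

Under those assumptions, both codecs should accept the JIS X 0208
sample, while only "cp932" should accept the Row 13 sample.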

Many other companies have their own variation of Shift JIS.  For
example, the variation of Apple is available at:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
The coded character sets of this variation have more than 300
extended characters, which are not compatible with Microsoft's
variation.  For information about other variations,
see http://www.opengroup.or.jp/jvc/cde/sjis-e.html.


3. Current registration

Charset "Shift_JIS" is registered as follows:

>Name: Shift_JIS  (preferred MIME name)
>MIBenum: 17
>Source: A Microsoft code that extends csHalfWidthKatakana to include 
>       kanji by adding a second byte when the value of the first 
>       byte is in the ranges 81-9F or E0-EF.
>Alias: MS_Kanji 
>Alias: csShiftJIS

where csHalfWidthKatakana is registered as follows:

>Name: JIS_X0201                                           [RFC1345,KXS2]
>MIBenum: 15
>Source: JIS X 0201-1976.   One byte only, this is equivalent to 
>       JIS/Roman (similar to ASCII) plus eight-bit half-width
>        Katakana
>Alias: X0201
>Alias: csHalfWidthKatakana

Observe that the registration of "Shift_JIS" does not say which
coded character set "kanji" refers to.  (It could be JIS X0208,
JIS X0212, Big 5, or anything else.)
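
To make the gap concrete, here is a minimal sketch of what the
registered text does specify: how a byte stream is cut into one-byte
and two-byte units.  The function names is_lead_byte and split_units
are hypothetical, and the lead-byte ranges are taken literally from
the "Source:" field quoted above.

```python
from typing import List


def is_lead_byte(b: int) -> bool:
    """True if b starts a two-byte unit per the registered ranges."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def split_units(data: bytes) -> List[bytes]:
    """Split data into one-byte and two-byte units.

    Only the framing is decided here; nothing says which coded
    character set a two-byte unit is looked up in.
    """
    units = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte unit at offset %d" % i)
            units.append(data[i:i + 2])
            i += 2
        else:
            units.append(data[i:i + 1])
            i += 1
    return units


if __name__ == "__main__":
    # One ASCII-range byte, one two-byte unit, one half-width katakana byte.
    print(split_units(b"A\x88\x9f\xb1"))
```

The framing is fully determined, but the meaning of each two-byte
unit is not, which is the ambiguity at issue.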

However, another charset, "Windows-31J", is registered as
follows:

>Name: Windows-31J
>MIBenum: 2024
>Source: Windows Japanese.  A further extension of csShiftJIS
>       to include several OEM-specific kanji extensions.  
>       Like csShiftJIS, it adds a second byte when the value 
>       of the first byte is in the ranges 81-9F or E0-EF.
>       PCL Symbol Set id: 19K
>Alias: csWindows31J

Clearly, "Windows-31J" is different from "Shift_JIS" and the
difference is "OEM-specific kanji extensions".

To me, the only reasonable interpretation of "OEM-specific
kanji extensions" is NEC special characters, NEC selection
of IBM extensions, and IBM extensions.  Thus, "kanji" in the
registration of "Shift_JIS" should read JIS X0208 graphic 
characters.
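
The row numbers used in this message can be related to byte values
with the usual Shift JIS arithmetic.  The following is a sketch, not
a normative definition: the names row_cell and extension_group are
hypothetical, and accepting lead bytes up to FC is an assumption made
here so that Rows 115 to 119 are reachable (the registered text
quoted above stops at EF).

```python
from typing import Optional, Tuple


def row_cell(lead: int, trail: int) -> Tuple[int, int]:
    """Convert a two-byte Shift JIS unit to a (row, cell) pair."""
    if 0x81 <= lead <= 0x9F:
        pair = lead - 0x81
    elif 0xE0 <= lead <= 0xFC:  # assumption: lead bytes beyond EF allowed
        pair = lead - 0xC1
    else:
        raise ValueError("not a lead byte: 0x%02X" % lead)
    if not 0x40 <= trail <= 0xFC or trail == 0x7F:
        raise ValueError("not a trail byte: 0x%02X" % trail)
    if trail >= 0x9F:           # second (even) row of the pair
        return pair * 2 + 2, trail - 0x9E
    if trail >= 0x80:           # first (odd) row, cells 64 to 94
        return pair * 2 + 1, trail - 0x40
    return pair * 2 + 1, trail - 0x3F  # first (odd) row, cells 1 to 63


def extension_group(row: int) -> Optional[str]:
    """Name the extension group for a row, as listed in this message."""
    if row == 13:
        return "NEC special characters"
    if 89 <= row <= 92:
        return "NEC selection of IBM extensions"
    if 115 <= row <= 119:
        return "IBM extensions"
    return None


if __name__ == "__main__":
    for lead, trail in ((0x88, 0x9F), (0x87, 0x40), (0xED, 0x40), (0xFA, 0x40)):
        row, cell = row_cell(lead, trail)
        print("%02X%02X" % (lead, trail), row, cell, extension_group(row))
```

With this arithmetic, lead byte 87 lands in Row 13, lead bytes ED and
EE cover Rows 89 to 92, and lead bytes FA to FC cover Rows 115 to 119.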


4. Proposed revision

I believe that the CCS's of the MIME charset "Shift_JIS" should
be JIS X0201 and JIS X0208.  Since every vendor has its own
variation of Shift JIS, we cannot adopt such a variation as
the definition of "Shift_JIS".  Rather, vendor-specific
extensions should be registered as separate charsets, if
necessary.

Here is my revision proposal.
 
Name: Shift_JIS  (preferred MIME name)
MIBenum: 17
Source: This charset is an extension of csHalfWidthKatakana by 
	adding graphic characters in JIS X 0208.  The CCS's are
	JIS X0201:1997 and JIS X0208:1997.  The
	complete definition is shown in Appendix 1 of JIS
	X0208:1997.
        This charset can be used for the top-level media type "text".
Alias: MS_Kanji 
Alias: csShiftJIS


Name: Windows-31J
MIBenum: 2024
Source: Windows Japanese.  A further extension of csShiftJIS
        to include NEC special characters (Row 13),
	NEC selection of IBM extensions (Rows 89 to 92), and IBM
	extensions (Rows 115 to 119).  The CCS's are
	JIS X0201:1997, JIS X0208:1997, and these extensions.
        This charset can be used for the top-level media type "text".
        PCL Symbol Set id: 19K
Alias: csWindows31J


Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp