
Re: Registration of new charset CP50220



(2010/04/22 1:21), Masatoshi Kimura wrote:
>>     13    JIS X 0201-Katakana ESC ) I       G1
> Are there any implementations which actually support this sequence?
> At least the Mozilla implementation doesn't. It doesn't support SI/SO either.

Internet Explorer and Opera support 8-bit Katakana.
Internet Explorer supports Shift-in Katakana.
http://coq.no/character-tables/mime/iso-2022/en

>>     Another typical user is Japanese IRC network. They sometimes send
>>     JIS X 0201-Katakana encoded in GR (JIS8).
> What does it mean? You said this charset is 7-bit encoding.
>
>>     Since "CP50220" is a 7-bit encoding, Content-Transfer-Encoding is
>>     not needed.
>>     Base64 or Quoted-Printable encoding MAY break this encoding.
>>   This charset is ISO/IEC 2022 family.
> For these reasons, I don't think it's appropriate to treat this charset
> as an ISO/IEC 2022 family.

Hmm, you are right. I removed 8-bit and Shift-in Katakana.


And I added the following line:
     * CP50220 supports JIS X 0201-Katakana
     * CP50220 supports characters extended by Windows Codepage 932
+      (NEC special characters and NEC selection of IBM extensions)
     * Unicode mapping of some characters are different

----------
Charset name: CP50220

Charset aliases: csCP50220

Suitability for use in MIME text:

  Yes, CP50220 is suitable for use with subtypes of the "text" Content-Type.

  Since CP50220 is a 7-bit encoding, Content-Transfer-Encoding is not needed.
  Base64 or Quoted-Printable encoding MAY break this encoding.

Published specification(s):

  CP50220 consists of the following character sets:

    reg#  character set       ESC sequence  designated to
    ------------------------------------------------------
    6     US-ASCII            ESC ( B       G0
    13    JIS X 0201-Katakana ESC ( I       G0
    14    JIS X 0201-Roman    ESC ( J       G0
    42    JIS X 0208-1978     ESC $ @       G0
    87    JIS X 0208-1983     ESC $ B       G0
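
  As an illustration of how the table above drives a decoder, here is a
  minimal Python sketch that splits a 7-bit byte string into runs, one per
  designated character set. The names ESCAPES and split_segments are
  hypothetical, and the sketch assumes US-ASCII as the initial state. It
  does not map the payload bytes to Unicode.

```python
# Escape sequences from the table above, keyed by their raw bytes.
ESCAPES = {
    b"\x1b(B": "US-ASCII",
    b"\x1b(I": "JIS X 0201-Katakana",
    b"\x1b(J": "JIS X 0201-Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_segments(data: bytes):
    """Split 7-bit data into (character set, payload bytes) runs."""
    segments = []
    current = "US-ASCII"  # assumption: text starts in US-ASCII
    start = i = 0
    while i < len(data):
        if data[i] == 0x1B:  # ESC never occurs inside a payload byte pair
            seq = data[i:i + 3]
            if seq not in ESCAPES:
                raise ValueError(f"unsupported escape sequence at offset {i}")
            if i > start:
                segments.append((current, data[start:i]))
            current = ESCAPES[seq]
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        segments.append((current, data[start:]))
    return segments


if __name__ == "__main__":
    sample = b"A\x1b$B\x30\x21\x1b(I\x31\x1b(B\n"
    for charset, payload in split_segments(sample):
        print(charset, payload)
```

  Anything other than the five sequences in the table is rejected here; a
  real decoder may prefer to be more lenient.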

  * The beginning of a text is assumed to have "ESC ( B ESC ) I".
  * Each line of CP50220 text MUST end with ASCII.
  * On receiving, JIS X 0201-Katakana characters MAY be encoded
    with the escape sequence ESC ( I.
  * On sending, JIS X 0201-Katakana MUST be converted to the
    corresponding characters of JIS X 0208.
  * The character set of CP50220 is based on Windows Codepage 932,
    so the meaning and the Unicode mapping of each character follow it.
    http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
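
  The sending rule above (JIS X 0201-Katakana is converted to JIS X 0208)
  can be sketched in Python as follows. This is only an approximation under
  stated assumptions: it uses Unicode NFKC normalization, restricted to the
  halfwidth range U+FF61..U+FF9F, as a stand-in for the real conversion
  table, and it uses Python's "iso2022_jp" codec because the standard
  library has, as far as I know, no codec named cp50220. The function name
  fold_halfwidth_katakana is hypothetical.

```python
import re
import unicodedata

# Halfwidth punctuation and katakana (the JIS X 0201-Katakana repertoire).
HALFWIDTH_RUN = re.compile("[\uff61-\uff9f]+")


def fold_halfwidth_katakana(text: str) -> str:
    """Replace halfwidth katakana runs with their fullwidth counterparts.

    NFKC is applied per run so that a base kana followed by a halfwidth
    (semi-)voiced sound mark composes into one character, while text
    outside the halfwidth range is left untouched.
    """
    return HALFWIDTH_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


if __name__ == "__main__":
    folded = fold_halfwidth_katakana("\uff83\uff7d\uff84")  # halfwidth "tesuto"
    print(folded)
    print(folded.encode("iso2022_jp"))
```

  A sound mark that does not follow a composable kana is left as a combining
  character by NFKC and would still fail to encode; a complete
  implementation needs an explicit table for such cases.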

ISO 10646 equivalency table:

  This charset belongs to the ISO/IEC 2022 family.
  Conversion of each character follows Windows Codepage 932:
  http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
  http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
  http://icu-project.org/repos/icu/data/trunk/charset/data/ucm/windows-932-2000.ucm

Additional information:

  This is a request for a new registration of this charset.

  CP50220 is a variant of ISO-2022-JP (as Windows-31J is of Shift_JIS).
  This charset differs from ISO-2022-JP in the following points:
    * CP50220 supports JIS X 0201-Katakana
    * CP50220 supports characters extended by Windows Codepage 932
      (NEC special characters and NEC selection of IBM extensions)
    * The Unicode mappings of some characters are different
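
  The three differences above can be observed with Python's built-in
  codecs. This is only an illustration: "cp932" is the 8-bit Windows
  Codepage 932 that CP50220 is based on, not CP50220 itself, and
  "shift_jis" stands in for the plain JIS X 0208 mapping. The helper
  can_encode is hypothetical.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text is encodable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "JIS X 0201-Katakana (U+FF71)": "\uff71",
    "NEC special character (U+2460)": "\u2460",
}

if __name__ == "__main__":
    for label, ch in samples.items():
        print(label,
              "iso2022_jp:", can_encode(ch, "iso2022_jp"),
              "cp932:", can_encode(ch, "cp932"))

    # Same double-byte code, different Unicode mapping.
    code = b"\x81\x60"
    print(hex(ord(code.decode("shift_jis"))), hex(ord(code.decode("cp932"))))
```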

  Typical users of CP50220 are web browsers. When web browsers load
  a page which is declared or auto-detected as "ISO-2022-JP", they
  don't interpret it as the true ISO-2022-JP registered in IANA Character
  Sets but as CP50220. When they post form data as "ISO-2022-JP",
  the data is also encoded as CP50220. Note that although csISO2022JP
  is an alias of ISO-2022-JP in IANA Character Sets, on Windows it means
  neither the registered ISO-2022-JP nor CP50220, but CP50221.

  The name "CP50220" is in use in the following applications:
    * Citrus iconv (NetBSD and DragonFly use this)
    * Mojikan http://www.mirai-ii.co.jp/moji/mojikan/
    * nkf 2.0.5
    * Encode-EUCJPMS-0.06

  Moreover, applications which use MLang.DLL or the .NET Framework for
  converting "ISO-2022-JP" implicitly use this charset.

  So this charset is widely used, but doesn't have its own name.

  The name is not "Windows-50220" because some of the applications which
  accept the name "CP50220" don't support the name "Windows-50220".

  CP50220 is intended for communicating with legacy systems.
  UTF-8 is preferred to CP50220 for new systems.

  Related references are:

    "Remarks" of "GetEncodings Method" of "System.Text"
    http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx

    "UnicodeによるJIS X0213実装入門―情報システムの新たな日本語処理環境"
    日経BPソフトプレス, ISBN 978-4891006082, 2008, p. 17-18, 20, 120-158

    CP50220 - Legacy Encoding Project
    http://legacy-encoding.sourceforge.jp/wiki/index.php?cp50220

  This charset is also known as Windows Codepage 50220.

Person & email address to contact for further information:

  NARUSE, Yui
  Email: naruse@ruby-lang.org

Intended usage: LIMITED USE

-- 
NARUSE, Yui
naruse@airemix.jp