[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Registration of new charset CP51932



Hello,

This is Registration of new charset CP51932.

Following proposal should get a consensus in ietf-charsets,
so I submit this to IANA.

Regards,

---
Charset name: CP51932

Charset aliases: csCP51932

Suitability for use in MIME text:

    Yes, CP51932 is suitable for use with subtypes of the "text"
    Content-Type. Since CP1932 is an 8bit charset care should be
   taken to choose an appropriate Content-Transfer-Encoding.

Published specification(s):

    Octets with the high bit clear specify single US-ASCII characters, while
    octets with the high bit set encode characters from the Windows Codepage
    932 by combining the bits from the two octets except when the first octet
    is 0x8E, in which case this represents Halfwidth Katakana.

    Meaning and mapping to Unicode of each character is refer to
    Windows Codepage 932.
    http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx

ISO 10646 equivalency table:

    http://cpansearch.perl.org/src/NARUSE/Encode-EUCJPMS-0.07/ucm/cp51932.ucm

Additional information:

    CP51932 is a variant of EUC-JP (like Windows-31J and Shift_JIS).
    this charset is different from EUC-JP in:
      * CP51932 doesn't support JIS X 0212
      * CP51932 supports characters extended by Windows Codepage 932
      * Unicode mapping of some characters are different

    Typical user of CP51932 is web browsers. When web browsers load
    a page which are declared or auto-detected as "EUC-JP", they don't
    interpret it as true EUC-JP registerd in IANA Character Sets but as
    CP51932. When they post form data as "EUC-JP", the data is also
    encoded as CP51932.

    The name "CP51932" is in use following applications:
      * Citrus iconv (NetBSD and DragonFly uses this)
      * patched GNU libiconv in FreeBSD ports
      * Mojikan http://www.mirai-ii.co.jp/moji/mojikan/
      * nkf 2.0.5
      * PHP 5.2.1
      * Ruby 1.9.1
      * Encode-EUCJPMS-0.06

    Moreover applications which uses MLang.DLL or .NET Framework for
    converting "EUC-JP" implicitly uses this charset.

    Intended use of this name is to override the implementation of EUC-JP
    or charset convertion.
    http://wiki.whatwg.org/wiki/Web_Encodings
    http://www.w3.org/Bugs/Public/show_bug.cgi?id=7444

    Why the name is not "Windows-51932" is some of applications which accept
    the name "CP51932" don't support the name "Windows-51932".

    CP51932 is for use of importing legacy data.
    UTF-8 is preferred to CP51932 for new system.

    Related references are:

      "Remarks" of "GetEncodings Method" of "System.Text"
      http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx

      "Introduction to JIS X0213 Implementation based on Unicode -
       A new Japanese Language Processing Environment for Information Systems",
       Nikkei BP Soft Press, ISBN 978-4891006082, 2008, pp. 17-18, 20, 120-158 (in Japanese)

      CP51932 - Legacy Encoding Project
      http://legacy-encoding.sourceforge.jp/wiki/index.php?cp51932

    This charset is also known as Windows Codepage 51932.

Person & email address to contact for further information:

    NARUSE, Yui
    Email: naruse@airemix.jp

Intended usage: LIMITED USE

-- 
NARUSE, Yui  <naruse@airemix.jp>