Re: Registration of new charset CP51932
Hi,
(2010/04/06 1:49), Ira McDonald wrote:
> Still missing in this registration is one charset alias of the form
> "csXxxx", where "Xxxx" is usually the primary name of the charset,
> e.g., "csCP51932" in this case.
I see, I added it.
> Section 2.3 on page 4 of IANA Charset Registration Procedures
> (RFC 2978 / BCP 19) says:
>
> "All charsets MUST be assigned a name that provides a display string
> for the associated "MIBenum" value defined below. These "MIBenum"
> values are defined by and used in the Printer MIB [RFC-1759]. Such
> names MUST begin with the letters "cs" and MUST contain no more than
> 40 characters (including the "cs" prefix) chosen from from the
> printable subset of US-ASCII. Only one name beginning with "cs" may
> be assigned to a single charset. If no name of this form is
> explicitly defined IANA will assign an alias consisting of "cs"
> prepended to the primary charset name."
>
> In Printer MIB v2 (RFC 3805), these "csXxxx" aliases were moved out
> of the Printer MIB and into the IANA Charset MIB (RFC 3808).
The revised proposal follows.
Thanks,
----------
Charset name: CP51932
Charset aliases: csCP51932
Suitability for use in MIME text:
Yes, CP51932 is suitable for use with subtypes of the "text"
Content-Type. Since CP51932 is an 8-bit charset, care should be
taken to choose an appropriate Content-Transfer-Encoding.
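As a rough illustration of choosing a Content-Transfer-Encoding, the following is a minimal sketch using Python's standard `email` package. Python ships no CP51932 codec, so the body is passed as already-encoded bytes. The helpers `choose_cte` and `make_cp51932_message` are hypothetical, and picking base64 for non-ASCII data is an assumption made for illustration; 8bit or quoted-printable may be equally appropriate depending on the transport.

```python
from email.message import EmailMessage


def choose_cte(data: bytes) -> str:
    """Pick a Content-Transfer-Encoding for raw CP51932 bytes (sketch)."""
    # Pure US-ASCII content has no high-bit octets, so 7bit is safe.
    if all(b < 0x80 for b in data):
        return "7bit"
    # Otherwise the text contains 8-bit octets; base64 is one safe choice
    # (an assumption here; 8bit or quoted-printable may also suit).
    return "base64"


def make_cp51932_message(data: bytes) -> EmailMessage:
    """Wrap already-encoded CP51932 bytes in a text/plain MIME part."""
    msg = EmailMessage()
    # Passing bytes (not str) means no CP51932 codec is needed: the email
    # package applies the chosen CTE and labels the part charset=CP51932.
    msg.set_content(data, "text", "plain", cte=choose_cte(data),
                    params={"charset": "CP51932"})
    return msg


if __name__ == "__main__":
    body = b"\xc6\xfc\xcb\xdc"  # "日本" encoded as EUC-JP / CP51932
    print(make_cp51932_message(body).as_string())
```

Because the body bytes are never decoded here, the same approach works for any charset the local Python installation does not know about.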
Published specification(s):
Octets with the high bit clear specify single US-ASCII characters, while
octets with the high bit set are used in pairs to encode characters from
Windows Codepage 932 by combining the bits of the two octets. The
exception is a first octet of 0x8E, which introduces a single Halfwidth
Katakana character.
For the meaning and Unicode mapping of each character, refer to
Windows Codepage 932:
http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
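To illustrate the byte structure described above, the following Python sketch re-encodes CP51932 octets as CP932 (Windows-31J) octets and then decodes them with Python's built-in `cp932` codec. It assumes that every double-byte CP51932 code maps onto CP932 through the classic EUC-JP to Shift_JIS row/cell arithmetic. The function names are hypothetical, and the ISO 10646 equivalency table in this registration remains the authoritative mapping.

```python
def cp51932_to_cp932(data: bytes) -> bytes:
    """Re-encode CP51932 bytes as CP932 bytes (sketch, not authoritative).

    Assumes double-byte codes follow the EUC-JP -> Shift_JIS row/cell
    arithmetic; unassigned codes are left for the cp932 codec to reject.
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # High bit clear: a single US-ASCII character.
            out.append(b)
            i += 1
        elif b == 0x8E:
            # 0x8E prefix: the next octet (0xA1-0xDF) is a Halfwidth
            # Katakana, which CP932 stores as that same single byte.
            if i + 1 >= n or not 0xA1 <= data[i + 1] <= 0xDF:
                raise ValueError(f"bad halfwidth katakana at offset {i}")
            out.append(data[i + 1])
            i += 2
        elif 0xA1 <= b <= 0xFE:
            # High bit set: two octets carry a row (1-94) and cell (1-94).
            if i + 1 >= n or not 0xA1 <= data[i + 1] <= 0xFE:
                raise ValueError(f"bad double-byte sequence at offset {i}")
            row = b - 0xA0
            cell = data[i + 1] - 0xA0
            lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
            if row % 2:  # odd row: trail 0x40-0x7E, then 0x80-0x9E
                trail = cell + (0x3F if cell <= 63 else 0x40)
            else:        # even row: trail 0x9F-0xFC
                trail = cell + 0x9E
            out += bytes((lead, trail))
            i += 2
        else:
            # Includes 0x8F (JIS X 0212 prefix), unsupported by CP51932.
            raise ValueError(f"unsupported octet 0x{b:02X} at offset {i}")
    return bytes(out)


def decode_cp51932(data: bytes) -> str:
    """Decode CP51932 via CP932, inheriting CP932's Unicode mappings."""
    return cp51932_to_cp932(data).decode("cp932")


if __name__ == "__main__":
    print(decode_cp51932(b"ABC \xc6\xfc\xcb\xdc \x8e\xb1"))
```

Malformed input raises `ValueError`, and codes that CP932 leaves unassigned raise `UnicodeDecodeError`; a production converter would more likely be table-driven.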
ISO 10646 equivalency table:
http://cpansearch.perl.org/src/NARUSE/Encode-EUCJPMS-0.07/ucm/cp51932.ucm
Additional information:
This is a request for a new registration of this charset.
CP51932 is a variant of EUC-JP, in the same way that Windows-31J is a
variant of Shift_JIS.
This charset differs from EUC-JP in that:
* CP51932 does not support JIS X 0212
* CP51932 supports the characters added by Windows Codepage 932
* the Unicode mappings of some characters are different
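The mapping difference in the last item can be seen with Python's built-in codecs: `euc_jp` follows the JIS X 0208 mapping, while `cp932` follows Microsoft's mapping, which CP51932 uses. The byte pairs below (the same JIS X 0208 row/cell written in EUC form and in Shift_JIS form) were derived by hand for this sketch, and `compare_mappings` is a hypothetical helper.

```python
from __future__ import annotations

# The same JIS X 0208 row/cell written as EUC-JP bytes and as the
# corresponding CP932 (Shift_JIS-form) bytes; hand-derived pairs.
PAIRS = {
    "row 1, cell 33": (b"\xa1\xc1", b"\x81\x60"),
    "row 1, cell 61": (b"\xa1\xdd", b"\x81\x7c"),
}


def compare_mappings(pairs: dict[str, tuple[bytes, bytes]]) -> dict[str, tuple[str, str]]:
    """Decode each pair with Python's euc_jp and cp932 codecs."""
    return {
        name: (euc.decode("euc_jp"), sjis.decode("cp932"))
        for name, (euc, sjis) in pairs.items()
    }


if __name__ == "__main__":
    for name, (eucjp_char, cp932_char) in compare_mappings(PAIRS).items():
        print(f"{name}: EUC-JP -> U+{ord(eucjp_char):04X}, "
              f"CP932 -> U+{ord(cp932_char):04X}")
```

Both rows decode to different Unicode code points depending on the mapping, which is why data labelled "EUC-JP" but produced as CP51932 may not round-trip through a strict EUC-JP converter.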
Typical users of CP51932 are web browsers. When a web browser loads
a page that is declared or auto-detected as "EUC-JP", it does not
interpret the page as the true EUC-JP registered in the IANA Character
Sets registry, but as CP51932. When it posts form data as "EUC-JP", the
data is also encoded as CP51932.
The name "CP51932" is used by the following applications:
* Citrus iconv (used by NetBSD and DragonFly)
* patched GNU libiconv in FreeBSD ports
* Mojikan http://www.mirai-ii.co.jp/moji/mojikan/
* nkf 2.0.5
* PHP 5.2.1
* Ruby 1.9.1
* Encode-EUCJPMS-0.06
Moreover, applications that use MLang.DLL or the .NET Framework to
convert "EUC-JP" implicitly use this charset.
So this charset is widely used, but does not have its own name.
The intended use of this name is to override the EUC-JP implementation
used in charset conversion.
http://wiki.whatwg.org/wiki/Web_Encodings
http://www.w3.org/Bugs/Public/show_bug.cgi?id=7444
The name is not "Windows-51932" because some of the applications that
accept the name "CP51932" do not support the name "Windows-51932".
CP51932 is intended for importing legacy data.
UTF-8 is preferred over CP51932 for new systems.
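Related references are:
"Remarks" section of "Encoding.GetEncodings Method" in "System.Text"
http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx
"UnicodeによるJIS X0213実装入門―情報システムの新たな日本語処理環境"
(An Introduction to Implementing JIS X0213 with Unicode: A New Japanese
Text-Processing Environment for Information Systems)
日経BPソフトプレス (Nikkei BP Soft Press), ISBN 978-4891006082, 2008,
pp. 17-18, 20, 120-158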
CP51932 - Legacy Encoding Project
http://legacy-encoding.sourceforge.jp/wiki/index.php?cp51932
This charset is also known as Windows Codepage 51932.
Person & email address to contact for further information:
NARUSE, Yui
Email: naruse@airemix.jp
Intended usage: LIMITED USE
--
NARUSE, Yui <naruse@airemix.jp>