
Re: Registration of new charset CP51932



It has been one month since the last mail in this thread.
I think that is enough time for this topic.

Or does anyone have any suggestions?

(2010/04/07 5:52), NARUSE Yui wrote:
> Hi,
>
> (2010/04/06 1:49), Ira McDonald wrote:
>> Still missing in this registration is one charset alias of the form
>> "csXxxx", where "Xxxx" is usually the primary name of the charset,
>> e.g., "csCP51932" in this case.
>
> I see, I added it.
>
>> Section 2.3 on page 4 of IANA Charset Registration Procedures
>> (RFC 2978 / BCP 19) says:
>>
>> "All charsets MUST be assigned a name that provides a display string
>> for the associated "MIBenum" value defined below. These "MIBenum"
>> values are defined by and used in the Printer MIB [RFC-1759]. Such
>> names MUST begin with the letters "cs" and MUST contain no more than
>> 40 characters (including the "cs" prefix) chosen from from the
>> printable subset of US-ASCII. Only one name beginning with "cs" may
>> be assigned to a single charset. If no name of this form is
>> explicitly defined IANA will assign an alias consisting of "cs"
>> prepended to the primary charset name."
>>
>> In Printer MIB v2 (RFC 3805), these "csXxxx" aliases were moved out
>> of the Printer MIB and into the IANA Charset MIB (RFC 3808).
>
> The proposal now reads as follows.
>
> Thanks,
>
> ----------
> Charset name: CP51932
>
> Charset aliases: csCP51932
>
> Suitability for use in MIME text:
>
> Yes, CP51932 is suitable for use with subtypes of the "text"
> Content-Type. Since CP51932 is an 8-bit charset, care should be
> taken to choose an appropriate Content-Transfer-Encoding.
>
> Published specification(s):
>
> Octets with the high bit clear specify single US-ASCII characters, while
> pairs of octets with the high bit set encode characters from Windows
> Codepage 932 by combining the bits of the two octets, except when the
> first octet is 0x8E, which introduces a Halfwidth Katakana character.
>
> For the meaning and Unicode mapping of each character, refer to
> Windows Codepage 932.
> http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
>
> ISO 10646 equivalency table:
>
> http://cpansearch.perl.org/src/NARUSE/Encode-EUCJPMS-0.07/ucm/cp51932.ucm
>
> Additional information:
>
> This is a request for a new registration of this charset.
>
> CP51932 is a variant of EUC-JP (just as Windows-31J is a variant of
> Shift_JIS). This charset differs from EUC-JP in the following ways:
> * CP51932 doesn't support JIS X 0212
> * CP51932 supports characters extended by Windows Codepage 932
> * The Unicode mappings of some characters are different
>
> Typical users of CP51932 are web browsers. When web browsers load
> a page which is declared or auto-detected as "EUC-JP", they don't
> interpret it as the true EUC-JP registered in IANA Character Sets but
> as CP51932. When they post form data as "EUC-JP", the data is also
> encoded as CP51932.
>
> The name "CP51932" is in use in the following applications:
> * Citrus iconv (NetBSD and DragonFly use this)
> * patched GNU libiconv in FreeBSD ports
> * Mojikan http://www.mirai-ii.co.jp/moji/mojikan/
> * nkf 2.0.5
> * PHP 5.2.1
> * Ruby 1.9.1
> * Encode-EUCJPMS-0.06
>
> Moreover, applications which use MLang.DLL or the .NET Framework for
> converting "EUC-JP" implicitly use this charset.
>
> So this charset is widely used, but doesn't have its own name.
> The intended use of this name is to override the implementation of
> EUC-JP or charset conversion.
> http://wiki.whatwg.org/wiki/Web_Encodings
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=7444
>
> The name is not "Windows-51932" because some of the applications which
> accept the name "CP51932" don't support the name "Windows-51932".
>
> CP51932 is intended for importing legacy data.
> UTF-8 is preferred to CP51932 for new systems.
>
> Related references are:
>
> "Remarks" of "GetEncodings Method" of "System.Text"
> http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx
>
>
> "UnicodeによるJIS X0213実装入門―情報システムの新たな日本語処理環境"
> 日経BPソフトプレス, ISBN 978-4891006082, 2008, p. 17-18, 20, 120-158
>
> CP51932 - Legacy Encoding Project
> http://legacy-encoding.sourceforge.jp/wiki/index.php?cp51932
>
> This charset is also known as Windows Codepage 51932.
>
> Person & email address to contact for further information:
>
> NARUSE, Yui
> Email: naruse@airemix.jp
>
> Intended usage: LIMITED USE
>
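As a rough illustration of the octet structure described under
"Published specification(s)" above, here is a minimal Python sketch.
The function name split_cp51932 is hypothetical, and the sketch assumes
that 0x8E is followed by exactly one octet, as in EUC-JP. It only
splits the octet stream into units; it does not check octet ranges or
map anything to Unicode, which needs the table in cp51932.ucm.

```python
def split_cp51932(data: bytes):
    """Split an octet string into (kind, payload) units by lead octet.

    Illustrative only: no range validation and no Unicode mapping.
    """
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # High bit clear: a single US-ASCII character.
            units.append(("ascii", bytes([lead])))
            i += 1
            continue
        # High bit set: a second octet with the high bit set must follow.
        if i + 1 >= len(data) or data[i + 1] < 0x80:
            raise ValueError(f"truncated or invalid sequence at offset {i}")
        trail = data[i + 1]
        if lead == 0x8E:
            # 0x8E introduces a Halfwidth Katakana character
            # (assumption: exactly one trailing octet, as in EUC-JP).
            units.append(("halfwidth_katakana", bytes([trail])))
        else:
            # Combine the two octets: clear the high bit of each
            # to obtain a pair of 7-bit values.
            units.append(("double", bytes([lead & 0x7F, trail & 0x7F])))
        i += 2
    return units
```

For example, split_cp51932(b"A\xa4\xa2\x8e\xb1") yields one ASCII unit,
one double-octet unit and one Halfwidth Katakana unit.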

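The "cs" alias rule quoted from RFC 2978 section 2.3 can also be
sketched in a few lines of Python. The names check_cs_alias and
default_cs_alias are hypothetical, and treating "printable" as the
octets 0x21 to 0x7E (no space) is my assumption, not something the RFC
text quoted above spells out.

```python
def check_cs_alias(alias: str) -> None:
    """Raise ValueError if alias breaks the quoted "cs" naming rule."""
    if not alias.startswith("cs"):
        raise ValueError('alias must begin with the letters "cs"')
    if len(alias) > 40:
        # The 40-character limit includes the "cs" prefix.
        raise ValueError("alias must contain no more than 40 characters")
    # Assumption: "printable subset of US-ASCII" read as 0x21..0x7E.
    if any(not (0x21 <= ord(ch) <= 0x7E) for ch in alias):
        raise ValueError("alias must use printable US-ASCII only")


def default_cs_alias(primary_name: str) -> str:
    """Build the default alias: "cs" prepended to the primary name."""
    alias = "cs" + primary_name
    check_cs_alias(alias)
    return alias
```

With this, default_cs_alias("CP51932") gives the alias used in the
proposal above.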

-- 
NARUSE, Yui  <naruse@airemix.jp>