Re: Registration of new charset CP51932



Hi,

I think this topic has been discussed enough,
so I would like to move on to the next stage.

Thanks,

(2010/05/16 20:10), NARUSE Yui wrote:
> It has been one month since the last mail in this thread.
> I think that is enough time for this topic.
>
> Or does anyone have any suggestions?
>
> (2010/04/07 5:52), NARUSE Yui wrote:
>> Hi,
>>
>> (2010/04/06 1:49), Ira McDonald wrote:
>>> Still missing in this registration is one charset alias of the form
>>> "csXxxx", where "Xxxx" is usually the primary name of the charset,
>>> e.g., "csCP51932" in this case.
>>
>> I see. I have added it.
>>
>>> Section 2.3 on page 4 of IANA Charset Registration Procedures
>>> (RFC 2978 / BCP 19) says:
>>>
>>> "All charsets MUST be assigned a name that provides a display string
>>> for the associated "MIBenum" value defined below. These "MIBenum"
>>> values are defined by and used in the Printer MIB [RFC-1759]. Such
>>> names MUST begin with the letters "cs" and MUST contain no more than
>>> 40 characters (including the "cs" prefix) chosen from the
>>> printable subset of US-ASCII. Only one name beginning with "cs" may
>>> be assigned to a single charset. If no name of this form is
>>> explicitly defined IANA will assign an alias consisting of "cs"
>>> prepended to the primary charset name."
>>>
>>> In Printer MIB v2 (RFC 3805), these "csXxxx" aliases were moved out
>>> of the Printer MIB and into the IANA Charset MIB (RFC 3808).
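
The alias rule quoted above can be sketched in a few lines of Python. This is
only an illustration of the quoted RFC 2978 text, not an IANA tool. The
function names are hypothetical, and treating "printable US-ASCII" as the
octets 0x21 to 0x7E (no space) is my assumption.

```python
def default_cs_alias(primary_name: str) -> str:
    """Alias that would be assigned if none is given: "cs" + primary name."""
    return "cs" + primary_name


def is_valid_cs_alias(alias: str) -> bool:
    """Check the three constraints from the quoted rule (hypothetical helper)."""
    return (
        alias.startswith("cs")          # MUST begin with the letters "cs"
        and len(alias) <= 40            # at most 40 characters, prefix included
        # Assumption: "printable US-ASCII" means 0x21..0x7E, excluding space.
        and all(0x21 <= ord(ch) <= 0x7E for ch in alias)
    )


print(default_cs_alias("CP51932"))
print(is_valid_cs_alias("csCP51932"))
```

For this registration the default alias and the explicitly listed alias are
the same string, "csCP51932".
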
>>
>> The proposal now reads as follows.
>>
>> Thanks,
>>
>> ----------
>> Charset name: CP51932
>>
>> Charset aliases: csCP51932
>>
>> Suitability for use in MIME text:
>>
>> Yes, CP51932 is suitable for use with subtypes of the "text"
>> Content-Type. Since CP51932 is an 8-bit charset, care should be
>> taken to choose an appropriate Content-Transfer-Encoding.
>>
>> Published specification(s):
>>
>> Octets with the high bit clear specify single US-ASCII characters, while
>> octets with the high bit set encode characters from Windows Codepage
>> 932 by combining the bits from two octets, except when the first octet
>> is 0x8E, which introduces a Halfwidth Katakana character.
>>
>> For the meaning and Unicode mapping of each character, refer to
>> Windows Codepage 932.
>> http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
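
The octet structure described above can be sketched as a small Python
decoder. This is a rough illustration, not a reference implementation. The
function names are hypothetical. It assumes that "combining the bits" can be
done with the usual JIS-to-Shift_JIS repacking followed by Python's built-in
"cp932" codec, that valid two-octet sequences use 0xA1 to 0xFE for both
octets, and that 0x8E is followed by one octet in 0xA1 to 0xDF. The
authoritative mapping is the equivalency table in the registration.

```python
def euc_pair_to_cp932(b1: int, b2: int) -> bytes:
    """Repack a two-octet EUC sequence into the Codepage 932 byte pair."""
    j1, j2 = b1 - 0x80, b2 - 0x80          # strip the high bits (0x21..0x7E)
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 < 0x5F else 0xB0)
    if j1 & 1:                              # odd row: low half of the lead byte
        s2 = j2 + 0x1F + (1 if j2 >= 0x60 else 0)   # skip over 0x7F
    else:                                   # even row: high half
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def decode_cp51932(data: bytes) -> str:
    """Decode bytes laid out as described above (sketch under assumptions)."""
    out = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 < 0x80:
            # High bit clear: a single US-ASCII character.
            out.append(chr(b1))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        b2 = data[i + 1]
        if b1 == 0x8E:
            # Halfwidth Katakana: 0xA1..0xDF maps linearly to U+FF61..U+FF9F.
            if not 0xA1 <= b2 <= 0xDF:
                raise ValueError(f"bad Halfwidth Katakana octet at offset {i + 1}")
            out.append(chr(0xFF61 + (b2 - 0xA1)))
        elif 0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE:
            out.append(euc_pair_to_cp932(b1, b2).decode("cp932"))
        else:
            # Includes 0x8F, since JIS X 0212 is not supported.
            raise ValueError(f"invalid sequence at offset {i}")
        i += 2
    return "".join(out)


print(decode_cp51932(b"A\xa4\xa2\x8e\xb1"))
```

The sample input is one US-ASCII octet, one two-octet Hiragana character,
and one 0x8E-prefixed Halfwidth Katakana character.
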
>>
>> ISO 10646 equivalency table:
>>
>> http://cpansearch.perl.org/src/NARUSE/Encode-EUCJPMS-0.07/ucm/cp51932.ucm
>>
>> Additional information:
>>
>> This is a request for a new registration of this charset.
>>
>> CP51932 is a variant of EUC-JP (just as Windows-31J is a variant of
>> Shift_JIS). This charset differs from EUC-JP as follows:
>> * CP51932 doesn't support JIS X 0212
>> * CP51932 supports characters extended by Windows Codepage 932
>> * The Unicode mappings of some characters are different
>>
>> Typical users of CP51932 are web browsers. When a web browser loads
>> a page that is declared or auto-detected as "EUC-JP", it does not
>> interpret it as the true EUC-JP registered in IANA Character Sets but as
>> CP51932. When it posts form data as "EUC-JP", the data is also
>> encoded as CP51932.
>>
>> The name "CP51932" is in use in the following applications:
>> * Citrus iconv (NetBSD and DragonFly use this)
>> * patched GNU libiconv in FreeBSD ports
>> * Mojikan http://www.mirai-ii.co.jp/moji/mojikan/
>> * nkf 2.0.5
>> * PHP 5.2.1
>> * Ruby 1.9.1
>> * Encode-EUCJPMS-0.06
>>
>> Moreover, applications which use MLang.DLL or the .NET Framework for
>> converting "EUC-JP" implicitly use this charset.
>>
>> So this charset is widely used but doesn't have its own name.
>> The intended use of this name is to override the implementation of
>> EUC-JP, or for charset conversion.
>> http://wiki.whatwg.org/wiki/Web_Encodings
>> http://www.w3.org/Bugs/Public/show_bug.cgi?id=7444
>>
>> The name is not "Windows-51932" because some of the applications which
>> accept the name "CP51932" don't support the name "Windows-51932".
>>
>> CP51932 is intended for importing legacy data.
>> UTF-8 is preferred to CP51932 for new systems.
>>
>> Related references are:
>>
>> "Remarks" of "GetEncodings Method" of "System.Text"
>> http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx
>>
>>
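>> "UnicodeによるJIS X0213実装入門―情報システムの新たな日本語処理環境"
>> [An Introduction to Implementing JIS X 0213 with Unicode: A New Japanese
>> Text Processing Environment for Information Systems]
>> 日経BPソフトプレス (Nikkei BP Soft Press), ISBN 978-4891006082, 2008,
>> p. 17-18, 20, 120-158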
>>
>> CP51932 - Legacy Encoding Project
>> http://legacy-encoding.sourceforge.jp/wiki/index.php?cp51932
>>
>> This charset is also known as Windows Codepage 51932.
>>
>> Person & email address to contact for further information:
>>
>> NARUSE, Yui
>> Email: naruse@airemix.jp
>>
>> Intended usage: LIMITED USE
>>
>
>


-- 
NARUSE, Yui  <naruse@airemix.jp>