Re: Registration of new charset CP51932



Hello Yui,

On 2010/08/27 15:11, NARUSE, Yui wrote:
> Hi,
>
> I think this topic has been discussed enough,
> so I want to go to the next stage.

Many thanks for the reminder. RFC 2978 
(http://tools.ietf.org/html/rfc2978) says:

 >>>>
3.2.  Charset Reviewer

    When the two week period has passed and the registration proposer is
    convinced that consensus has been achieved, the registration
    application should be submitted to IANA and the charset reviewer.
    The charset reviewer, who is appointed by the IETF Applications Area
    Director(s), either approves the request for registration or rejects
    it. ...
 >>>>

Because the section is entitled "Charset Reviewer", it's a bit difficult
to find and understand that the idea is that the *proposer* submits the
application to IANA. It's also not clear what "submitted to IANA" means
in practice.

So please send your proposal to iana@iana.org, copying Ned and me (and 
ietf-charsets@iana.org if you want). At the start of your mail, please 
say that this is a proposal for registration of a charset.
Please remove the leading ">>>" (or however quoted text shows up in your
mailer). And please make the changes below (mostly related to the fact
that when this gets registered, some of the current statements will no
longer be true).

Regards,    Martin.


>> (2010/04/07 5:52), NARUSE Yui wrote:

>>> ----------
>>> Charset name: CP51932
>>>
>>> Charset aliases: csCP51932
>>>
>>> Suitability for use in MIME text:
>>>
>>> Yes, CP51932 is suitable for use with subtypes of the "text"
>>> Content-Type. Since CP51932 is an 8bit charset Care should be
>>> taken to choose an appropriate Content-Transfer-Encoding.

Please change "charset Care" to "charset, care".

>>> Published specification(s):
>>>
>>> Octets with the high bit clear specify single US-ASCII characters, while
>>> octets with the high bit set encode characters from the Windows Codepage
>>> 932 by combining the bits from the two octets except the first octet is
>>> 0x8E which represents Halfwidth Katakana.

Please change "two octets except the first octet is 0x8E which 
represents Halfwidth Katakana" to "two octets except when the first 
octet is 0x8E, in which case this represents Halfwidth Katakana".
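
For illustration only, a minimal Python sketch of the octet structure described in the specification text above. It assumes, as the proposal says, that each two-octet code maps to the same character as its Windows Codepage 932 counterpart, and it borrows Python's built-in "cp932" codec for that lookup (Python has no "cp51932" codec). The helper names jis_to_sjis and decode_cp51932 are hypothetical and not part of any library.

```python
# Sketch only: assumes two-octet codes map like their Windows Codepage 932
# counterparts, and uses Python's "cp932" codec to look the characters up.


def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Convert a row/cell pair (each 0x21-0x7E) to the Shift_JIS byte pair."""
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:  # odd rows use trail bytes 0x40-0x7E and 0x80-0x9E
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:       # even rows use trail bytes 0x9F-0xFC
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def decode_cp51932(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # High bit clear: a single US-ASCII character.
            out.append(chr(b))
            i += 1
        elif b == 0x8E:
            # First octet 0x8E: the next octet (0xA1-0xDF) is Halfwidth Katakana.
            if i + 1 >= len(data) or not 0xA1 <= data[i + 1] <= 0xDF:
                raise ValueError(f"bad halfwidth katakana sequence at offset {i}")
            out.append(chr(0xFF61 + data[i + 1] - 0xA1))
            i += 2
        elif 0xA1 <= b <= 0xFE:
            # Two octets with the high bit set: clear the high bits to get the
            # row/cell code, then look the character up in CP932.
            if i + 1 >= len(data) or not 0xA1 <= data[i + 1] <= 0xFE:
                raise ValueError(f"truncated or invalid two-octet code at offset {i}")
            sjis = jis_to_sjis(b & 0x7F, data[i + 1] & 0x7F)
            out.append(sjis.decode("cp932"))  # raises if CP932 leaves it undefined
            i += 2
        else:
            # Includes 0x8F, which would introduce JIS X 0212 in EUC-JP.
            raise ValueError(f"unsupported octet 0x{b:02X} at offset {i}")
    return "".join(out)
```

With these definitions, decode_cp51932(b"A\xa4\xa2\x8e\xb1\xad\xa1") returns "Aあｱ①", where "①" comes from the NEC special characters (row 13) that Windows Codepage 932 adds, while a sequence starting with 0x8F is rejected. The octets 0xA1 0xC1 decode to U+FF5E FULLWIDTH TILDE here but to U+301C WAVE DASH with Python's "euc_jp" codec, one example of the differing Unicode mappings.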

>>> The meaning and Unicode mapping of each character follow
>>> Windows Codepage 932.
>>> http://msdn.microsoft.com/en-us/goglobal/cc305152.aspx
>>>
>>> ISO 10646 equivalency table:
>>>
>>> http://cpansearch.perl.org/src/NARUSE/Encode-EUCJPMS-0.07/ucm/cp51932.ucm
>>>
>>>
>>> Additional information:
>>>
>>> This is a request for a new registration of this charset.

This sentence can be removed.

>>> CP51932 is a variant of EUC-JP (just as Windows-31J is a variant of
>>> Shift_JIS). This charset differs from EUC-JP in that:
>>> * CP51932 doesn't support JIS X 0212
>>> * CP51932 supports the extended characters of Windows Codepage 932
>>> * The Unicode mappings of some characters are different
>>>
>>> Typical users of CP51932 are web browsers. When web browsers load
>>> a page which is declared or auto-detected as "EUC-JP", they don't
>>> interpret it as true EUC-JP as registered in the IANA Character Sets
>>> registry, but as CP51932. When they post form data as "EUC-JP", the
>>> data is also encoded as CP51932.
>>>
>>> The name "CP51932" is used by the following applications:
>>> * Citrus iconv (NetBSD and DragonFly use this)
>>> * patched GNU libiconv in FreeBSD ports
>>> * Mojikan http://www.mirai-ii.co.jp/moji/mojikan/
>>> * nkf 2.0.5
>>> * PHP 5.2.1
>>> * Ruby 1.9.1
>>> * Encode-EUCJPMS-0.06
>>>
>>> Moreover, applications which use MLang.DLL or the .NET Framework for
>>> converting "EUC-JP" implicitly use this charset.
>>>
>>> So this charset is widely used, but doesn't have its own name.

Please remove (or reword) this sentence.

>>> The intended use of this name is to override the implementation of
>>> EUC-JP or charset conversion.
>>> http://wiki.whatwg.org/wiki/Web_Encodings
>>> http://www.w3.org/Bugs/Public/show_bug.cgi?id=7444
>>>
>>> The name is not "Windows-51932" because some of the applications which
>>> accept the name "CP51932" don't support the name "Windows-51932".
>>>
>>> CP51932 is intended for importing legacy data.
>>> UTF-8 is preferred over CP51932 for new systems.
>>>
>>> Related references are:
>>>
>>> "Remarks" of "GetEncodings Method" of "System.Text"
>>> http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx
>>>
>>>
>>>
>>> "UnicodeによるJIS X0213実装入門―情報システムの新たな日本語処理環境"
>>> 日経BPソフトプレス, ISBN 978-4891006082, 2008, pp. 17-18, 20, 120-158

I'm not sure IANA can put this up in Japanese, but let's try. I propose
that you also provide a translation of this reference, at least as a fallback.

"Introduction to JIS X0213 Implementation based on Unicode - A new 
Japanese Language Processing Environment for Information Systems", 
Nikkei BP Soft Press, ISBN 978-4891006082, 2008, pp. 17-18, 20, 120-158 
(in Japanese)

[translation is my own, please feel free to improve]


>>> CP51932 - Legacy Encoding Project
>>> http://legacy-encoding.sourceforge.jp/wiki/index.php?cp51932
>>>
>>> This charset is also known as Windows Codepage 51932.
>>>
>>> Person & email address to contact for further information:
>>>
>>> NARUSE, Yui
>>> Email: naruse@airemix.jp
>>>
>>> Intended usage: LIMITED USE


-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp