
Re: Registering GBK and GB18030 in the IANA charset registry



Here is a first draft for the GB18030 registration.  Please edit it
to make it look good.  :-)  Also, we need a contact person.  Could
someone contact the Chinese national standard committee (Ministry of
Information?)  Thanks!

Anthony

-- 
Anthony Fok Tung-Ling
ThizLinux Laboratory   <anthony@thizlinux.com> http://www.thizlinux.com/
Debian Chinese Project <foka@debian.org>       http://www.debian.org/intl/zh/
Come visit Our Lady of Victory Camp!           http://www.olvc.ab.ca/

            Application of IANA Charset Registration for GB18030
	    ----------------------------------------------------

					    (First draft, November 9, 2001)

Purpose
=======

1. This is a proposal to register "GB18030" as a charset name with
   the Internet Assigned Numbers Authority (IANA).


Background
==========

2. To facilitate electronic communication in the People's Republic of China,
   and to provide a smooth migration path from the older GB 2312-1980
   standard and GBK (1993?) specification to Unicode / ISO 10646 /
   GB 13000.1, the Chinese government published the GB 18030-2000 standard
   on March 17, 2000:

     Chinese National Standard GB 18030-2000: Information Technology --
        Chinese Ideograms Coded Character Set for Information Interchange --
        Extension for the Basic Set

     (Xinxi Jishu -- Xinxi Jiaohuan Yong Hanzi Bianma Zifuji -- Jibenji de
      Kuochong)


3. A brief summary of the GB18030 codepoints is listed below:

	1-byte:  {00-7F}		Same as ASCII (ISO 646)
	2-byte:  {81-FE}{40-7E,80-FE} 	Same as GBK
	4-byte:  {81-FE}{30-39}{81-FE}{30-39} Maps linearly to ISO 10646
			starting from GB+81308130 = U+0080
			while skipping the mappings already defined
			in the 2-byte portions.
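
The byte ranges in the summary above can be sketched in a few lines of
Python. This is only an illustration, not part of the registration text:
the function names gb18030_seq_len and four_byte_linear_index are
hypothetical, the single-byte range is taken to be the full ASCII range
00-7F, and the linear index counts four-byte positions only (it does not
apply the "skip what the 2-byte portion already covers" step needed to
reach an actual ISO 10646 code point).

```python
def gb18030_seq_len(data: bytes, pos: int = 0) -> int:
    """Return 1, 2 or 4: the length of the GB18030 sequence at data[pos]."""
    b1 = data[pos]
    if b1 <= 0x7F:                       # 1-byte: same as ASCII
        return 1
    if 0x81 <= b1 <= 0xFE and pos + 1 < len(data):
        b2 = data[pos + 1]
        if 0x40 <= b2 <= 0xFE and b2 != 0x7F:   # 2-byte: {40-7E,80-FE}
            return 2
        if 0x30 <= b2 <= 0x39 and pos + 3 < len(data):
            b3, b4 = data[pos + 2], data[pos + 3]
            if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
                return 4                 # 4-byte: {81-FE}{30-39}{81-FE}{30-39}
    raise ValueError("not a valid GB18030 sequence at offset %d" % pos)


def four_byte_linear_index(seq: bytes) -> int:
    """Position of a four-byte sequence counted from GB+81308130 (index 0)."""
    b1, b2, b3, b4 = seq
    # Mixed radix: 126 lead values, 10 digits, 126 values, 10 digits.
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


if __name__ == "__main__":
    # Python's built-in "gb18030" codec agrees that U+0080 is GB+81308130.
    print("\u0080".encode("gb18030").hex())
    print(four_byte_linear_index(b"\x81\x30\x81\x30"))
```

Since U+0080 sits at index 0, a code point in the first linear run is
simply 0x80 plus the index; later runs need the gaps left by the 2-byte
mappings to be accounted for.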

4. This registration is requested because proper registration will
   facilitate the development of operating systems and Internet-related
   software that support GB18030.


Registration Requirements (as defined in RFC 2278)
==================================================

5. Required characteristics

  - The proposed charset conforms to the definition of a "charset"
    as defined in 2.4 of RFC 2278.

  - The proposed charset is suitable for use in MIME.

  - The proposed charset is specified in a stable, openly available
    specification.


6. New charsets

   - The proposed charset is NOT a new charset.  It is a Chinese national
     standard published on March 17, 2000 and further revised in
     November 2000.


7. Naming requirements

   - Proposed name for the charset is "GB18030".

   - The name conforms to the ABNF definition in section 3.3 of RFC 2278.
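
The naming check above can be done mechanically. The sketch below
assumes the RFC's production is one or more characters drawn from
ALPHA, DIGIT and the punctuation ! # $ % & ' + - ^ _ ` { } ~ (this
character list is written from memory and should be checked against
the RFC text before being relied on). The helper name
is_valid_charset_name is hypothetical.

```python
import string

# Assumed set of characters allowed in a MIME charset name.
MIME_CHARSET_CHARS = set(string.ascii_letters + string.digits + "!#$%&'+-^_`{}~")


def is_valid_charset_name(name: str) -> bool:
    """True if name is non-empty and uses only the assumed allowed characters."""
    return len(name) >= 1 and all(c in MIME_CHARSET_CHARS for c in name)


if __name__ == "__main__":
    print(is_valid_charset_name("GB18030"))    # the proposed name
    print(is_valid_charset_name("GB 18030"))   # a space is not allowed
```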


8. Functionality requirement

   - The proposed charset functions as an actual charset.


9. Usage and implementation requirements

   The GB 18030-2000 standard was created by the Chinese
   government with input from major vendors and institutions to
   ensure full ISO 10646 / Unicode 3.x compatibility while
   providing a smooth migration path from the GB2312 and GBK encodings.

   The proposed "GB18030" charset has been a mandatory standard in
   Mainland China since August 31, 2001.  All operating systems
   sold on or after that date must support GB18030.  Several major
   operating systems already come with full GB18030 support, and
   other vendors will finish the transition in the coming months.
   End-user applications will also fully support GB18030 in the coming years.
   Full GB18030 fonts have been available for some time now.

   GB18030 is also China's solution to support minority ethnic languages
   such as Mongolian, Tibetan and Uyghur, as well as all other languages
   of the world.  It also fulfills the needs of ancient Chinese literature
   research, libraries, geography, names, and more.


10. Publication requirements

    The proposed charset was published by the Chinese government in
    March 2000 and revised in November 2000.  Newer revisions will be
    published to follow updates to the Unicode and ISO 10646 standards.

    Dirk Meyer <dmeyer@adobe.com> has kindly translated the
    GB 18030-2000 standard into English and provided comments on it.
    While unofficial, it is indeed the authoritative document on GB18030.
    An on-line copy is available at:

	ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf

    Markus Scherer (IBM) has also written some technical documentation:

	http://oss.software.ibm.com/cvs/icu/~checkout~/charset/source/gb18030/gb18030.html

    The full XML definition of the GB18030 charset is available here:

	http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml


11. MIBenum requirements

	A MIBenum value for the proposed charset will be assigned by IANA
	at the time of registration.


Contact Person
==============

12. Any queries concerning this application may be addressed to the following:
 
	__________________________________________
	__________________________________________
	__________________________________________
	__________________________________________
	__________________________________________
	(Attn: _____________)

	Email: ______________________