[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Registration of new charset GB18030



Dear all,

I made some correction and revision after some helpful comments by Bruno
Haible and after reading some more goodies by Dirk Meyer (his presentation
at the 18th IUC).  So, here it is again.  All comments and suggestions are
welcome.  :-)

Cheers,

Anthony


            Application of IANA Charset Registration for GB18030
            ----------------------------------------------------

Charset name:

    GB18030

Charset aliases:

    Currently none.

Suitability for use in MIME text:

    Yes

Published specification(s):

    The official GB 18030-2000 standard was created by the
    Chinese IT Standardization Technical Committee (中华人民共和国
    全国信息技术标准化技术委员会), and was published in print by the
    China Standard Press (中国标准出版社), Beijing, March 17, 2000:

      Chinese National Standard GB 18030-2000: Information Technology --
      Chinese ideograms coded character set for information interchange --
      Extension for the basic set
      (信息技术 -- 信息交换用汉字编码字符集 -- 基本集的扩充  Xinxi Jishu --
      Xinxi Jiaohuan Yong Hanzi Bianma Zifuji -- Jibenji de Kuochong)

    The mapping data was re-released on November 30, 2000, mainly to
    correct the mapping of the "Euro" character and to exclude the
    surrogate area.

    Dirk Meyer <dmeyer@adobe.com> (Adobe Systems) has kindly provided
    an English summary, explanations, and remarks of the GB 18030-2000
    standard on-line (February 16, 2001):

      http://examples.oreilly.com/cjkvinfo/pdf/GB18030_Summary.pdf

    Markus Scherer <markus.scherer@us.ibm.com> (IBM) also published
    "GB 18030: A mega-codepage: Exploring the history and structure of
    the new Chinese Unicode standard" on-line (February 2001):

      http://oss.software.ibm.com/icu/docs/papers/gb18030.html

    Meyer's presentation at the 18th International Unicode Conference,
    "Two New Chinese Character Standards: HK SCS / GB 18030-2000"     
    (Hong Kong, April 2001), provides insightful historical background:

      http://examples.oreilly.com/cjkvinfo/pdf/IUC18B17.pdf


ISO 10646 equivalency table:

    Markus Scherer (IBM) et al. have prepared an authorative GB18030
    and ISO 10646 mapping table with the latest revisions
    in CharMapML (XML) format (ref. Unicode Technical Report #22).
    It is available on-line at:

      http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml


Additional information:

    The People's Republic of China has already expressed her
    fundamental consent to support the combined efforts of the ISO/IEC
    and the Unicode Consortium through publishing a Chinese National
    Standard that was code- and character- compatible with ISO 10646 /
    Unicode.  This standard was named GB 13000.

    Since the legacy GB 2312-1980 standard and GBK (1995) specification
    is still widely used, it is important to provide a smooth migration
    path towards GB 13000.  GBK was the first step in this direction. 
    The new GB 18030-2000 standard "replaces" GBK: it retains legacy
    encoding compatibility, and also provides for a complete and final
    mechanism to include future extensions of Unicode.

    In a nutshell, it is the Chinese version of UTF-8: whereas UTF-8
    maintains compatibility with ASCII, GB18030 maintains compatibility
    with GB2312/GBK and provides full ISO 10646 compatibility.  Part of
    the mapping data is from a lookup table (similar to GBK).  The rest is
    calculated algorithmically.

    A brief summary of the GB18030 codepoints is listed below:

        1-byte:  {00-7F}
                  Same as GB 11383-89 / US-ASCII / ISO 646 IRV (1991)

        2-byte:  {81-FE}{40-7E,80-FE}
                  A full superset of GBK, but with fallback mappings
                  removed so that only 1-to-1 roundtrip mappings remain

        4-byte:  {81-FE}{30-39}{81-FE}{30-39}
                  Maps linearly to ISO 10646 starting from
                  GB+81308130 = U+0080 up to U+FFFF, and from
                  GB+90308130 = U+10000 up to U+10FFFF, skipping the
                  mappings already defined in the 1-byte and 2-byte areas.
                  The surrogate area is excluded.

    The current GB18030 standard specifies the addition of CJK
    Extension A, and ethnic minority languages Mongolian, Tibetan,
    Uyghur (Arabic) and Yi.  Since GB18030 is fully ISO 10646
    compatible, it readily supports CJK Extension B and other
    languages.

    GB18030 is a "mandatory" standard: starting September 1, 2001, all
    operating systems sold in Mainland China must support this
    standard.  (Embedded systems and PDAs are currently exempt.)
    Eventually, end-user applications must also fully support the
    GB18030 standard--mere UTF-8 support is not enough.  Harsh it may
    seem, this regulation is a smart move: it ensures that rare Chinese
    characters found in personal names, geographic names and ancient
    literature, as well as minority languages, may finally be
    computerized and exchanged all across the country.


Person & email address to contact for further information:

    CHEN Zhuang  (陈壮)
      chenzh@cesi.ac.cn
      Chinese IT Standardization Technical Committee
      Chinese Electronics Standardization Institute

    Additionally, please Cc: ietf-charsets@iana.org to keep the
    community informed, as the implementation of the GB18030 standard
    on operating systems and applications is a community effort.


Intended usage:

    COMMON


Acknowledgement:

    Appreciations and kudos to the Internet community for documenting
    and explaining the GB 18030-2000 standard to the whole world;
    for implementing this new standard in software so quickly;
    and for their comments to this registration.

    Special thanks to Dirk Meyer <dmeyer@adobe.com> for his translation
    of the GB18030 standard, and to Markus Scherer <markus.scherer@us.ibm.com>
    for his GB18030/Unicode mapping table.  This registration also
    contains excerpts of their writings.


Compiled by Anthony Fok <anthony@thizlinux.com> (霍东灵), March 15, 2002.