
Registration of new charset GB18030




            Application of IANA Charset Registration for GB18030
            ----------------------------------------------------

Charset name:

    GB18030

Charset aliases:

    Currently none.

Suitability for use in MIME text:

    Yes

Published specification(s):

    The official GB 18030-2000 standard was published (in print) by the
    China Standard Press (中国标准出版社 Zhongguo Biaozhun Chubanshe),
    Beijing, March 17, 2000:

      Chinese National Standard GB 18030-2000: Information Technology --
      Chinese ideograms coded character set for information interchange --
      Extension for the basic set
      (信息技术 -- 信息交换用汉字编码字符集 -- 基本集的扩充 Xinxi Jishu --
      Xinxi Jiaohuan Yong Hanzi Bianma Zifuji -- Jibenji de Kuochong)

    The mapping tables therein were updated in late 2000 to correct
    the mapping of the "Euro" character and to exclude the surrogate
    area.

    Dirk Meyer <dmeyer@adobe.com> (Adobe Systems) has kindly provided
    an English summary of, and explanations and remarks on, the
    GB 18030-2000 standard on-line (February 16, 2001):

      ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf

    Markus Scherer <markus.scherer@us.ibm.com> (IBM) also published
    "GB 18030: A mega-codepage: Exploring the history and structure of
    the new Chinese Unicode standard" on-line (February 2001):

      http://oss.software.ibm.com/icu/docs/papers/gb18030.html


ISO 10646 equivalency table:

    Markus Scherer (IBM) et al. have prepared an authoritative GB18030
    to ISO 10646 mapping table, incorporating the latest revisions,
    in CharMapML (XML) format (ref. Unicode Technical Report #22).
    It is available on-line at:

      http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml


Additional information:

    To facilitate electronic communication in the People's Republic of
    China, and to provide a smooth migration path from the older
    GB 2312-1980 standard and GBK (1995) specification to ISO 10646 /
    Unicode / GB 13000.1, the Chinese government published the GB
    18030-2000 standard, which is code- and character-compatible with
    the full codespace of the ISO 10646 / Unicode standards, from U+0000
    to U+10FFFF.

    GB18030 support is mandatory for all operating systems sold in
    Mainland China on or after September 1, 2001.  (Embedded systems
    and PDAs are currently exempt.)  Eventually, end-user applications
    must also fully support the GB18030 standard--mere UTF-8 support is
    not enough.  Although this mandatory statute may seem too strict,
    it is a smart move to solve a pressing Chinese text communication
    issue once and for all, while providing backward compatibility to
    legacy GB2312/GBK systems.  Therefore, it is important for all
    developers to learn and implement this standard, especially if they
    intend to sell their software in Mainland China.

    The current GB18030 standard specifies the addition of CJK
    Extension A, and ethnic minority languages Mongolian, Tibetan,
    Uyghur (Arabic) and Yi.  Since GB18030 is fully ISO 10646
    compatible, support for CJK Extension B and all other languages
    will be easy.  More importantly, the GB18030 standard means that
    special Chinese characters in people's names, geographic names and
    ancient documents may finally be processed.
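
    As an illustration of the point above, the following minimal sketch
    uses the "gbk" and "gb18030" codecs shipped with Python (an
    assumption about the reader's environment, not part of the
    standard) to show that one sample character from each newly covered
    repertoire cannot be encoded in GBK but can be in GB18030.  The
    sample code points and the helper name encodable() are chosen here
    purely for illustration.

```python
# One arbitrary sample character per repertoire (illustrative choices only).
SAMPLES = {
    "CJK Extension A": "\u3400",
    "Mongolian": "\u1820",
    "Tibetan": "\u0f40",
    "Yi": "\ua000",
}


def encodable(text, codec):
    """Return True if `text` can be encoded with `codec` in strict mode."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# For each sample: (encodable in GBK?, number of bytes in GB18030)
report = {
    name: (encodable(ch, "gbk"), len(ch.encode("gb18030")))
    for name, ch in SAMPLES.items()
}

for name, (in_gbk, gb18030_len) in report.items():
    print(f"{name}: GBK={in_gbk}, GB18030 bytes={gb18030_len}")
```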

    In a nutshell, it is the Chinese version of UTF-8: whereas UTF-8
    maintains compatibility with ASCII, GB18030 maintains compatibility
    with GB2312/GBK and provides full ISO 10646 compatibility.  Part of
    the mapping is from a lookup table (similar to GBK).  The rest is
    all calculated algorithmically.
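
    The compatibility claim can be spot-checked with a short sketch.
    It assumes Python's built-in "ascii", "gb2312", "gbk" and "gb18030"
    codecs; the sample strings and the helper name same_bytes() are
    hypothetical and exist only for this example.  It is a sanity
    check on two samples, not a proof of full compatibility.

```python
def same_bytes(text, *codecs):
    """Return True if every listed codec encodes `text` to identical bytes."""
    encoded = {text.encode(codec) for codec in codecs}
    return len(encoded) == 1


ascii_sample = "GB18030"   # plain US-ASCII text
hanzi_sample = "中文"       # both characters are in GB 2312-1980

# ASCII text keeps its single-byte form in GB18030.
ascii_ok = same_bytes(ascii_sample, "ascii", "gb18030")

# GB2312/GBK two-byte codes are carried over unchanged.
hanzi_ok = same_bytes(hanzi_sample, "gb2312", "gbk", "gb18030")

print(ascii_ok, hanzi_ok)
```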

    A brief summary of the GB18030 codepoints is listed below:

        1-byte:  {00-7F}
                  Same as US-ASCII / ISO 646 IRV (1991)

        2-byte:  {81-FE}{40-7E,80-FE}
                  Same as GBK (but now only 1-to-1 mappings remain)

        4-byte:  {81-FE}{30-39}{81-FE}{30-39}
                  Maps linearly to ISO 10646 starting from
                  GB+81308130 = U+0080 while skipping the mappings
                  already defined in the 1-byte and 2-byte areas.  The
                  surrogate area is excluded.
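
    The byte ranges above can be turned into a small scanner.  The
    sketch below is not a full decoder: it only determines sequence
    lengths and the linear position of a 4-byte sequence (with
    GB+81308130 at position 0).  All function names are hypothetical.
    The supplementary-plane helper additionally assumes that
    GB+90308130 corresponds to U+10000, which comes from the published
    mapping tables rather than from the summary above.  Mapping 4-byte
    codes in the BMP would further require the table of skipped
    ranges, which is not reproduced here.

```python
def sequence_length(data, i=0):
    """Length (1, 2 or 4) of the GB18030 sequence starting at data[i]."""
    b1 = data[i]
    if b1 <= 0x7F:
        return 1                                   # {00-7F}
    if not 0x81 <= b1 <= 0xFE or i + 1 >= len(data):
        raise ValueError("invalid or truncated lead byte")
    b2 = data[i + 1]
    if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
        return 2                                   # {81-FE}{40-7E,80-FE}
    if 0x30 <= b2 <= 0x39:
        tail = data[i + 2:i + 4]
        if (len(tail) == 2 and 0x81 <= tail[0] <= 0xFE
                and 0x30 <= tail[1] <= 0x39):
            return 4                               # {81-FE}{30-39}{81-FE}{30-39}
    raise ValueError("invalid trail bytes")


def split_sequences(data):
    """Split a GB18030 byte string into its 1-, 2- and 4-byte sequences."""
    out, i = [], 0
    while i < len(data):
        n = sequence_length(data, i)
        out.append(data[i:i + n])
        i += n
    return out


def four_byte_index(seq):
    """Zero-based linear position of a 4-byte sequence; GB+81308130 is 0."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# Assumption (from the published mapping, not the summary above):
# GB+90308130 is the first supplementary-plane code, U+10000.
SUPPLEMENTARY_BASE = four_byte_index(b"\x90\x30\x81\x30")


def supplementary_code_point(seq):
    """Code point for a 4-byte sequence at or beyond GB+90308130."""
    return 0x10000 + four_byte_index(seq) - SUPPLEMENTARY_BASE


if __name__ == "__main__":
    sample = "A中\u0080".encode("gb18030")
    print(split_sequences(sample))
    print(hex(supplementary_code_point(chr(0x10FFFF).encode("gb18030"))))
```

    The index arithmetic follows directly from the ranges: the second
    and fourth bytes each take 10 values and the third takes 126, so
    consecutive 4-byte codes differ by exactly 1 in the linear index.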

    More information on the GB18030 standard and sample implementations
    may be found on the Internet.


Person & email address to contact for further information:

    CHEN Zhuang  (陈壮)
      chenzh@cesi.ac.cn
      Chinese IT Standardization Technical Committee
      Chinese Electronics Standardization Institute

    Additionally, please Cc: ietf-charsets@iana.org to keep the
    community informed, as the implementation of the GB18030 standard
    on operating systems and applications is a community effort.


Intended usage:

    COMMON


Acknowledgement:

    Appreciation and kudos to the Internet community for documenting
    and explaining the GB 18030-2000 standard to the whole world;
    for implementing this new standard in software so quickly;
    and for their comments on this registration.

    Special thanks to Dirk Meyer <dmeyer@adobe.com> for his translation
    of the GB18030 standard, and to Markus Scherer <markus.scherer@us.ibm.com>
    for his GB18030/Unicode mapping table.

                  -- Anthony Fok <anthony@thizlinux.com>, March 14, 2002


-- 
Anthony Fok Tung-Ling
ThizLinux Laboratory   <anthony@thizlinux.com> http://www.thizlinux.com/
Debian Chinese Project <foka@debian.org>       http://www.debian.org/intl/zh/
Come visit Our Lady of Victory Camp!           http://www.olvc.ab.ca/