Registration of new charset GB18030
- To: ietf-charsets@iana.org
- Subject: Registration of new charset GB18030
- From: Anthony Fok <anthony@thizlinux.com>
- Date: Fri, 15 Mar 2002 03:58:10 +0800
- Cc: =?gb2312?B?s8LXsw==?= <chenzh@cesi.ac.cn>, Cheng XU <xucheng@cn.ibm.com>, haible@ilog.fr, suzhe@gnuchina.org, shwang@sonata.iscas.ac.cn, =?gb2312?B?zuK9oQ==?= <jwu@sonata.iscas.ac.cn>, leon@xteamlinux.com.cn, ygh@dlut.edu.cn, roger.so@sw-linux.com, pablo@mandrakesoft.com, zw@debian.org, Dirk Meyer <dmeyer@adobe.com>, markus.scherer@jtcsv.com, Ken Lunde <lunde@adobe.com>, li18nux2000@li18nux.org, bsd-locale@haun.org, wuzg@cesi.ac.cn, Yoshihiko Enomoto <YENOMOTO@jp.ibm.com>, Jack Kang <Jack.Kang@sun.com>
- Sender: Anthony Fok <anthony@thizlinux.com>
- User-Agent: Mutt/1.3.27i
Dear all,
I made some corrections and revisions after helpful comments from Bruno
Haible and after reading more material by Dirk Meyer (his presentation
at the 18th IUC). So, here it is again. All comments and suggestions are
welcome. :-)
Cheers,
Anthony
Application of IANA Charset Registration for GB18030
----------------------------------------------------
Charset name:
GB18030
Charset aliases:
Currently none.
Suitability for use in MIME text:
Yes
Published specification(s):
The official GB 18030-2000 standard was created by the
Chinese IT Standardization Technical Committee (中华人民共和国
全国信息技术标准化技术委员会), and was published in print by the
China Standard Press (中国标准出版社), Beijing, March 17, 2000:
Chinese National Standard GB 18030-2000: Information Technology --
Chinese ideograms coded character set for information interchange --
Extension for the basic set
(信息技术 -- 信息交换用汉字编码字符集 -- 基本集的扩充 Xinxi Jishu --
Xinxi Jiaohuan Yong Hanzi Bianma Zifuji -- Jibenji de Kuochong)
The mapping data was re-released on November 30, 2000, mainly to
correct the mapping of the "Euro" character and to exclude the
surrogate area.
Dirk Meyer <dmeyer@adobe.com> (Adobe Systems) has kindly provided
an English summary of the GB 18030-2000 standard, with explanations
and remarks, on-line (February 16, 2001):
http://examples.oreilly.com/cjkvinfo/pdf/GB18030_Summary.pdf
Markus Scherer <markus.scherer@us.ibm.com> (IBM) also published
"GB 18030: A mega-codepage: Exploring the history and structure of
the new Chinese Unicode standard" on-line (February 2001):
http://oss.software.ibm.com/icu/docs/papers/gb18030.html
Meyer's presentation at the 18th International Unicode Conference,
"Two New Chinese Character Standards: HK SCS / GB 18030-2000"
(Hong Kong, April 2001), provides insightful historical background:
http://examples.oreilly.com/cjkvinfo/pdf/IUC18B17.pdf
ISO 10646 equivalency table:
Markus Scherer (IBM) et al. have prepared an authoritative mapping
table between GB18030 and ISO 10646, with the latest revisions,
in CharMapML (XML) format (ref. Unicode Technical Report #22).
It is available on-line at:
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml
Additional information:
The People's Republic of China has already expressed her
fundamental consent to support the combined efforts of the ISO/IEC
and the Unicode Consortium by publishing a Chinese National
Standard that is code- and character-compatible with ISO 10646 /
Unicode. This standard was named GB 13000.
Since the legacy GB 2312-1980 standard and the GBK (1995) specification
are still widely used, it is important to provide a smooth migration
path towards GB 13000. GBK was the first step in this direction.
The new GB 18030-2000 standard "replaces" GBK: it retains legacy
encoding compatibility, and also provides for a complete and final
mechanism to include future extensions of Unicode.
In a nutshell, it is the Chinese version of UTF-8: whereas UTF-8
maintains compatibility with ASCII, GB18030 maintains compatibility
with GB2312/GBK and provides full ISO 10646 compatibility. Part of
the mapping data is from a lookup table (similar to GBK). The rest is
calculated algorithmically.
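The compatibility claim can be checked with a short sketch. It assumes
Python 3, whose standard library ships codecs named "ascii", "gb2312",
"gbk" and "gb18030"; the helper name same_bytes is made up for this
illustration, and Python's tables may follow a later revision of the
standard than the 2000 edition described here.

```python
def same_bytes(text: str, legacy_codec: str) -> bool:
    """True if `text` encodes to identical bytes in the legacy codec and in GB18030."""
    return text.encode(legacy_codec) == text.encode("gb18030")


if __name__ == "__main__":
    # ASCII text is unchanged, just as it is under UTF-8.
    print(same_bytes("Hello", "ascii"))
    # Ordinary Chinese text keeps its GB2312 / GBK two-byte codes.
    print(same_bytes("中文", "gb2312"))
    print(same_bytes("朱镕基", "gbk"))
    # A character outside GBK (CJK Extension B, U+20000) still encodes,
    # using the four-byte form.
    print(len("\U00020000".encode("gb18030")))
```

The same comparison against a codec that is not a subset of GB18030
(for example "big5") would be expected to fail for most Chinese text.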
A brief summary of the GB18030 codepoints is listed below:
1-byte: {00-7F}
    Same as GB 11383-89 / US-ASCII / ISO 646 IRV (1991)

2-byte: {81-FE}{40-7E,80-FE}
    A full superset of GBK, but with fallback mappings
    removed so that only 1-to-1 roundtrip mappings remain

4-byte: {81-FE}{30-39}{81-FE}{30-39}
    Maps linearly to ISO 10646 starting from
    GB+81308130 = U+0080 up to U+FFFF, and from
    GB+90308130 = U+10000 up to U+10FFFF, skipping the
    mappings already defined in the 1-byte and 2-byte areas.
    The surrogate area is excluded.
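The byte ranges above can be sketched in code. The following is a
minimal illustration, not a complete codec: the function names are
invented here, and only the supplementary planes (U+10000 to U+10FFFF)
are handled arithmetically, because the four-byte range for U+0080 to
U+FFFF skips code points already covered by the two-byte table and
therefore needs the published mapping data.

```python
def sequence_length(data: bytes) -> int:
    """Length (1, 2 or 4) of the GB18030 sequence at the start of `data`."""
    if not data:
        raise ValueError("empty input")
    b1 = data[0]
    if b1 <= 0x7F:                          # 1-byte: {00-7F}
        return 1
    if not 0x81 <= b1 <= 0xFE or len(data) < 2:
        raise ValueError("invalid or truncated lead byte")
    b2 = data[1]
    if 0x40 <= b2 <= 0xFE and b2 != 0x7F:   # 2-byte: {81-FE}{40-7E,80-FE}
        return 2
    if 0x30 <= b2 <= 0x39:                  # 4-byte: {81-FE}{30-39}{81-FE}{30-39}
        if len(data) >= 4 and 0x81 <= data[2] <= 0xFE and 0x30 <= data[3] <= 0x39:
            return 4
    raise ValueError("invalid or truncated sequence")


def four_byte_linear(seq: bytes) -> int:
    """Position of a four-byte sequence, counting from GB+81308130 as 0."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# GB+90308130 is the anchor for U+10000.
SUPPLEMENTARY_BASE = four_byte_linear(b"\x90\x30\x81\x30")


def decode_supplementary(seq: bytes) -> int:
    """Code point for a four-byte sequence at or above GB+90308130."""
    cp = 0x10000 + four_byte_linear(seq) - SUPPLEMENTARY_BASE
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("sequence is outside the supplementary range")
    return cp


def encode_supplementary(cp: int) -> bytes:
    """Four-byte GB18030 sequence for a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    linear = cp - 0x10000 + SUPPLEMENTARY_BASE
    linear, b4 = divmod(linear, 10)    # last byte has 10 values
    linear, b3 = divmod(linear, 126)   # third byte has 126 values
    b1, b2 = divmod(linear, 10)        # second byte has 10 values
    return bytes([b1 + 0x81, b2 + 0x30, b3 + 0x81, b4 + 0x30])


if __name__ == "__main__":
    print(encode_supplementary(0x10000).hex().upper())   # 90308130
    print(encode_supplementary(0x10FFFF).hex().upper())  # E3329A35
    print(hex(decode_supplementary(b"\x95\x32\x82\x36")))  # 0x20000
```

The multipliers follow directly from the ranges: the fourth byte has 10
values, the third 126, and the second 10, so one step of the first byte
spans 12600 code points.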
The current GB18030 standard specifies the addition of CJK
Extension A, and ethnic minority languages Mongolian, Tibetan,
Uyghur (Arabic) and Yi. Since GB18030 is fully ISO 10646
compatible, it readily supports CJK Extension B and other
languages.
GB18030 is a "mandatory" standard: starting September 1, 2001, all
operating systems sold in Mainland China must support this
standard. (Embedded systems and PDAs are currently exempt.)
Eventually, end-user applications must also fully support the
GB18030 standard--mere UTF-8 support is not enough. Harsh as it may
seem, this regulation is a smart move: it ensures that rare Chinese
characters found in personal names, geographic names and ancient
literature, as well as minority languages, may finally be
computerized and exchanged all across the country.
Person & email address to contact for further information:
CHEN Zhuang (陈壮)
chenzh@cesi.ac.cn
Chinese IT Standardization Technical Committee
Chinese Electronics Standardization Institute
Additionally, please Cc: ietf-charsets@iana.org to keep the
community informed, as the implementation of the GB18030 standard
on operating systems and applications is a community effort.
Intended usage:
COMMON
Acknowledgement:
Appreciation and kudos to the Internet community for documenting
and explaining the GB 18030-2000 standard to the whole world;
for implementing this new standard in software so quickly;
and for their comments on this registration.
Special thanks to Dirk Meyer <dmeyer@adobe.com> for his translation
of the GB18030 standard, and to Markus Scherer <markus.scherer@us.ibm.com>
for his GB18030/Unicode mapping table. This registration also
contains excerpts of their writings.
Compiled by Anthony Fok <anthony@thizlinux.com> (霍东灵), March 15, 2002.