Registration of new charset GB18030
- To: ietf-charsets@iana.org
- Subject: Registration of new charset GB18030
- From: Anthony Fok <anthony@thizlinux.com>
- Date: Thu, 14 Mar 2002 19:11:42 +0800
- Cc: =?gb2312?B?s8LXsw==?= <chenzh@cesi.ac.cn>, Cheng XU <xucheng@cn.ibm.com>,haible@ilog.fr, suzhe@gnuchina.org, shwang@sonata.iscas.ac.cn,=?gb2312?B?zuK9oQ==?= <jwu@sonata.iscas.ac.cn>, leon@xteamlinux.com.cn,ygh@dlut.edu.cn, roger.so@sw-linux.com, pablo@mandrakesoft.com, zw@debian.org,yumingjian@china.com, chenxy@sun.ihep.ac.cn, Dirk Meyer <dmeyer@adobe.com>,markus.scherer@jtcsv.com, Ken Lunde <lunde@adobe.com>,li18nux2000@li18nux.org, bsd-locale@haun.org, wuzg@cesi.ac.cn,Yoshihiko Enomoto <YENOMOTO@jp.ibm.com>, Jack Kang <Jack.Kang@sun.com>
- Sender: Anthony Fok <anthony@thizlinux.com>
- User-Agent: Mutt/1.3.27i
Application of IANA Charset Registration for GB18030
----------------------------------------------------
Charset name:
GB18030
Charset aliases:
Currently none.
Suitability for use in MIME text:
Yes
Published specification(s):
The official GB 18030-2000 standard was published (in print) by the
China Standard Press (中国标准出版社 Zhongguo Biaozhun Chubanshe),
Beijing, March 17, 2000:
Chinese National Standard GB 18030-2000: Information Technology --
Chinese ideograms coded character set for information interchange --
Extension for the basic set
(信息技术 -- 信息交换用汉字编码字符集 -- 基本集的扩充 Xinxi Jishu --
Xinxi Jiaohuan Yong Hanzi Bianma Zifuji -- Jibenji de Kuochong)
The mapping tables therein were updated in late 2000 to correct
the mapping of the "Euro" character and to exclude the surrogate
area.
Dirk Meyer <dmeyer@adobe.com> (Adobe Systems) has kindly provided
an English summary of, and explanations and remarks on, the
GB 18030-2000 standard on-line (February 16, 2001):
ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf
Markus Scherer <markus.scherer@us.ibm.com> (IBM) also published
"GB 18030: A mega-codepage: Exploring the history and structure of
the new Chinese Unicode standard" on-line (February 2001):
http://oss.software.ibm.com/icu/docs/papers/gb18030.html
ISO 10646 equivalency table:
Markus Scherer (IBM) et al. have prepared an authoritative mapping
table between GB18030 and ISO 10646, with the latest revisions,
in CharMapML (XML) format (ref. Unicode Technical Report #22).
It is available on-line at:
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml
Additional information:
To facilitate electronic communication in the People's Republic of
China, and to provide a smooth migration path from the older
GB 2312-1980 standard and the GBK (1995) specification to ISO 10646 /
Unicode / GB 13000.1, the Chinese government published the GB
18030-2000 standard, which is code- and character-compatible with
the full codespace of the ISO 10646 / Unicode standards, from U+0000
to U+10FFFF.
GB18030 support is mandatory for all operating systems sold in
Mainland China on or after September 1, 2001. (Embedded systems
and PDAs are currently exempt.) Eventually, end-user applications
must also fully support the GB18030 standard; mere UTF-8 support is
not enough. Although this mandate may seem too strict,
it is a smart move to solve a pressing Chinese text communication
issue once and for all, while providing backward compatibility with
legacy GB2312/GBK systems. Therefore, it is important for all
developers to learn and implement this standard, especially if they
intend to sell their software in Mainland China.
The current GB18030 standard adds CJK Extension A and the scripts of
the ethnic minority languages Mongolian, Tibetan, Uyghur (Arabic)
and Yi. Since GB18030 is fully compatible with ISO 10646, support
for CJK Extension B and all other languages will be easy. More
importantly, the GB18030 standard means that special Chinese
characters in people's names, geographic names and ancient documents
may finally be processed.
In a nutshell, it is the Chinese version of UTF-8: whereas UTF-8
maintains compatibility with ASCII, GB18030 maintains compatibility
with GB2312/GBK and provides full ISO 10646 compatibility. Part of
the mapping is from a lookup table (similar to GBK). The rest is
all calculated algorithmically.
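The compatibility claim can be checked with a small Python sketch. It
assumes the "gb2312", "gbk" and "gb18030" codecs that ship with CPython;
the sample strings and variable names are illustrative only and are not
part of the standard.

```python
# Text limited to ASCII plus the GB 2312-1980 repertoire should produce
# identical bytes under all three codecs.
sample = "ASCII and 中文"

as_gb2312 = sample.encode("gb2312")
as_gbk = sample.encode("gbk")
as_gb18030 = sample.encode("gb18030")

legacy_compatible = as_gb2312 == as_gbk == as_gb18030

# A character outside GBK (here U+20000, a CJK Extension B ideograph)
# has no GBK encoding, but GB18030 represents it as a four-byte sequence.
outside_gbk = "\U00020000"
four_byte = outside_gbk.encode("gb18030")
round_trips = four_byte.decode("gb18030") == outside_gbk

print(legacy_compatible, len(four_byte), round_trips)
```

The first comparison shows the backward compatibility with GB2312/GBK,
and the second shows that code points beyond the legacy repertoire
remain reachable.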
A brief summary of the GB18030 codepoints is listed below:
1-byte: {00-7F}
Same as US-ASCII / ISO 646 IRV (1991)
2-byte: {81-FE}{40-7E,80-FE}
Same as GBK (But now only 1-to-1 mappings remain)
4-byte: {81-FE}{30-39}{81-FE}{30-39}
Maps linearly to ISO 10646 starting from
GB+81308130 = U+0080 while skipping the mappings
already defined in the 1-byte and 2-byte areas. The
surrogate area is excluded.
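The byte ranges above can be turned into a minimal Python sketch. The
function names are hypothetical. The anchor GB+90308130 = U+10000 for
the supplementary planes is not stated in this message; it is assumed
here from the published mapping table. Only the supplementary planes
are computed, because the four-byte mappings inside the BMP skip
characters already covered by the two-byte area and therefore need the
lookup table.

```python
def sequence_length(data: bytes) -> int:
    """Return the length (1, 2 or 4) of the GB18030 sequence starting data."""
    b1 = data[0]
    if b1 <= 0x7F:
        return 1
    if not 0x81 <= b1 <= 0xFE or len(data) < 2:
        raise ValueError("invalid or truncated lead byte")
    b2 = data[1]
    if 0x30 <= b2 <= 0x39:
        if len(data) < 4 or not (0x81 <= data[2] <= 0xFE
                                 and 0x30 <= data[3] <= 0x39):
            raise ValueError("invalid or truncated four-byte sequence")
        return 4
    if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
        return 2
    raise ValueError("invalid second byte")


def linear_index(seq: bytes) -> int:
    """Position of a four-byte sequence, counting GB+81308130 as 0."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# Assumption (not from this message): GB+90308130 maps to U+10000.
SUPPLEMENTARY_BASE = linear_index(b"\x90\x30\x81\x30")


def encode_supplementary(code_point: int) -> bytes:
    """Compute the four-byte sequence for a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    index = SUPPLEMENTARY_BASE + (code_point - 0x10000)
    index, d4 = divmod(index, 10)   # fourth byte: 30-39
    index, d3 = divmod(index, 126)  # third byte: 81-FE
    d1, d2 = divmod(index, 10)      # second byte: 30-39, first byte: 81-FE
    return bytes([0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4])
```

As a cross-check, encode_supplementary(0x10FFFF) can be compared with
"\U0010FFFF".encode("gb18030") from Python's built-in codec.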
More information on the GB18030 standard and sample implementations
may be found on the Internet.
Person & email address to contact for further information:
CHEN Zhuang (陈壮)
chenzh@cesi.ac.cn
Chinese IT Standardization Technical Committee
Chinese Electronics Standardization Institute
Additionally, please Cc: ietf-charsets@iana.org to keep the
community informed, as the implementation of the GB18030 standard
on operating systems and applications is a community effort.
Intended usage:
COMMON
Acknowledgement:
Thanks and kudos to the Internet community for documenting
and explaining the GB 18030-2000 standard to the whole world;
for implementing this new standard in software so quickly;
and for their comments on this registration.
Special thanks to Dirk Meyer <dmeyer@adobe.com> for his translation
of the GB18030 standard, and to Markus Scherer <markus.scherer@us.ibm.com>
for his GB18030/Unicode mapping table.
-- Anthony Fok <anthony@thizlinux.com>, March 14, 2002
--
Anthony Fok Tung-Ling
ThizLinux Laboratory <anthony@thizlinux.com> http://www.thizlinux.com/
Debian Chinese Project <foka@debian.org> http://www.debian.org/intl/zh/
Come visit Our Lady of Victory Camp! http://www.olvc.ab.ca/