Re: Registration of new charset "UTF-16"



Martin J. Duerst wrote:
> Misha - While we do not need these references anymore for defining which
> character is at which codepoint (and how this gets updated), we still
> need references to ISO 10646 and Unicode for the definition of UTF-16 per se.
> So we cannot just throw out the above references. But we can point to the
> exact place of the UTF-16 definition.

Yes, you are right.  I was wrong.  We have to distinguish between UTF-16 as 
a Coded Character Set and UTF-16 as a Character Encoding Scheme (for the 
definitions of these terms, see RFC 2278 or RFC 2130).  For the latter, we 
refer to Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646].  For the 
former, we simply say "the same as in UTF-8".
 

-----------------------------------------------------------------------
We propose to register UTF-16 as a charset with IANA.  

UTF-16 should be sent in network byte order (big-endian).  However, 
recipients should be able to handle both big-endian and little-endian 
byte order.
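
As an illustration of the byte-order rule above, here is a minimal sketch 
in Python.  It assumes the recipient detects byte order from a leading 
U+FEFF signature and falls back to big-endian when none is present; the 
proposal text itself does not say how detection should be done, so that 
part is an assumption.  The function names are hypothetical.

```python
import codecs

def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 octets, accepting either byte order.

    Assumption: byte order is signalled by a leading U+FEFF signature.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF: big-endian
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE: little-endian
        return data[2:].decode("utf-16-le")
    # No signature: assume network byte order (big-endian).
    return data.decode("utf-16-be")

def encode_utf16_network(text: str) -> bytes:
    """Encode in network byte order, prefixed with the signature."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
```

A sender would call encode_utf16_network() and a recipient decode_utf16(); 
the recipient side accepts octets produced by either kind of sender.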

This charset is not permitted for use with MIME text/* media types.
However, the MIME-like mechanism of HTTP may use this character set for text/*, 
since this mechanism is exempt from the restrictions on the text top-level type
(see section 19.4.1 of HTTP 1.1 [RFC-2068]). 

   [RFC-2068] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-
   Lee. "Hypertext Transfer Protocol -- HTTP/1.1" UC Irvine, DEC,
   MIT/LCS. RFC 2068. January, 1997.


Charset name(s): UTF-16

Published specification(s): 

UTF-16 as a Character Encoding Scheme is defined in Appendix C.3 
of [UNICODE] and Amendment 1 of [ISO-10646].
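
For readers without the references at hand, the following Python sketch 
shows the surrogate-pair mapping that the UTF-16 encoding scheme uses for 
characters beyond the Basic Multilingual Plane.  It is only an 
illustration, not normative text; the function name is hypothetical, and 
the cited specifications remain the authority.

```python
from typing import List

def to_utf16_code_units(code_point: int) -> List[int]:
    """Map one code point to one or two 16-bit UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range representable in UTF-16")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point < 0x10000:
        # BMP character: a single 16-bit code unit.
        return [code_point]
    # Beyond the BMP: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # low (trailing) surrogate
    return [high, low]
```

For example, to_utf16_code_units(0x10000) yields [0xD800, 0xDC00].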

The Coded Character Set that UTF-16 refers to is the same version of 
ISO/IEC 10646-1 and Unicode that the charset "UTF-8" refers to.

  [ISO-10646] ISO/IEC, Information Technology - Universal Multiple-Octet Coded 
  Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane, 
  May 1993.

  [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version 2.0", 
  Addison-Wesley, 1996.


Person & email address to contact for further information:

Tatsuo L. Kobayashi
Digital Culture Research Center, JUSTSYSTEM Corp.
Email: Tatsuo_Kobayashi@justsystem.co.jp

Murata Makoto (Family Given)
Fuji Xerox Information Systems,
KSP 9A7, 2-1 Sakado 3-chome,
Takatsu-ku, Kawasaki-shi,
213 Japan
Email: murata@fxis.fujixerox.co.jp 

Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp