Fwd: UTF-16
Could you tell me whether UTF-16 has been accepted?
RFC 2376 (text/xml and application/xml) already mentions UTF-16 and the
BOM. I am afraid that confusion and incompatibility problems will arise
unless we register UTF-16 with IANA very soon.
MURATA Makoto wrote:
>
> --------------------------------------------------------------------
> We propose to register UTF-16 as a charset in IANA.
>
> UTF-16 generators MUST send data in big-endian byte order and MUST
> begin it with the ZERO WIDTH NO-BREAK SPACE character (0xFEFF),
> also called the Byte Order Mark or BOM.
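A minimal Python sketch of a generator that follows the rule above: it writes the BOM first and then every code unit in big-endian order. The function name `encode_utf16_conformant` is hypothetical and only illustrates the requirement.

```python
def encode_utf16_conformant(text: str) -> bytes:
    """Encode text as the proposal requires: BOM 0xFEFF first, then big-endian."""
    # "\ufeff" is ZERO WIDTH NO-BREAK SPACE, used here as the Byte Order Mark.
    # The "utf-16-be" codec emits big-endian code units and adds no BOM,
    # so the BOM is prepended explicitly before encoding.
    return ("\ufeff" + text).encode("utf-16-be")


print(encode_utf16_conformant("A\u00e9").hex())  # feff004100e9
```

The explicit big-endian codec is deliberate: Python's plain "utf-16" codec writes a BOM in the machine's native byte order, which is little-endian on most hardware and would therefore produce exactly the nonconformant output described in the NOTE.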
>
> NOTE: Some implementations that do not conform to this
> specification have occasionally sent data in little-endian byte
> order. When they do this, they commonly precede the data with the
> BOM. Thus, a UTF-16 parser that encounters the code 0xFFFE as the
> first character of a purported UTF-16 stream may safely assume
> that it has encountered a nonconformant data source. There is no
> fully reliable way to detect little-endian data that does not
> use the BOM.
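A minimal Python sketch of the parser-side check described in the NOTE, assuming the caller wants both the decoded text and a flag saying whether the source conformed. The function name `decode_utf16_stream` is hypothetical, and the fallback for a stream with no BOM (decode as big-endian) is an assumption of this sketch; the proposal only says such data cannot be detected reliably.

```python
from typing import Tuple


def decode_utf16_stream(data: bytes) -> Tuple[str, bool]:
    """Decode a purported UTF-16 stream; return (text, conformant)."""
    if data[:2] == b"\xfe\xff":
        # Conformant: BOM in big-endian order. Strip it, decode big-endian.
        return data[2:].decode("utf-16-be"), True
    if data[:2] == b"\xff\xfe":
        # First character reads as 0xFFFE: a little-endian,
        # nonconformant source. Strip the BOM and decode little-endian.
        return data[2:].decode("utf-16-le"), False
    # No BOM at all: also nonconformant, and the byte order cannot be
    # detected reliably. ASSUMPTION: fall back to big-endian.
    return data.decode("utf-16-be"), False


print(decode_utf16_stream(b"\xff\xfe\x41\x00"))  # ('A', False)
```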
>
> This character set is not permitted for use with MIME text/* media
> types. However, the MIME-like mechanism of HTTP may use this
> character set for text/*, since this mechanism is exempt from the
> restrictions on the text top-level type (see section 19.4.1 of
> HTTP 1.1 [RFC-2068]).
>
> [RFC-2068] R. Fielding, J. Gettys, J. Mogul, H. Frystyk,
> T. Berners-Lee. "Hypertext Transfer Protocol -- HTTP/1.1"
> UC Irvine, DEC, MIT/LCS. RFC 2068. January, 1997.
>
> Charset name(s): UTF-16
>
> Published specification(s):
>
> UTF-16 as a Character Encoding Scheme is defined in Appendix C.3
> of [UNICODE] and Amendment 1 of [ISO-10646].
>
> The Coded Character Set that UTF-16 refers to is the same version
> of ISO/IEC 10646-1 and Unicode that the charset "UTF-8" [RFC-2279]
> refers to.
>
> [ISO-10646] ISO/IEC, Information Technology - Universal
> Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture
> and Basic Multilingual Plane, May 1993.
>
> [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version 2.0",
> Addison-Wesley, 1996.
>
> [RFC-2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646",
> RFC 2279, January 1998.
>
> Person & email address to contact for further information:
>
> Tatsuo L. Kobayashi
> Digital Culture Research Center, JUSTSYSTEM Corp.
> Email: Tatsuo_Kobayashi@justsystem.co.jp
>
> Murata Makoto (Family Given)
> Fuji Xerox Information Systems,
> KSP 9A7, 2-1 Sakado 3-chome,
> Takatsu-ku, Kawasaki-shi,
> 213 Japan
> Email: murata@fxis.fujixerox.co.jp
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp