Re: draft-hoffman-utf16-01.txt available
Francois Yergeau wrote:
>
> And further, I happen to think that all XML entities (in UTF-16) having a
> BOM is a Good Thing. The XML spec is designed such that one can always
> determine the character encoding without external info, let's keep it that
> way.
Actually, the charset parameter of text/xml or application/xml, if present,
is authoritative. In the case of text/xml, the default is US-ASCII (Jim and
I were instructed to choose US-ASCII by the IESG, which is aware of the
inconsistency with HTTP 1.1). For more about this, see RFC 2376.
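
As a rough illustration of the rule above, here is a minimal Python sketch
of how a receiver might pick the charset from a Content-Type header. The
function name effective_charset is hypothetical, and returning None for
application/xml without a charset parameter is only this sketch's way of
saying "fall back to the XML spec's own detection", as RFC 2376 describes.

```python
from email.message import Message
from email.utils import collapse_rfc2231_value


def effective_charset(content_type):
    """Return the charset implied by a Content-Type value, or None.

    Hypothetical helper: None means the header gives no charset
    information and the XML entity itself must be inspected.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset is not None:
        # An explicit charset parameter is authoritative.
        return collapse_rfc2231_value(charset).lower()
    if msg.get_content_type() == "text/xml":
        # text/xml without a charset parameter defaults to US-ASCII.
        return "us-ascii"
    # application/xml without a charset parameter: nothing is implied.
    return None
```

For example, effective_charset("text/xml") yields "us-ascii", while
effective_charset("application/xml") yields None.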
medavis2@us.ibm.com wrote:
> *** Even if XML did not require a BOM, it would not be unambiguous! Look at
> Appendix F in
> http://www.xml.com/axml/target.html#sec-guessing. The file would just have
> to have the initial '<?xml' like all other encodings. To quote:
>
> "Because each XML entity not in UTF-8 or UTF-16 format must begin with an
> XML encoding declaration, in which the first characters must be '<?xml',
> any conforming processor can detect, after two to four octets of input,
> which of the following cases apply. In reading this list, it may help to
> know that in UCS-4, '<' is "#x0000003C" and '?' is "#x0000003F", and the
> Byte Order Mark required of UTF-16 data streams is "#xFEFF".
UTF-16 XML entities do *not* have to begin with '<?xml'. Thus, if the BOM
is made optional, we have a problem when the charset parameter is not
available.
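
To make the problem concrete, here is a minimal Python sketch of the kind
of octet sniffing Appendix F describes. It covers only a subset of the
cases listed there, and the function name and return labels are
hypothetical. It shows that a UTF-16 entity with neither a BOM nor a
leading '<?xml' matches none of the patterns.

```python
def sniff_xml_encoding(data):
    """Guess an encoding family from the first octets, or return None.

    Illustrative subset of the Appendix F cases; the labels returned
    here are this sketch's own, not names taken from the XML spec.
    """
    head = data[:4]
    if head == b"\x00\x00\x00\x3c":
        return "ucs-4-be"          # '<' as a big-endian UCS-4 unit
    if head == b"\x3c\x00\x00\x00":
        return "ucs-4-le"          # '<' as a little-endian UCS-4 unit
    if head[:2] == b"\xfe\xff":
        return "utf-16-be"         # Byte Order Mark, big-endian
    if head[:2] == b"\xff\xfe":
        return "utf-16-le"         # Byte Order Mark, little-endian
    if head == b"\x00\x3c\x00\x3f":
        return "utf-16-be"         # no BOM, but '<?' is recognizable
    if head == b"\x3c\x00\x3f\x00":
        return "utf-16-le"         # no BOM, but '<?' is recognizable
    if head == b"\x3c\x3f\x78\x6d":
        return "ascii-compatible"  # '<?xm' in UTF-8, ASCII, ISO 8859, ...
    return None                    # nothing recognizable in four octets


# A BOM-less UTF-16 entity that starts directly with an element:
ambiguous = "<doc/>".encode("utf-16-be")
```

Here sniff_xml_encoding(ambiguous) returns None, so without a charset
parameter the receiver has nothing to go on.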
Cheers,
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp