
Re: draft-hoffman-utf16-01.txt available



Francois Yergeau wrote:
> 
> And further, I happen to think that all XML entities (in UTF-16) having a
> BOM is a Good Thing.  The XML spec is designed such that one can always
> determine the character encoding without external info, let's keep it that
> way.

Actually, the charset parameter of text/xml or application/xml, if present, 
is authoritative.  In the case of text/xml, the default is US-ASCII (Jim and 
I were instructed to choose US-ASCII by the IESG, which is aware of the 
inconsistency with HTTP 1.1).  For more about this, see RFC 2376.
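
To make the precedence concrete, here is a rough sketch in Python.  The
function name effective_charset is made up for illustration, and returning
None for application/xml is my shorthand for "no default from the media
type, so look at the entity itself (BOM or encoding declaration)".

```python
from email.message import Message


def effective_charset(content_type):
    """Return the charset implied by a Content-Type value, or None.

    Hypothetical helper: a charset parameter, when present, wins.
    Without one, text/xml defaults to US-ASCII; for anything else
    (including application/xml) None means "inspect the entity".
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if isinstance(charset, tuple):
        # RFC 2231 encoded parameter: (charset, language, value)
        charset = charset[2]
    if charset is not None:
        return charset.lower()
    if msg.get_content_type() == "text/xml":
        return "us-ascii"
    return None
```

So effective_charset("text/xml") gives "us-ascii" even if the entity 
itself starts with a BOM, which is exactly the inconsistency mentioned 
above.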

medavis2@us.ibm.com wrote:
> *** Even if XML did not require a BOM, it would not be unambiguous! Look at
> Appendix F in
> http://www.xml.com/axml/target.html#sec-guessing. The file would just have
> to have the initial '<?xml' like all other encodings. To quote:
> 
> "Because each XML entity not in UTF-8 or UTF-16 format must begin with an
> XML encoding declaration, in which the first characters must be '<?xml',
> any conforming processor can detect, after two to four octets of input,
> which of the following cases apply. In reading this list, it may help to
> know that in UCS-4, '<' is "#x0000003C" and '?' is "#x0000003F", and the
> Byte Order Mark required of UTF-16 data streams is "#xFEFF".

UTF-16 XML entities do *not* have to begin with '<?xml'.  Thus, if the BOM 
is made optional, we have a problem when the charset parameter is not 
available.
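
A rough Python sketch of the Appendix F guessing shows where it breaks.  
The function name sniff_xml_encoding and the returned labels are mine, not 
from the spec, and the EBCDIC case is omitted for brevity.

```python
def sniff_xml_encoding(head):
    """Guess an encoding family from the first octets of an XML entity.

    Illustrative only: follows the byte patterns listed in Appendix F,
    leaving out the EBCDIC case.
    """
    if head.startswith(b"\x00\x00\x00\x3c"):
        return "ucs-4-be"
    if head.startswith(b"\x3c\x00\x00\x00"):
        return "ucs-4-le"
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"   # BOM present
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"   # BOM present
    if head.startswith(b"\x00\x3c\x00\x3f"):
        return "utf-16-be"   # no BOM, but begins with '<?'
    if head.startswith(b"\x3c\x00\x3f\x00"):
        return "utf-16-le"   # no BOM, but begins with '<?'
    # Everything else, including '<?xml' in an ASCII-compatible
    # encoding, is treated as UTF-8 until an encoding declaration
    # says otherwise.
    return "utf-8"


# UTF-16 entity with neither a BOM nor an XML declaration:
ambiguous = "<doc/>".encode("utf-16-be")   # begins 00 3C 00 64
guess = sniff_xml_encoding(ambiguous)      # falls through to "utf-8"
```

The last case matches none of the UTF-16 patterns, so the guess is wrong 
unless something external (the charset parameter) says UTF-16.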

Cheers,

Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp