Re: UTF-16 (was: Re: Charset reviewer appointed)



At 06:20 PM 7/29/98 +0900, Martin J. Duerst wrote:
>What XML is currently stating is that all UTF-16 documents must start
>with a BOM...

I suspect the XML people are a good indication of what the world
expects from UTF-16 with regard to byte ordering, and that
they would be happy if UTF-16 were defined like this:

"UTF-16 generators SHOULD send in big-endian byte order.
UTF-16 generators that send in big-endian byte order MAY begin
with the zero width no-break space (also called the Byte Order Mark, or BOM) (0xFEFF).
UTF-16 generators that send in little-endian byte order MUST begin
with the BOM."

which can be summed up as
"UTF-16 defaults to big-endian; an initial BOM can be used
to switch to little-endian."
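
To make the rule concrete, here is a rough sketch in Python of a generator
and a receiver that follow the wording above. The function names and the
`little_endian` / `bom` parameters are made up for illustration; they are
not part of XML, of any UTF-16 definition, or of any existing library. Only
the `codecs` BOM constants and the "utf-16-be" / "utf-16-le" codecs come
from the Python standard library.

```python
import codecs


def encode_utf16_proposed(text: str, little_endian: bool = False,
                          bom: bool = False) -> bytes:
    """Hypothetical generator following the proposed wording."""
    if little_endian:
        # Little-endian generators MUST begin with the BOM (bytes FF FE).
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    # Big-endian generators MAY begin with the BOM (bytes FE FF).
    prefix = codecs.BOM_UTF16_BE if bom else b""
    return prefix + text.encode("utf-16-be")


def decode_utf16_proposed(data: bytes) -> str:
    """Hypothetical receiver: big-endian unless an initial BOM says otherwise."""
    if data.startswith(codecs.BOM_UTF16_LE):
        # 0xFEFF seen byte-swapped: the stream is little-endian.
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        # Optional BOM on a big-endian stream: drop it and decode.
        return data[2:].decode("utf-16-be")
    # No BOM at all: fall back to the big-endian default.
    return data.decode("utf-16-be")


if __name__ == "__main__":
    for kwargs in ({}, {"bom": True}, {"little_endian": True}):
        wire = encode_utf16_proposed("XML", **kwargs)
        print(kwargs, wire.hex(), decode_utf16_proposed(wire))
```

A BOM-less big-endian stream cannot be mistaken for a little-endian one
here, because the bytes FF FE at the start would otherwise have to mean
U+FFFE, which is not a valid character.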

I also suspect they'd be willing to modify XML's definition to make the
BOM optional for big-endian streams.
- Dan