
Re: Revised proposal for UTF-16



At 13:56 24.05.98 -0700, Dan Kegel wrote:
>Perhaps a middle ground, here?  How about this (suitably reworded):
>   UTF-16 generators SHOULD [MUST?] NOT send in little-endian byte order, but
>   if they do, they MUST prefix the stream with a little-endian BOM.
>   UTF-16 consumers MUST assume the default byte-order is big-endian,
>   but MUST also accept little-endian if prefixed with a little-endian BOM.
>
>That way, big-endian is preferred, yet interoperability is preserved.

Hmmm.... everyone MUST do A, but if they don't, they MUST....

Suggested alternative:

 UTF-16 generators MUST send in big-endian byte order.

 NOTE: Some implementations that do not conform to this specification
 have occasionally sent data in little-endian byte order. When they do
 this, they commonly precede the data with a zero-width non-breaking
 space (also called Byte Order Mark or BOM) (0xFEFF), which a reader
 assuming big-endian byte order sees byte-swapped as 0xFFFE.
 Thus, a UTF-16 parser encountering the code 0xFFFE as the first
 character of a purported UTF-16 stream may safely assume that it
 has encountered a nonconformant data source.

The info about what is right is there; the info about how to tell if
you encounter someone doing the Wrong Thing is there too.
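
For illustration only, here is a minimal Python sketch of how a consumer might apply the NOTE above. The function names are hypothetical, and the choice to go ahead and decode a nonconformant stream as little-endian (rather than reject it) is an assumption that the proposed text does not make; the text only says the parser can tell that the source is nonconformant.

```python
import codecs


def classify_utf16_stream(data):
    """Return (codec_name, conformant) for a purported UTF-16 byte stream.

    Hypothetical helper: big-endian is the default, and a leading FF FE
    (0xFFFE when read big-endian, i.e. a byte-swapped 0xFEFF) marks a
    nonconformant little-endian source.
    """
    if data[:2] == codecs.BOM_UTF16_LE:  # the two bytes FF FE
        return "utf-16-le", False
    return "utf-16-be", True


def decode_purported_utf16(data):
    """Decode the stream and report whether the source was conformant.

    Assumption: a nonconformant stream is still decoded, as little-endian.
    The leading U+FEFF is left in the text, since it is also a legitimate
    zero-width non-breaking space.
    """
    codec, conformant = classify_utf16_stream(data)
    return data.decode(codec), conformant
```

A caller could log or reject the input whenever the second element of the result is False, depending on how strictly it wants to enforce the MUST.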

                   Harald A


-- 
Harald Tveit Alvestrand, Maxware, Norway
Harald.Alvestrand@maxware.no