
Re: Revised proposal for UTF-16



At 11:35 PM 5/24/98 +0200, Harald Alvestrand wrote:
>Hmmm.... everyone MUST do A, but if they don't, they MUST....
>Suggested alternative:
>
> UTF-16 generators MUST send in big-endian byte order.
>
> NOTE: Some implementations that do not conform to this specification
> have occasionally sent data in little-endian byte order. When they do
> this, they commonly precede the data with a zero width non breaking
> space (also called Byte Order Mark or BOM) (0xFEFF).
> Thus, a UTF-16 parser encountering the code 0xFFFE as the first
> character of a purported UTF-16 stream may safely assume that it
> has encountered a nonconformant data source.
>
>The info about what is right is there; the info about how to tell if
>you encounter someone doing the Wrong Thing is there too.

True, but it's a little wishy-washy, in that it doesn't try to 
lay down the law about how the little-endian holdouts
must behave in order to get along peacefully with the rest of us.
You need to tell them they have to use a BOM if they're going to
talk funny.
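
For concreteness, here is a rough sketch of the check a receiver could
make under that rule. It is only an illustration, not proposed spec text:
the function names are made up, and treating BOM-less data as big-endian
is an assumption taken from the "MUST send in big-endian" wording above.

```python
def detect_utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of a UTF-16 stream from its first two bytes.

    Hypothetical helper: returns "big" or "little".
    """
    first = data[:2]
    if first == b"\xfe\xff":
        # 0xFEFF read big-endian: a BOM in the conformant order.
        return "big"
    if first == b"\xff\xfe":
        # Reads as 0xFFFE big-endian, i.e. a byte-swapped BOM:
        # the sender is little-endian.
        return "little"
    # No BOM at all (or fewer than two bytes): assume the big-endian default.
    return "big"


def decode_utf16(data: bytes) -> str:
    """Decode a UTF-16 stream, honoring and then dropping a leading BOM."""
    order = detect_utf16_byte_order(data)
    has_bom = data[:2] in (b"\xfe\xff", b"\xff\xfe")
    body = data[2:] if has_bom else data
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    return body.decode(codec)
```

A little-endian sender that includes the BOM is decoded correctly; one
that omits it gets read as big-endian and comes out as garbage, which is
the case the rule is meant to outlaw.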
- Dan