
Re: Registration of new charset: UTF-32



Small correction:

Mark Davis wrote:
> initial BOM then the byte-orientation must be big-endian. That is, in any
> stream that does not begin with the (hex) byte sequence <00 00 FE FF> all of
> the bytes are interpreted as big-endian.

This must read:
... in any stream that does not begin with the (hex) byte sequence <FF FE 00 00> all of the bytes are interpreted as big-endian.

Explanation: Mark's sentence cited the big-endian BOM. The rule must be "everything that does not begin with the little-endian BOM is big-endian".
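
To illustrate the corrected rule, here is a minimal Python sketch. The function name decode_utf32 and the choice to strip an initial big-endian BOM are my own assumptions for the example, not part of the proposed registration text.

```python
# Hypothetical helper illustrating the corrected rule; names are assumptions.
LE_BOM = b"\xff\xfe\x00\x00"  # U+FEFF serialized little-endian
BE_BOM = b"\x00\x00\xfe\xff"  # U+FEFF serialized big-endian


def decode_utf32(stream: bytes) -> str:
    """Decode a UTF-32 byte stream, defaulting to big-endian."""
    if stream.startswith(LE_BOM):
        # Only the little-endian BOM switches the byte order.
        return stream[len(LE_BOM):].decode("utf-32-le")
    if stream.startswith(BE_BOM):
        # Assumption: drop an explicit big-endian BOM rather than
        # returning it as a leading U+FEFF character.
        stream = stream[len(BE_BOM):]
    # Everything that does not begin with the LE BOM is big-endian.
    return stream.decode("utf-32-be")
```

A stream with no BOM at all, such as <00 00 00 41>, takes the last branch and is read as big-endian.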

One might add a note that a UTF-32 stream with a little-endian BOM could appear to start with a UTF-16 little-endian BOM, because <FF FE> is a prefix of <FF FE 00 00>. The UTF-32 registration might specify that <FF FE 00 00> is UTF-32 (little-endian), and that <FF FE xx xx> where the two xx bytes are not both 00 is UTF-16 (little-endian).
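
As a rough sketch of how a receiver might apply that distinction when the charset label is not known, the following Python checks the longer BOM first. The function name sniff_le_bom and its return values are hypothetical, and treating a stream shorter than four bytes that starts with <FF FE> as UTF-16 is my own assumption.

```python
# Hypothetical sniffing helper; the name and return values are assumptions.
from typing import Optional


def sniff_le_bom(stream: bytes) -> Optional[str]:
    """Guess the encoding from a leading little-endian BOM, if any."""
    # Test the 4-byte signature first, since <FF FE> is a prefix of it.
    if stream.startswith(b"\xff\xfe\x00\x00"):
        return "UTF-32LE"
    # <FF FE xx xx> where the xx bytes are not both 00.
    if stream.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    # No little-endian BOM found; this sketch makes no further guess.
    return None
```

Note that a UTF-16 little-endian stream whose first character after the BOM is U+0000 would be reported as UTF-32 here; that is the ambiguity the suggested note would settle by definition.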

markus