Re: Revised proposal for UTF-16
I think we are converging, but minor differences remain. Little-endian:
SHOULD NOT or MUST NOT? Is the BOM mandatory or recommended?
1. Harald Alvestrand
UTF-16 generators MUST send in big-endian byte order.
NOTE: Some implementations that do not conform to this specification
have occasionally sent data in little-endian byte order. When they do
this, they commonly precede the data with a zero width non breaking
space (also called Byte Order Mark or BOM) (0xFEFF).
Thus, an UTF-16 parser encountering the code 0xFFFE as the first
character of a purported UTF-16 stream may safely assume that he
has encountered a nonconformant data source. There is no way to 100%
reliably detect little-endian data that does not use the BOM.
2. Dan Kegel (in my interpretation)
UTF-16 generators must begin with the BOM. They SHOULD [MUST?] NOT send in
little-endian byte order, but if they do, they MUST prefix the stream
with a little-endian BOM. UTF-16 consumers MUST assume the default
byte-order is big-endian, but MUST also accept little-endian if prefixed
with a little-endian BOM.
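
To make the consumer side of this reading concrete, here is a minimal
sketch in Python. The function name decode_utf16 is hypothetical, and
stripping the BOM before decoding is my assumption; only the default
(big-endian) and the acceptance of a BOM-prefixed little-endian stream
come from the text above.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode a UTF-16 byte stream, assuming big-endian unless a BOM says otherwise.

    Hypothetical helper for illustration, not part of any proposal text.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE: little-endian BOM
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF: big-endian BOM
        return data[2:].decode("utf-16-be")
    # No BOM: the default byte order is big-endian.
    return data.decode("utf-16-be")
```

With this sketch, the byte streams FF FE 68 00 69 00, FE FF 00 68 00 69,
and 00 68 00 69 would all decode to the same two-character string "hi".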
3. My proposal
I would like to reduce useless options. Little endian is fine, but it
should be used only in local environments. UTF-16 without the BOM is fine,
but it should be used only in local environments.
Here is my proposal.
UTF-16 generators MUST send in big-endian byte order and MUST begin with the
zero width non breaking space (also called Byte Order Mark or BOM) (0xFEFF).
NOTE: Some implementations that do not conform to this specification
have occasionally sent data in little-endian byte order. When they do
this, they commonly precede the data with the BOM.
Thus, a UTF-16 parser encountering the code 0xFFFE as the first
character of a purported UTF-16 stream may safely assume that it
has encountered a nonconformant data source. If the BOM is absent,
there is no way to detect little-endian data with 100% reliability.
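
As a rough illustration of this proposal, a generator and a conformance
check might look like the following Python sketch. The names
generate_utf16 and classify_stream, and the three result labels, are
hypothetical and only for illustration.

```python
BOM_BE = b"\xfe\xff"  # U+FEFF serialized in big-endian byte order
BOM_LE = b"\xff\xfe"  # U+FEFF serialized in little-endian byte order


def generate_utf16(text: str) -> bytes:
    """Emit big-endian UTF-16 that begins with the BOM."""
    return BOM_BE + text.encode("utf-16-be")


def classify_stream(data: bytes) -> str:
    """Classify a purported UTF-16 stream by its first two bytes."""
    if data.startswith(BOM_BE):
        return "conformant"
    if data.startswith(BOM_LE):
        # The parser reads 0xFFFE as the first code: a byte-swapped BOM.
        return "nonconformant-little-endian"
    # No BOM: the byte order cannot be determined reliably.
    return "unknown"
```

The "unknown" case is where detection breaks down: the BOM-less bytes
00 68 00 69 are "hi" in big-endian order, but read as little-endian they
are also a valid pair of characters (U+6800 U+6900), so neither reading
can be ruled out from the data alone.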
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp