
Re: Charset reviewer appointed



With regard to Harald Alvestrand's summary of the open
issues with respect to the UTF-16 registration, the only
way I see forward, given the nature of the "charset"
definition, is to split this request into two registrations:

UTF-16   big-endian UTF-16
UTF-16BS little-endian (byte-swapped) UTF-16

This would finesse the whole irritating business of where
the BOM goes, and whether it is required, in string-handling
protocols. The emitter of data in one or the other of
the two "charsets" would have to guarantee the byte order
of the data it purports to emit. And the BOM would revert
again to what it is supposed to be: a handy signature which
*may* be included in text for those instances in which an
interpreter *may* be getting data of either polarity in a
mixed platform environment.
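
To make the split concrete, here is a minimal sketch in Python of how a
receiver might behave under the two proposed labels. It is only an
illustration under stated assumptions: the label-to-codec table and the
function names decode_labeled and sniff_byte_order are hypothetical, and
"UTF-16BS" is just the name proposed above, not a registered charset.
Python's own "utf-16" codec is deliberately avoided, since it picks the
byte order from the BOM rather than from the label.

```python
import codecs
from typing import Optional

# Hypothetical mapping from the two proposed labels to Python codecs
# with a fixed byte order. "UTF-16BS" is the name proposed in this
# message, not a registered charset.
PROPOSED_CHARSETS = {
    "UTF-16": "utf-16-be",    # big-endian, guaranteed by the emitter
    "UTF-16BS": "utf-16-le",  # little-endian (byte-swapped)
}


def decode_labeled(data: bytes, charset: str) -> str:
    """Decode using only the declared label; no BOM is required."""
    return data.decode(PROPOSED_CHARSETS[charset])


def sniff_byte_order(data: bytes) -> Optional[str]:
    """Use the BOM as an optional signature when no label is available.

    Returns the proposed label the BOM points to, or None if the data
    does not start with a BOM.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16BS"
    return None
```

With the label carrying the byte order, decode_labeled never has to look
for a BOM; sniff_byte_order is only the fallback for an interpreter that
really may be handed data of either polarity.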

I don't like the garden path people have been starting down
of requiring that a BOM *must* be attached to every piece of
little-endian UTF-16 text, no matter what. That is, in my
opinion, trying to turn the BOM into the functional equivalent
of an escape sequence for identifying a character set in the
context of ISO 2022 -- it just becomes a metacharacter
for identifying the "charset". Why not just bite the bullet
and identify the "charset" unambiguously from the start?

--Ken Whistler

----- Begin Included Message -----

> > Date: Sun, 21 Jun 1998 07:07:12 +0200
> > From: Harald Tveit Alvestrand <Harald.Alvestrand@maxware.no>
> > To: ietf-charsets@iana.org
> 
> > WRT outstanding registrations, my opinion at the moment is:
> > 
...
> > 
> > - UTF-16 is controversial because of the BOM and byte-order issues.
> >   I think consensus has not been achieved; the significant objections
> >   are:
> > 
> >   - While there is consensus that big-endian is preferred, there is
> >     not consensus if little-endian is acceptable.
> >   - While there is consensus that little-endian, if allowed, MUST
> >     include the BOM, there is no consensus on where, if ever, a BOM
> >     must be inserted in big-endian encoded text.
> >   - There is no consensus that it is possible to write sensible rules
> >     about using the BOM in protocols that carry multiple independent
> >     pieces of text.
> > 
> >   This registration will wait a bit yet.
> 

----- End Included Message -----