[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE
The UTC does not consider it a botch. It is permissible for UTF-16 to be
either BE or LE if it does not have a BOM. For example, on Windows a
database field might contain BOM-less LE text. In such a case, a
higher-level protocol is establishing the byte orientation.
I personally agree that it would be far better to use the unambiguous term
UTF-16LE for BOM-less UTF-16 text for such serializations. I think the
problem is that the term "UTF-16" can also refer to the in-memory use of
UTF-16 as a sequence of 16-bit chunks, where byte-orientation is not an
issue (unless you are skanky and use unions to look at the bytes). So people
like to simply refer to it as UTF-16 whether in memory or serialized,
instead of the more appropriate term UTF-16LE.
However, if the IETF liaison wants to present a proposal to restrict UTF-16
and UTF-32 -- when used as a serialization into bytes, to being only BE if
there is no BOM, I believe that the UTC would certainly take that into
consideration. The next meeting is happening very soon...
Mark
----- Original Message -----
From: <ned.freed@mrochek.com>
To: "Mark Davis" <markdavis34@home.com>
Cc: "Harald Tveit Alvestrand" <harald@alvestrand.no>; "Mark Davis"
<mark.davis@us.ibm.com>; <ned.freed@mrochek.com>; <ietf-charsets@iana.org>
Sent: Friday, May 11, 2001 09:13
Subject: Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE
> > Thanks for your feedback. I will resubmit them.
>
> > Comments:
>
> > A. If each charset needs to be in a separate message, then you really
ought
> > to fix http://www.normos.org/ietf/bcp/bcp19.txt. It says:
>
> > "5. Charset Registration Template
>
> > To: ietf-charsets@iana.org
> > Subject: Registration of new charset [names]"
>
> > with the word "names" in plural. This is misleading.
>
> The rest of the regisration form clearly talks about a single charset with
> multiple names, so I'm not sure I buy your reasoning here. However, since
we
> want to discourage the use of any aliases, I have no problem with changing
it
> to "name" singular.
>
> > B. UTF-32 in Unicode, as with UTF-16, could be BOM-less, with the
> > orientation being determined by a higher-level protocol. The IETF
> > registration (with good reason!) can impose a further restriction, as it
> > does with UTF-16, that BOM-less UTF-16 must be BE. I will put such a
clause
> > in the registration.
>
> Which means the UTC has apparently learned nothing from the UTF-16
disaster.
> If we push back on this is there any hope of getting this botch fixed?
>
> Ned
>