
Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE

To: [email protected]
Subject: Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE
From: Mark Davis <[email protected]>
Date: Fri, 11 May 2001 10:06:10 -0700
Cc: Harald Tveit Alvestrand <[email protected]>,Mark Davis <[email protected]>, [email protected],[email protected]
References: <[email protected]><[email protected]>
Reply-to: Mark Davis <[email protected]>

The UTC does not consider it a botch. It is permissible for UTF-16 to be
either BE or LE if it does not have a BOM. For example, on Windows a
database field might contain BOM-less LE text. In such a case, a
higher-level protocol establishes the byte order.
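As a rough illustration of a higher-level protocol supplying the byte order,
here is a minimal Python sketch. The decode_bomless_utf16 helper and its
byte_order argument are hypothetical; they stand in for whatever metadata
(for example, a database column's declared encoding) carries the byte order
out of band. The same bytes decode to different text under LE and BE, which
is why that out-of-band declaration matters.

```python
def decode_bomless_utf16(data: bytes, byte_order: str) -> str:
    """Decode BOM-less UTF-16 using a byte order declared out of band.

    byte_order is a hypothetical stand-in for the higher-level protocol's
    declaration and must be "LE" or "BE".
    """
    codecs_by_order = {"LE": "utf-16-le", "BE": "utf-16-be"}
    return data.decode(codecs_by_order[byte_order])


# "Hi" as BOM-less little-endian UTF-16, as it might sit in a database field.
field = "Hi".encode("utf-16-le")  # b"H\x00i\x00"
print(decode_bomless_utf16(field, "LE"))  # Hi
# Misreading the same bytes as big-endian yields U+4800 U+6900 instead.
print(repr(decode_bomless_utf16(field, "BE")))
```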

I personally agree that it would be far better to use the unambiguous term
UTF-16LE for such BOM-less serializations. I think the problem is that the
term "UTF-16" can also refer to the in-memory use of UTF-16 as a sequence of
16-bit code units, where byte order is not an issue (unless you are skanky
and use unions to look at the bytes). So people like to call it simply
UTF-16, whether in memory or serialized, instead of using the more
appropriate term UTF-16LE.
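The difference between in-memory code units and serialized bytes can be
sketched in Python. The helper names below are hypothetical. The code units
themselves carry no byte order; one is chosen only when they are packed into
bytes. array("H", ...).tobytes() uses the host's native order, which is
roughly what peeking at a 16-bit value through a union exposes.

```python
import struct
import sys
from array import array


def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as plain integers."""
    raw = text.encode("utf-16-be")  # any fixed order works; BE is used here
    return list(struct.unpack(f">{len(raw) // 2}H", raw))


def serialize_units(units: list[int], byte_order: str) -> bytes:
    """Pack 16-bit code units into bytes in the requested order ("LE" or "BE")."""
    prefix = "<" if byte_order == "LE" else ">"
    return struct.pack(f"{prefix}{len(units)}H", *units)


units = utf16_code_units("\u20ac\U0001F600")  # euro sign, then a surrogate pair
print([hex(u) for u in units])  # ['0x20ac', '0xd83d', '0xde00']

# Looking at the raw memory (as a union would) shows the host's native order.
native = array("H", units).tobytes()
host_order = "LE" if sys.byteorder == "little" else "BE"
print(native == serialize_units(units, host_order))  # True
```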

However, if the IETF liaison wants to present a proposal restricting UTF-16
and UTF-32, when serialized into bytes, to BE whenever there is no BOM, I
believe that the UTC would certainly take it into consideration. The next
meeting is happening very soon...
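A receiver following the proposed rule might look like the Python sketch
below: honor a BOM if one is present, and otherwise assume BE. The function
name and its form parameter are hypothetical; the BOM constants come from the
standard codecs module. Decoding with an explicit-order codec after stripping
the BOM avoids relying on any platform default.

```python
import codecs

# For each encoding form: (BE BOM, LE BOM, BE codec, LE codec).
_FORMS = {
    "UTF-16": (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE, "utf-16-be", "utf-16-le"),
    "UTF-32": (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE, "utf-32-be", "utf-32-le"),
}


def decode_be_default(data: bytes, form: str) -> str:
    """Decode serialized UTF-16 or UTF-32, treating BOM-less input as BE."""
    bom_be, bom_le, be_codec, le_codec = _FORMS[form]
    if data.startswith(bom_be):
        return data[len(bom_be):].decode(be_codec)
    if data.startswith(bom_le):
        return data[len(bom_le):].decode(le_codec)
    # No BOM: the proposed restriction says to read it as big-endian.
    return data.decode(be_codec)


print(decode_be_default(b"\x00H\x00i", "UTF-16"))          # Hi (no BOM, BE)
print(decode_be_default(b"\xff\xfeH\x00i\x00", "UTF-16"))  # Hi (LE BOM)
```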

Mark
----- Original Message -----
From: <[email protected]>
To: "Mark Davis" <[email protected]>
Cc: "Harald Tveit Alvestrand" <[email protected]>; "Mark Davis"
<[email protected]>; <[email protected]>; <[email protected]>
Sent: Friday, May 11, 2001 09:13
Subject: Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE


> > Thanks for your feedback. I will resubmit them.
>
> > Comments:
>
> > A. If each charset needs to be in a separate message, then you really ought
> > to fix http://www.normos.org/ietf/bcp/bcp19.txt. It says:
>
> > "5.  Charset Registration Template
>
> >      To: [email protected]
> >      Subject: Registration of new charset [names]"
>
> > with the word "names" in plural. This is misleading.
>
> The rest of the registration form clearly talks about a single charset with
> multiple names, so I'm not sure I buy your reasoning here. However, since we
> want to discourage the use of any aliases, I have no problem with changing it
> to "name" singular.
>
> > B. UTF-32 in Unicode, as with UTF-16, could be BOM-less, with the
> > orientation being determined by a higher-level protocol. The IETF
> > registration (with good reason!) can impose a further restriction, as it
> > does with UTF-16, that BOM-less UTF-16 must be BE. I will put such a clause
> > in the registration.
>
> Which means the UTC has apparently learned nothing from the UTF-16 disaster.
> If we push back on this, is there any hope of getting this botch fixed?
>
> Ned
>
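For the UTF-32 registration discussed in the quoted text, the sender side
might be sketched as follows in Python. It assumes, by analogy with the
UTF-16 labels, that UTF-32BE and UTF-32LE never carry a BOM and that plain
UTF-32 is BE unless a BOM says otherwise; the function and its with_bom flag
are hypothetical. For simplicity it always writes BE under the plain UTF-32
label, although a BOM would equally allow LE. Each code point takes exactly
four bytes, so no surrogate pairs appear.

```python
import codecs


def encode_for_label(text: str, label: str, with_bom: bool = False) -> bytes:
    """Serialize text for one of the three UTF-32 charset labels.

    Assumed convention: UTF-32BE and UTF-32LE never carry a BOM; plain UTF-32
    may start with a BOM and is big-endian when it does not.
    """
    if label == "UTF-32BE":
        return text.encode("utf-32-be")
    if label == "UTF-32LE":
        return text.encode("utf-32-le")
    if label == "UTF-32":
        body = text.encode("utf-32-be")  # this sketch always writes BE here
        return codecs.BOM_UTF32_BE + body if with_bom else body
    raise ValueError(f"unsupported label: {label}")


print(encode_for_label("\U0001F600", "UTF-32BE").hex())  # 0001f600
print(encode_for_label("\U0001F600", "UTF-32LE").hex())  # 00f60100
```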
