Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE
Thanks for your feedback. I will resubmit them.
Comments:
A. If each charset needs to be in a separate message, then you really ought
to fix http://www.normos.org/ietf/bcp/bcp19.txt. It says:
"5. Charset Registration Template
To: ietf-charsets@iana.org
Subject: Registration of new charset [names]"
with the word "names" in plural. This is misleading.
B. UTF-32 in Unicode, as with UTF-16, could be BOM-less, with the
orientation being determined by a higher-level protocol. The IETF
registration (with good reason!) can impose a further restriction, as it
does with UTF-16, that BOM-less UTF-16 must be BE. I will put such a clause
in the registration.
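
For illustration only, the restriction described in B could be sketched as
follows in Python. The function name decode_utf32 is hypothetical and is not
taken from the registration text. The sketch assumes that an initial BOM
selects the byte order and is dropped from the content, and that BOM-less
input is read as big-endian.

```python
import codecs


def decode_utf32(data: bytes) -> str:
    """Decode UTF-32 octets (illustrative sketch, not the registration text).

    A leading byte order mark selects the byte order and is not returned
    as part of the text. Input without a BOM is treated as big-endian.
    """
    if data.startswith(codecs.BOM_UTF32_BE):      # 00 00 FE FF
        return data[4:].decode("utf-32-be")
    if data.startswith(codecs.BOM_UTF32_LE):      # FF FE 00 00
        return data[4:].decode("utf-32-le")
    # No BOM: assumed rule is that the stream must be big-endian.
    return data.decode("utf-32-be")
```

With this rule every octet sequence has exactly one reading, so a receiver
never has to guess the byte order.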
Mark
----- Original Message -----
From: "Harald Tveit Alvestrand" <harald@alvestrand.no>
To: "Mark Davis" <mark.davis@us.ibm.com>
Cc: <ned.freed@mrochek.com>; <ietf-charsets@iana.org>
Sent: Thursday, May 10, 2001 23:52
Subject: Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE
> At 14:33 09.05.2001 -0700, Mark Davis wrote:
>
> >Sorry, I missed that. Do you want me to resubmit, or could you just make
> >that change?
>
> Resubmit.
>
> note: each charset should have its own registration form.
>
> BTW, TR19 is technically broken in its definition of UTF-32: it specifies
> that a UTF-32 character stream MAY OR MAY NOT begin with a Byte Order Mark,
> and that octets can be in any order.
>
> >D36c
> > (a) UTF-32 is the Unicode Transformation Format that serializes a
> > Unicode code point as a sequence of four bytes, in either big-endian or
> > little-endian format. An initial sequence corresponding to U+FEFF is
> > interpreted as a byte order mark: it is used to distinguish between the
> > two byte orders. The byte order mark is not considered part of the
> > content of the text. A serialization of Unicode code points into
> > UTF-32 may or may not begin with a byte order mark.
>
> This allows (when taking exquisite care - you only have 4.1 bits that are
> valid in both upper and lower halves of the 32-bit word) the construction
> of octet sequences that are ambiguous.
>
> If either the specification or the registration had said "A serialization
> of Unicode code points into UTF-32 that does not begin with a byte order
> mark MUST be in Big Endian", I would not have protested. But this is, IMHO,
> just too broken to be registered as a charset.
>
> As written, I OPPOSE the registration of UTF-32.
> (Apologies for having missed it at Unicode standardization time - we saw it
> coming, and did not catch it in time)
>
> Harald
>
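
As an illustration of the ambiguity described in the quoted message, here is
a small Python sketch. The helper name both_readings and the sample octets
are chosen for this example only. Because code points stop at U+10FFFF, the
upper 16 bits of each 32-bit unit can take only 17 values (about 4.1 bits),
so a BOM-less unit is valid in both byte orders only when both halves stay
within that range, as in the sample below.

```python
def both_readings(octets: bytes) -> tuple[int, int]:
    """Return the code point obtained by reading one BOM-less 4-octet unit
    as big-endian and as little-endian (illustrative helper)."""
    as_be = octets.decode("utf-32-be")
    as_le = octets.decode("utf-32-le")
    return ord(as_be), ord(as_le)


# The same four octets are a valid code point in either byte order.
be, le = both_readings(bytes([0x00, 0x00, 0x01, 0x00]))
print(f"big-endian: U+{be:04X}, little-endian: U+{le:04X}")
```

Read as big-endian the octets give U+0100, and read as little-endian they
give U+10000. Without a BOM or a fixed default, a receiver cannot tell which
was meant.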