
Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE

To: Mark Davis <mark.davis@us.ibm.com>
Subject: Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE
From: Harald Tveit Alvestrand <harald@alvestrand.no>
Date: Fri, 11 May 2001 08:52:59 +0200
Cc: ned.freed@mrochek.com, ietf-charsets@iana.org
In-reply-to: <OF0DE82293.930B5208-ON88256A47.00766372@rchland.ibm.com>

At 14:33 09.05.2001 -0700, Mark Davis wrote:

>Sorry, I missed that. Do you want me to resubmit, or could you just make
>that change?

Resubmit.

Note: each charset should have its own registration form.

BTW, TR19 is technically broken in its definition of UTF-32: it specifies 
that a UTF-32 character stream MAY OR MAY NOT begin with a Byte Order Mark, 
and that octets can be in either byte order.

>D36c
>        (a) UTF-32 is the Unicode Transformation Format that serializes a 
> Unicode code point as a sequence of four bytes, in either big-endian or 
> little-endian format. An initial sequence corresponding to U+FEFF is 
> interpreted as a byte order mark: it is used to distinguish between the 
> two byte orders. The byte order mark is not considered part of the 
> content of the text. A serialization of Unicode code points into 
> UTF-32 may or may not begin with a byte order mark.

This allows (when taking exquisite care - you only have 4.1 bits that are 
valid in both upper and lower halves of the 32-bit word) the construction 
of octet sequences that are ambiguous.
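
To make the ambiguity concrete, here is a minimal Python sketch. It is not 
part of TR19 or of the registration; the helper name try_decode and the 
sample octets are assumptions chosen for illustration. A 4-octet unit is 
valid in both byte orders only when its first and last octets are zero and 
its two middle octets are each at most 0x10, which appears to be where the 
figure of roughly 4.1 bits (log2 of 17) comes from.

```python
MAX_CODE_POINT = 0x10FFFF


def try_decode(octets, byteorder):
    """Read octets as BOM-less UTF-32 in the given byte order.

    Returns the list of code points, or None if any 4-octet unit is not
    a valid Unicode scalar value in that byte order.
    """
    if len(octets) % 4 != 0:
        return None
    values = [
        int.from_bytes(octets[i:i + 4], byteorder)
        for i in range(0, len(octets), 4)
    ]
    for value in values:
        # Reject values beyond U+10FFFF and the surrogate range.
        if value > MAX_CODE_POINT or 0xD800 <= value <= 0xDFFF:
            return None
    return values


# Hypothetical sample: no byte order mark, valid in both byte orders.
sample = bytes([0x00, 0x01, 0x02, 0x00])

as_big = try_decode(sample, "big")        # one reading of the octets
as_little = try_decode(sample, "little")  # a different, equally valid reading

print([f"U+{v:04X}" for v in as_big])
print([f"U+{v:04X}" for v in as_little])
```

The same four octets read as U+10200 in big-endian order and as U+20100 in 
little-endian order, and nothing in the stream says which was intended.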

If either the specification or the registration had said "A serialization 
of Unicode code points into UTF-32 that does not begin with a byte order 
mark MUST be in Big Endian", I would not have protested. But this is, IMHO, 
just too broken to be registered as a charset.
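
A sketch of a decoder that follows the rule suggested above, in Python. This 
is an illustration only, not text from any specification; the function name 
decode_utf32 is an assumption. It uses the standard "utf-32-be" and 
"utf-32-le" codecs, which do not interpret a byte order mark themselves.

```python
BOM_BE = b"\x00\x00\xfe\xff"  # U+FEFF serialized big-endian
BOM_LE = b"\xff\xfe\x00\x00"  # U+FEFF serialized little-endian


def decode_utf32(octets):
    """Decode UTF-32, defaulting to big-endian when there is no BOM.

    An initial byte order mark selects the byte order and is not treated
    as part of the text. Without one, the stream is read as big-endian,
    so every octet sequence has at most one interpretation.
    """
    if octets.startswith(BOM_BE):
        return octets[4:].decode("utf-32-be")
    if octets.startswith(BOM_LE):
        return octets[4:].decode("utf-32-le")
    # No BOM: assumed rule, the stream MUST be big-endian.
    return octets.decode("utf-32-be")


# Hypothetical sample octets that would be valid in either byte order.
payload = b"\x00\x01\x02\x00"

print(hex(ord(decode_utf32(payload))))           # no BOM: big-endian
print(hex(ord(decode_utf32(BOM_LE + payload))))  # BOM selects little-endian
```

With this rule the sample can only be read as little-endian when the sender 
says so explicitly with a byte order mark.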

As written, I OPPOSE the registration of UTF-32.
(Apologies for having missed it at Unicode standardization time - we saw it 
coming, and did not catch it in time)

                Harald

