RE: internationalization/ISO10646 question
Hi,
But Unicode 3.2 (Unicode Standard Annex #28, March 2002)
makes very clear in Table 3.1B "Legal UTF-8 Byte Sequences"
that there is _not_ a 6-byte UTF-8 representation of non-BMP
characters.
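For illustration, here is a minimal Python sketch of a well-formed UTF-8 checker. It assumes the byte ranges of the Unicode Standard's well-formed UTF-8 table. The ranges below come from later editions and are assumed to match Table 3.1B, and the function names are hypothetical. It accepts the 4-byte form of U+1D11E (MUSICAL SYMBOL G CLEF). It rejects the same character written as two separately UTF-8-encoded surrogates, which is one way a 6-byte sequence can arise (an assumption about what is meant here). It also rejects the old 5- and 6-byte lead bytes.

```python
CONT = (0x80, 0xBF)  # ordinary continuation-byte range


def trail_ranges(lead: int):
    """Hypothetical helper: allowed (lo, hi) ranges for the bytes that
    follow `lead`, or None if `lead` cannot start a well-formed sequence."""
    if lead <= 0x7F:
        return []
    if 0xC2 <= lead <= 0xDF:
        return [CONT]
    if lead == 0xE0:
        return [(0xA0, 0xBF), CONT]          # no overlong 3-byte forms
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return [CONT, CONT]
    if lead == 0xED:
        return [(0x80, 0x9F), CONT]          # excludes surrogates D800..DFFF
    if lead == 0xF0:
        return [(0x90, 0xBF), CONT, CONT]    # no overlong 4-byte forms
    if 0xF1 <= lead <= 0xF3:
        return [CONT, CONT, CONT]
    if lead == 0xF4:
        return [(0x80, 0x8F), CONT, CONT]    # nothing above U+10FFFF
    return None  # 80..C1 and F5..FF, including old 5/6-byte leads F8..FD


def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical checker: True if every sequence in `data` is well formed."""
    i = 0
    while i < len(data):
        trails = trail_ranges(data[i])
        if trails is None or i + 1 + len(trails) > len(data):
            return False
        for k, (lo, hi) in enumerate(trails, start=1):
            if not lo <= data[i + k] <= hi:
                return False
        i += 1 + len(trails)
    return True


four = "\U0001D11E".encode("utf-8")   # b'\xf0\x9d\x84\x9e'
six = b"\xed\xa0\xb4\xed\xb4\x9e"     # U+D834 and U+DD1E, 3 bytes each
print(len(four), is_well_formed_utf8(four))  # 4 True
print(len(six), is_well_formed_utf8(six))    # 6 False
```

The ED rule is what rules out the 6-byte surrogate form. Without it, each half would pass as an ordinary 3-byte sequence.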
Also, section VIII "Relation to ISO/IEC 10646" of Unicode 3.2
describes ISO Amendment 1 to ISO/IEC 10646-1:2000, which
limits future ISO/IEC 10646 code point assignments to the
range of UTF-16.
Therefore, for non-BMP characters UTF-8 is always the _same_ size
(4 bytes) as both UTF-16 and UTF-32.
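The size claim is easy to check with Python's standard codecs. This is a small sketch, and the helper name encoded_sizes is hypothetical. The "-le" codec variants are used so that no byte-order mark is counted. Python's chr() also stops at U+10FFFF, the top of the UTF-16 range.

```python
def encoded_sizes(cp: int) -> tuple[int, int, int]:
    """Hypothetical helper: byte lengths of one code point in UTF-8/16/32."""
    ch = chr(cp)  # raises ValueError for cp > 0x10FFFF
    return (len(ch.encode("utf-8")),
            len(ch.encode("utf-16-le")),   # -le: no byte-order mark
            len(ch.encode("utf-32-le")))


for cp in (0x41, 0xE9, 0x4E2D, 0x1D11E, 0x10FFFF):
    u8, u16, u32 = encoded_sizes(cp)
    print(f"U+{cp:04X}: UTF-8={u8} UTF-16={u16} UTF-32={u32}")

# Output:
# U+0041: UTF-8=1 UTF-16=2 UTF-32=4
# U+00E9: UTF-8=2 UTF-16=2 UTF-32=4
# U+4E2D: UTF-8=3 UTF-16=2 UTF-32=4
# U+1D11E: UTF-8=4 UTF-16=4 UTF-32=4
# U+10FFFF: UTF-8=4 UTF-16=4 UTF-32=4
```

Only BMP characters from U+0800 upward cost UTF-8 more than UTF-16 (3 bytes vs 2). For non-BMP characters all three forms are 4 bytes.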
Cheers,
- Ira McDonald
High North Inc
-----Original Message-----
From: MURATA Makoto [mailto:murata@hokkaido.email.ne.jp]
Sent: Thursday, January 02, 2003 8:11 PM
To: Chris Newman
Cc: Marcin Hanclik; ietf-charsets@iana.org
Subject: Re: internationalization/ISO10646 question
Chris,
> Is UTF-8 perfect?
We agree :-)
> No. But the costs greatly outweigh the benefits when
> compared to any other charset I've seen, and particularly when compared to
> UTF-16.
I do not agree with this claim yet. In particular, I'm concerned with the
6-byte representation of non-BMP characters. When non-BMP characters become
common, what will happen?
Cheers,
--
MURATA Makoto <murata@hokkaido.email.ne.jp>