
RE: internationalization/ISO10646 question



Hi,

But Unicode 3.2 (Unicode Standard Annex #28, March 2002) 
makes very clear in Table 3.1B "Legal UTF-8 Byte Sequences"
that there is _not_ a 6-byte UTF-8 representation of non-BMP 
characters.  
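
As a rough illustration (not taken from the Unicode text itself), the
sketch below asks Python's built-in strict "utf-8" codec whether a byte
sequence is accepted. The helper name is_legal_utf8 and the sample byte
strings are my own assumptions. Python's decoder follows the current
UTF-8 definition, which I have not checked line by line against
Table 3.1B. It tries two possible readings of a "6-byte representation":
a surrogate pair encoded as two 3-byte sequences, and the old 6-byte
lead-byte form.

```python
def is_legal_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+10000 as a single 4-byte sequence.
four_byte_form = b"\xf0\x90\x80\x80"

# U+10000 as the surrogate pair D800 DC00, each half encoded in 3 bytes.
surrogate_pair_form = b"\xed\xa0\x80\xed\xb0\x80"

# A sequence that starts with the old 6-byte lead byte 0xFC.
old_six_byte_form = bytes([0xFC, 0x84, 0x80, 0x80, 0x80, 0x80])

for name, data in [
    ("4-byte form", four_byte_form),
    ("surrogate-pair form", surrogate_pair_form),
    ("old 6-byte form", old_six_byte_form),
]:
    print(name, is_legal_utf8(data))
```

Only the 4-byte form is accepted. Both 6-byte candidates are rejected.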

Also, section VIII "Relation to ISO/IEC 10646" of Unicode 3.2
describes ISO Amendment 1 to ISO/IEC 10646-1:2000, which
limits future ISO/IEC 10646 code point assignments to the 
range of UTF-16.

Therefore, a non-BMP character always takes the _same_ size 
(4 bytes) in UTF-8 as it does in both UTF-16 and UTF-32.
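
A quick way to check the sizes, sketched with Python's standard codecs.
The function name encoded_sizes and the sample code points are my own
choices. The "-be" codec variants are used so that no byte order mark is
counted.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the encoded length in bytes of one character per encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),  # big-endian, no BOM
        "utf-32": len(ch.encode("utf-32-be")),  # big-endian, no BOM
    }


# First non-BMP code point, a sample in between, and the last code
# point reachable by UTF-16.
for cp in (0x10000, 0x1D11E, 0x10FFFF):
    print(f"U+{cp:04X}", encoded_sizes(chr(cp)))
```

Each of these code points comes out as 4 bytes in all three encodings.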

Cheers,
- Ira McDonald
  High North Inc


-----Original Message-----
From: MURATA Makoto [mailto:murata@hokkaido.email.ne.jp]
Sent: Thursday, January 02, 2003 8:11 PM
To: Chris Newman
Cc: Marcin Hanclik; ietf-charsets@iana.org
Subject: Re: internationalization/ISO10646 question


Chris,

> Is UTF-8 perfect?

We agree :-)

> No.  But the costs greatly outweigh the benefits when compared to any
> other charset I've seen, and particularly when compared to UTF-16.

I do not agree on this claim yet.  In particular, I'm concerned with the
6-byte representation of non-BMP characters.  When non-BMP characters
become common, what will happen?

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>