
Re: internationalization/ISO10646 question

To: MURATA Makoto <murata@hokkaido.email.ne.jp>
Subject: Re: internationalization/ISO10646 question
From: Chris Newman <Chris.Newman@Sun.COM>
Date: Fri, 03 Jan 2003 17:56:59 -0800
Cc: Marcin Hanclik <mhanclik@poczta.onet.pl>, ietf-charsets@iana.org
In-reply-to: <20030103103915.3B3B.MURATA@hokkaido.email.ne.jp>
Original-recipient: rfc822;ned+ietf-charsets@mrochek.com
References: <20021225113735.8C21.MURATA@hokkaido.email.ne.jp><2147483647.1041520511@nifty-jr.west.sun.com><20030103103915.3B3B.MURATA@hokkaido.email.ne.jp>

begin quotation by MURATA Makoto on 2003/1/3 11:11 +0900:
> I do not agree on this claim yet.  In particular, I'm concerned with the
> 6-byte  representation of non-BMP characters.  When non-BMP characters
> become common,  what will happen?

Software which is fully UTF-8 native will likely work just fine.  UTF-8
aware software already has support for variable width characters.  Whether
a character takes 2, 3, 4, 5 or 6 octets, the code path used should be the
same and will already have been tested.
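
To illustrate the "same code path" point, here is a minimal sketch of a
decoder for one UTF-8 character.  The function name decode_utf8_char is
made up for this example, and it assumes the original UTF-8 definition
that allows sequences of up to 6 octets.  It deliberately skips the
overlong-form and surrogate checks a real decoder needs.

```python
def decode_utf8_char(data: bytes, pos: int = 0):
    """Decode one character at pos; return (code_point, next_pos).

    Illustrative sketch only: no overlong-form or surrogate checks.
    """
    lead = data[pos]
    if lead < 0x80:
        # Single-octet (ASCII) case.
        return lead, pos + 1

    # The number of leading 1 bits in the lead octet is the sequence length.
    length = 0
    mask = 0x80
    while lead & mask:
        length += 1
        mask >>= 1
    if length < 2 or length > 6:
        raise ValueError("invalid lead octet 0x%02X" % lead)
    if pos + length > len(data):
        raise ValueError("truncated sequence")

    # The bits below the length marker are payload.
    code_point = lead & (mask - 1)

    # The same loop serves every multi-octet length.
    for octet in data[pos + 1:pos + length]:
        if octet & 0xC0 != 0x80:
            raise ValueError("invalid continuation octet 0x%02X" % octet)
        code_point = (code_point << 6) | (octet & 0x3F)
    return code_point, pos + length
```

A 4-octet non-BMP character goes through exactly the loop that a 2-octet
Latin letter does, so testing with ordinary non-ASCII text already
exercises it.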

Software which converts UTF-8 to UCS-2 will break completely, since UCS-2
has no way to represent a non-BMP character.  There may be more of this
junk out there than one might hope.
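
As a rough sketch of one way such a conversion can go wrong, the
hypothetical converter below stores every code point in a single 16-bit
unit.  The name naive_ucs2 and the choice to truncate are assumptions
for illustration.  Real converters may reject the input or substitute a
replacement character instead.

```python
def naive_ucs2(text: str) -> list:
    """Hypothetical converter: one 16-bit unit per character.

    Code points above U+FFFF do not fit, so the high bits are
    silently dropped here (an assumed failure mode).
    """
    return [ord(ch) & 0xFFFF for ch in text]


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
units = naive_ucs2("\U0001D11E")
print([hex(u) for u in units])  # ['0xd11e'], a different, unrelated character
```

BMP text passes through unchanged, which is why this kind of bug can go
unnoticed until non-BMP characters show up.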

Software which converts UTF-8 to UTF-16 may not work because a lot of 
UTF-16 software has never been tested with variable-width characters.

That's actually the most serious flaw in UTF-16.  It's a variable width
encoding, but the variable width characters are an uncommon case
(currently).  That means all the code to support non-16-bit characters in
UTF-16 is rarely exercised and those code paths haven't been tested (if
they exist).  Thus you can expect deployed UTF-16 based software to break
in various ways as non-BMP characters show up.
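
A small sketch of the untested path in question: the helpers below
(utf16_units and count_code_points are names made up for this example)
show that a non-BMP character takes two 16-bit units, and that code
assuming one unit per character can cut a surrogate pair in half.

```python
import struct


def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def count_code_points(units: list) -> int:
    """Count characters, treating a valid surrogate pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # This branch is the rarely exercised variable-width case.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


units = utf16_units("a\U0001D11E")
print(len(units))                # 3 code units
print(count_code_points(units))  # 2 characters

# Truncating by unit count, as fixed-width code would, leaves a lone
# high surrogate at the end.
truncated = units[:2]
print(hex(truncated[-1]))        # 0xd834
```

Code that only ever sees BMP text never takes the two-unit branch, so a
bug there stays hidden.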

Unfortunately, I'm afraid the majority of software will fall in the latter 
two categories.

                - Chris

Follow-Ups:
- Re: internationalization/ISO10646 question
  - From: Markus Scherer <markus.scherer@jtcsv.com>

References:
- Re: internationalization/ISO10646 question
  - From: MURATA Makoto <murata@hokkaido.email.ne.jp>
- Re: internationalization/ISO10646 question
  - From: Chris Newman <Chris.Newman@Sun.COM>
- Re: internationalization/ISO10646 question
  - From: MURATA Makoto <murata@hokkaido.email.ne.jp>
