
Re: internationalization/ISO10646 question

To: Chris Newman <Chris.Newman@sun.com>
Subject: Re: internationalization/ISO10646 question
From: MURATA Makoto <murata@hokkaido.email.ne.jp>
Date: Wed, 25 Dec 2002 11:51:06 +0900
Cc: Marcin Hanclik <mhanclik@poczta.onet.pl>, ietf-charsets@iana.org
In-reply-to: <2147483647.1039180421@nifty-jr.west.sun.com>
Original-recipient: rfc822;ned+ietf-charsets@mrochek.com
References: <OLENIGGFKBOAIMPONAAJKEEPCDAA.mhanclik@poczta.onet.pl><2147483647.1039180421@nifty-jr.west.sun.com>

On Fri, 06 Dec 2002 13:13:41 -0800
Chris Newman <Chris.Newman@sun.com> wrote:

> 
> UTF-16 is a terrible encoding for interoperability.  There are 3 published 
> non-interoperable variants of UTF-16 (big-endian, little-endian, 
> BOM/switch-endian) and only one of the variants can be auto-detected with 
> any chance of success (and none of them can be auto-detected as well as 
> UTF-8). 

Unfortunately, as far as I know, UTF-8 is not free of such problems:
(1) it may appear with or without the Unicode signature (BOM), (2) it can be 
confused with other ASCII-compatible encodings (especially when a program has 
only a few non-ASCII characters), (3) redundant (non-shortest) octet sequences 
create a vulnerability, and (4) non-BMP characters may be encoded with 4 or 6 
octets (e.g., writeUTF of java.io.DataOutput and readUTF of java.io.DataInput 
use 6).  I know that Corrigendum #1: UTF-8 Shortest Form addresses (3), but I 
am not sure whether implementations are free of this vulnerability.
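
To make points (1), (3), and (4) concrete, here is a minimal sketch in Python. 
It is only an illustration under stated assumptions, not a survey of real 
implementations: the helper names strict_decoder_rejects and 
surrogate_pair_octets are hypothetical, and the 6-octet form is produced by 
encoding each UTF-16 surrogate separately, which approximates what writeUTF 
emits for a non-BMP character (writeUTF also adds a two-octet length prefix, 
which is omitted here).

```python
import codecs

# (1) Signature: the same text with and without the UTF-8 signature (EF BB BF).
text = "abc"
with_sig = codecs.BOM_UTF8 + text.encode("utf-8")
plain_view = with_sig.decode("utf-8")        # plain decoder keeps U+FEFF
stripped_view = with_sig.decode("utf-8-sig") # signature-aware decoder drops it


# (3) Redundant form: C0 AF is a non-shortest encoding of "/" (U+002F).
def strict_decoder_rejects(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder refuses the octets."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


overlong_rejected = strict_decoder_rejects(b"\xc0\xaf")


# (4) Non-BMP character: 4 octets in UTF-8 proper, 6 octets when each
# UTF-16 surrogate is encoded on its own (assumed writeUTF-style output).
def surrogate_pair_octets(ch: str) -> bytes:
    """Encode one non-BMP character as two 3-octet surrogate sequences."""
    cp = ord(ch) - 0x10000
    high = chr(0xD800 + (cp >> 10))
    low = chr(0xDC00 + (cp & 0x3FF))
    return (high + low).encode("utf-8", "surrogatepass")


ch = "\U00010400"
four = ch.encode("utf-8")
six = surrogate_pair_octets(ch)
six_rejected = strict_decoder_rejects(six)
```

In this sketch the two decoders disagree about whether the text starts with 
U+FEFF, the strict decoder refuses the redundant sequence, and the 6-octet 
form is also refused by the strict decoder, so data written in that form does 
not round-trip through a conforming UTF-8 reader.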

I would be very happy if some encoding of Unicode were free of interoperability 
and security problems.  But I am not happy yet.

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>
