Re: internationalization/ISO10646 question
On Fri, 06 Dec 2002 13:13:41 -0800
Chris Newman <Chris.Newman@sun.com> wrote:
>
> UTF-16 is a terrible encoding for interoperability. There are 3 published
> non-interoperable variants of UTF-16 (big-endian, little-endian,
> BOM/switch-endian) and only one of the variants can be auto-detected with
> any chance of success (and none of them can be auto-detected as well as
> UTF-8).
Unfortunately, as far as I know, UTF-8 is not free of such problems either:
(1) data may appear with or without the Unicode signature (BOM), (2) it can be
confused with other ASCII-compatible encodings (especially when a program has only
a few non-ASCII characters), (3) redundant (non-shortest) octet sequences create a
vulnerability, and (4) non-BMP characters are encoded in either 4 or 6 octets
(e.g., the 6-octet form used by writeUTF of java.io.DataOutput and readUTF of
java.io.DataInput). I know that Corrigendum #1: UTF-8 Shortest Form addresses (3),
but I am not sure that implementations are free of this vulnerability.
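
To make (1), (3), and (4) concrete, here is a minimal sketch in Python 3. It is
only an illustration and is not taken from any implementation discussed in this
thread; the helper name is_rejected and the sample strings are my own
assumptions, and the 6-octet form is built by hand by encoding each UTF-16
surrogate separately.

```python
# (1) Signature: the same text with and without the UTF-8 BOM (EF BB BF).
text = "abc"
plain = text.encode("utf-8")        # no signature
signed = text.encode("utf-8-sig")   # signature prepended


def is_rejected(data: bytes) -> bool:
    """Hypothetical helper: True if a strict UTF-8 decoder refuses the octets."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# (3) Non-shortest form: C0 AF is a redundant 2-octet encoding of "/" (U+002F).
overlong_slash = b"\xc0\xaf"

# (4) U+10000 in standard UTF-8 takes 4 octets.
ch = "\U00010000"
four = ch.encode("utf-8")

# The 6-octet form: split into UTF-16 surrogates, then encode each
# surrogate as its own 3-octet sequence ("surrogatepass" permits this).
utf16 = ch.encode("utf-16-be")
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")
six = (chr(high) + chr(low)).encode("utf-8", "surrogatepass")

if __name__ == "__main__":
    print(plain.hex(), signed.hex())
    print(is_rejected(overlong_slash))
    print(four.hex(), six.hex(), is_rejected(six))
```

A decoder that does not expect the signature keeps it as U+FEFF at the start of
the text, and a strict decoder refuses both the redundant sequence and the
6-octet form, so two programs that each claim to handle "UTF-8" can still
disagree on the same octets.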
I would be very happy if some encoding of Unicode were free of interoperability
and security problems. But none is yet, so I am not happy yet.
--
MURATA Makoto <murata@hokkaido.email.ne.jp>