
RE: Fwd: Last Call: UTF-16, an encoding of ISO 10646 to Proposed

To: Kenneth Whistler <[email protected]>
Subject: RE: Fwd: Last Call: UTF-16, an encoding of ISO 10646 to Proposed
From: Harald Tveit Alvestrand <[email protected]>
Date: Thu, 16 Dec 1999 22:32:25 +0100
Cc: [email protected], [email protected], [email protected]
In-reply-to: <[email protected]>

At 10:25 16.12.99 -0800, Kenneth Whistler wrote:


> > - Inability to represent characters outside Planes 0-16
>
>WG2 and UTC are converging on a point of view that characters
>outside of Planes 0-16 should *never* be assigned. This may be
>formally written into 10646. The rationale here is that nearly
>all 10646 implementations are following the Unicode Standard, by
>necessity, to achieve interoperability in areas that are left
>unspecified by 10646. Formalizing this convergence by constraining
>the code space range that could ever be assigned standard characters
>would close down this nagging issue of incompatibility between
>the Unicode Standard and 10646. In that case, UTF-8, UTF-16, and
>UTF-32 would *all* have the exact same representational capability,
>and would all be completely interconvertible forms.

See http://www.unicode.org/pending/pending.html
It's entirely possible that all commonly used scripts will be encoded in 
Plane 0 (if those who fight for traditional Chinese and more precomposed 
characters give up), but I don't think it's likely that ISO will abandon 
Plane 1.
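
For reference, the Planes 0-16 ceiling discussed above falls out of the UTF-16 surrogate arithmetic: 1024 high surrogates times 1024 low surrogates address exactly 16 planes beyond Plane 0. The following is only a minimal sketch of that arithmetic; the function and constant names are made up for illustration and are not taken from the draft.

```python
# Hypothetical helper names; the surrogate ranges themselves are the
# standard ones (high: D800..DBFF, low: DC00..DFFF).
HIGH_START, LOW_START = 0xD800, 0xDC00


def to_surrogates(cp):
    """Split a code point in 10000..10FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not representable as a surrogate pair")
    offset = cp - 0x10000  # 20-bit offset into Planes 1-16
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogates(high, low):
    """Recombine a surrogate pair into the scalar value it encodes."""
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The largest pair gives the highest value UTF-16 can reach.
largest = from_surrogates(0xDBFF, 0xDFFF)
largest_plane = largest >> 16  # plane number of that value
```

Here `largest` is the top of the range UTF-16 can address, and `largest_plane` is the plane it sits in; anything assigned above that plane would be out of reach of surrogate pairs.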


> > - VERY bad expansion factor for characters outside Plane 0 (100% overhead)
>
>This claim I do not understand at all:
>
>scalar value    UTF-8   UTF-16  UTF-32
>0..7F           1       2       4
>80..7FF         2       2       4
>800..FFFD       3       2       4
>10000..10FFFD   4       4       4
>
>The only size advantage for UTF-8 is for ASCII values, and UTF-16
>has the clear size advantage for East Asian data.

Yes. My mistake; I didn't count properly.
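
The byte counts in the quoted table can be checked mechanically. A small sketch, assuming Python's built-in codecs; the big-endian variants are used so that no byte order mark is added to the count, and `encoded_sizes` is just an illustrative name.

```python
def encoded_sizes(cp):
    """Return (UTF-8, UTF-16, UTF-32) byte counts for one scalar value."""
    ch = chr(cp)
    # The "-be" codecs emit no BOM, so the lengths are per-character sizes.
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be"))


# Upper bound of each row in the quoted table.
for cp in (0x7F, 0x7FF, 0xFFFD, 0x10FFFD):
    print(f"{cp:>6X}", *encoded_sizes(cp))
```

Outside Plane 0 all three forms take 4 bytes, so UTF-16 carries no extra overhead there relative to UTF-8.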

                      Harald

--
Harald Tveit Alvestrand, EDB Maxware, Norway
[email protected]

