[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: Fwd: Last Call: UTF-16, an encoding of ISO 10646 to Proposed
At 10:25 16.12.99 -0800, Kenneth Whistler wrote:
> > - Inability to represent characters outside Planes 0-16
>
>WG2 and UTC are converging on a point of view that characters
>outside of Planes 0-16 should *never* be assigned. This may be
>formally written into 10646. The rationale here is that nearly
>all 10646 implementations are following the Unicode Standard, by
>necessity, to achieve interoperability in areas that are left
>unspecified by 10646. Formalizing this convergence by constraining
>the code space range that could ever be assigned standard characters
>would close down this nagging issue of incompatibility between
>the Unicode Standard and 10646. In that case, UTF-8, UTF-16, and
>UTF-32 would *all* have the exact same representational capability,
>and would all be completely interconvertible forms.
See http://www.unicode.org/pending/pending.html
It's entirely possible that all commonly used scripts will be encoded in
Plane 0 (if those who fight for traditional Chinese and more precomposed
characters give up), but I don't think it's likely that ISO will abandon
Plane 1.
> > - VERY bad expansion factor for characters outside Plane 0 (100% overhead)
>
>This claim I do not understand at all:
>
>scalar value UTF-8 UTF-16 UTF-32
>0..7F 1 2 4
>80..7FF 2 2 4
>800..FFFD 3 2 4
>10000..10FFFD 4 4 4
>
>The only size advantage for UTF-8 is for ASCII values, and UTF-16
>has the clear size advantage for East Asian data.
Yes. My mistake; I didn't count properly.
Harald
--
Harald Tveit Alvestrand, EDB Maxware, Norway
Harald.Alvestrand@edb.maxware.no