
Re: RFC 2279 (UTF-8) to Full Standard

To: [email protected]
Subject: Re: RFC 2279 (UTF-8) to Full Standard
From: Kenneth Whistler <[email protected]>
Date: Mon, 29 Apr 2002 12:43:13 -0700 (PDT)
Cc: [email protected], [email protected]

Dan,

> >And the repeated concerns about the "eventual allocation" of characters
> >in the 32-bit codespace that UTF-16 could not handle have reached
> >the status of urban legends -- endlessly repeated among those in the
> >Linux community who use repetition to define accuracy, without bothering
> >to check with the source.
> 
> I am sure UTF-16 could be expanded with another surrogate space to
> handle all of the original UCS (all 31 bits).

But why? Where is the necessity?

> In general I think it is wrong
> to restrict the available 31 bits of UCS to the UTF-16 space just
> because Unicode made the wrong choice from the beginning by using
> only 16 bits. UTF-8 can encode much more than the UTF-16 code space.

This has the lingering quality of a religious or aesthetic argument.
Why is more better when there is no need for more?

If there are no alligators in the sewers, why spend money on
designing alligator traps and installing them in all the manholes?

> Though UTF-16 programs will not be able to handle all of them.
> It is no different from me using an 8-bit code space and having to encode
> or discard all characters outside code values 0-255.

I presume you meant to write 0-127. ;-)

If there were only 35 characters, and nobody could find any more,
and you were using an 8-bit code space, and the architecture of
the encoding forms limited that to the code values 0-127,
would you feel unnecessarily constrained? Does the "wastefulness"
of throwing away unused bits bother you that much?

Why do you *need* all 31 bits of UCS? For that matter, what about
that wasteful reservation of the 32nd bit in UCS? That eliminates
2 billion+ code values. Why not rail against that restriction, too?
That is actually more reminiscent of the 7-bit/8-bit issue, which
was also a signed/unsigned byte issue.
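
For anyone who wants the numbers behind this exchange, here is a rough
sketch of the arithmetic in Python. The variable and function names are
made up for illustration; the octet ranges are the ones tabulated in
RFC 2279, and the UTF-16 ceiling is the usual U+10FFFF reached with one
pair of surrogates.

```python
# Sizes of the code spaces under discussion (plain arithmetic, no libraries).
# All names here are illustrative, not taken from any standard or library.

UTF16_MAX = 0x10FFFF                      # highest code point reachable with surrogate pairs
utf16_code_points = UTF16_MAX + 1         # 17 planes of 65,536 code points each

ucs_31bit_code_points = 2 ** 31           # the original 31-bit UCS code space
reserved_by_32nd_bit = 2 ** 32 - 2 ** 31  # values excluded by reserving the top bit

# Exclusive upper bounds for 1..6 octet sequences, per the RFC 2279 table.
_RFC2279_LIMITS = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)


def utf8_len_rfc2279(code_point: int) -> int:
    """Return the number of octets RFC 2279 UTF-8 needs for a code value."""
    if code_point < 0:
        raise ValueError("code values are non-negative")
    for length, limit in enumerate(_RFC2279_LIMITS, start=1):
        if code_point < limit:
            return length
    raise ValueError("value does not fit in 31 bits")


if __name__ == "__main__":
    print("UTF-16 reachable code points:", utf16_code_points)
    print("31-bit UCS code values:      ", ucs_31bit_code_points)
    print("Excluded by the 32nd bit:    ", reserved_by_32nd_bit)
    print("Octets for U+10FFFF:         ", utf8_len_rfc2279(0x10FFFF))
    print("Octets for 0x7FFFFFFF:       ", utf8_len_rfc2279(0x7FFFFFFF))
```

Everything UTF-16 can reach fits in four octets of UTF-8; the five- and
six-octet forms exist only for values above U+10FFFF.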

--Ken

> 
>    Dan
> 
>
