Charset policy



I have two comments:

1. POSIX. There is an inherent contradiction between Unicode and POSIX.
POSIX as it is today allows one to set one's own preferences for several
attributes of the characters, while in Unicode they are fixed.

In the context of a universal character set, I believe the Unicode approach
is the only one feasible.

Other parts of the locale, where cultural preferences such as date format
are specified, should, of course, still remain.

A possible solution is for POSIX to specify that, for a UCS, certain
character attributes are fixed and are no longer locale-dependent.
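
As a rough illustration of what fixed attributes look like in practice, the
sketch below uses Python's standard unicodedata module, which reads character
properties from the Unicode Character Database and does not consult the
process locale. The helper name describe is hypothetical, and the three sample
characters are chosen only for illustration.

```python
import unicodedata


def describe(ch):
    """Return (name, general category, bidi class) for one character.

    All three values come from the Unicode Character Database that ships
    with Python; none of them depends on the current locale.
    """
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )


# A Latin capital letter, a Hebrew letter, and a decimal digit.
for ch in ("A", "\u05d0", "5"):
    print(describe(ch))
```

Under the current POSIX model the equivalent questions (is this a letter, is
it upper case) are answered by the locale's LC_CTYPE category, which is the
part that would have to become fixed for a UCS.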

2. UTF. For some reason I was under the impression that UTF was a temporary
expedient, until the communication protocols became comfortable with 16- or
32-bit characters. The charset policy has it the other way round.

I believe we should be moving faster to just using 16- or 32-bit characters,
now that 7-bit communications are no longer dominant.

To those worried about bandwidth, I would point out that most modems today
include compression, so your 8-bit characters are in effect compressed down
to 4, 3 and sometimes even 2 bits. 16- or 32-bit characters will not be
noticeably worse off. Maybe the modem compression schemes could later on be
adapted to 16- or 32-bit characters and become even better. In any case, the
various UTF schemes are not very impressive as far as compression goes, even
for US ASCII.
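
Anyone who wants to check the bandwidth argument for their own text can try a
sketch like the following. It is only an approximation: it uses DEFLATE from
Python's zlib module as a stand-in for modem compression, which uses a
different algorithm, and the sample sentence and the function name sizes are
assumptions made for illustration. A short repeated sentence compresses far
better than real traffic, so the numbers say little about typical text.

```python
import zlib


def sizes(text):
    """Return {encoding: (raw_bytes, compressed_bytes)} for one text.

    DEFLATE is used here as a stand-in for modem compression; it is not
    the algorithm modems actually use.
    """
    result = {}
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        raw = text.encode(enc)
        result[enc] = (len(raw), len(zlib.compress(raw, 9)))
    return result


# Hypothetical sample: plain US ASCII text, repeated to give the
# compressor something to work with.
sample = "The quick brown fox jumps over the lazy dog. " * 200

for enc, (raw, packed) in sizes(sample).items():
    print(f"{enc:10s} raw={raw:6d} compressed={packed:6d}")
```

The raw sizes differ by exactly a factor of 2 and 4 for ASCII text; the
interesting column is the compressed one, and it should be measured on text
representative of what is actually sent.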



--

Jonathan Rosenne
JR Consulting
P O Box 33641, Tel Aviv, Israel
Phone: +972 50 246 522 Fax: +972 9 956 7353
http://ourworld.compuserve.com/homepages/Jonathan_Rosenne/