Re: Registration of new charset BOCU-1 refreshed - UTF-8
--On Monday, August 26, 2002 15:31:49 -0700 Markus Scherer
<markus.scherer@jtcsv.com> wrote:
>> Having a proliferation of Unicode
>> encodings is about as problematic as having a proliferation of
>> legacy encodings.
>
>
> Respectfully, I would like to disagree on this point.
> The use of non-Unicode charsets opens a whole different, huge Pandora's
> box of problems, which are well described in Unicode TR 22 and the XML
> Japanese Profile.
>
> All Unicode charsets are easily decoded in relatively small and fast code
> (even SCSU and BOCU-1), without any confusion about what Unicode code
> point any byte sequence maps to. Mapping tables for non-Unicode charsets
> can be large - e.g., ICU's standard set uses about 5MB of data, while
> there is 0 for Unicode charsets.

Remember that we have zero (none, nada, nil, zilch) generally supported
ways of figuring out what charsets the recipient of an email supports.
Thus, the first email client that is capable of supporting BOCU-1 will be
capable of sending mail that no other email client in the world can display
legibly, and *has no way of knowing when they become capable of doing so*.
And this is only one of many places where one uses charsets in protocols.
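
To make the sender's position concrete, here is a minimal sketch in Python of the only safe selection rule available: use a charset only when the recipient is known to support it, and otherwise fall back to a widely supported one. The names `choose_charset`, `known_supported` and `FALLBACK_CHARSET` are hypothetical, and UTF-8 as the fallback is an assumption of this sketch, not part of any mail standard's API.

```python
from typing import Optional, Set

# Assumption for this sketch: UTF-8 is the widely supported fallback.
FALLBACK_CHARSET = "utf-8"


def choose_charset(preferred: str, known_supported: Optional[Set[str]]) -> str:
    """Return `preferred` only if the recipient is known to support it.

    `known_supported` is None when nothing was negotiated or preconfigured,
    which is the normal situation for email.
    """
    if known_supported is None:
        # Nothing is known about the recipient, so play it safe.
        return FALLBACK_CHARSET
    supported = {name.lower() for name in known_supported}
    wanted = preferred.lower()
    return wanted if wanted in supported else FALLBACK_CHARSET
```

With no negotiation, `choose_charset("BOCU-1", None)` falls back to UTF-8; the preferred charset is returned only when it appears in the set of charsets the recipient is known to support.
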

I think you should add Martin's warning to your registration - possibly
reformulated as follows (line 2 added):

   BOCU-1 is intended for limited use in special situations
   where the use of this charset can be preconfigured or negotiated.
   The preferred and most widely supported encoding for
   Unicode/ISO 10646 on the Internet is UTF-8.

OK?

Harald