
Re: Registration of new charset BOCU-1



The significant advantages of BOCU-1 over SCSU are:

- MIME compatibility (unless you don't think that is important ;-)
- binary order preservation: this is valuable wherever the binary order must be maintained, and is not true of SCSU.

What binary order preservation means is that:

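If you take any two UTF-8 strings X and Y and compress them with BOCU-1 to X' and Y', then
X < Y in binary (byte-wise) order if and only if X' < Y' in binary order.
For UTF-8, binary order is the same as code point order.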
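
To make the property concrete, here is a minimal Python sketch of a checker for it. Python's standard library has no BOCU-1 codec, so the sketch does not run BOCU-1 itself: the `encode` parameter, the helper name `preserves_binary_order`, and the sample strings are all illustrative assumptions. UTF-8 (which keeps code point order) and UTF-16BE (which does not, because of surrogates) stand in as a positive and a negative case.

```python
from itertools import combinations
from typing import Callable, Iterable


def preserves_binary_order(encode: Callable[[str], bytes],
                           samples: Iterable[str]) -> bool:
    """Check X < Y  <=>  encode(X) < encode(Y) for every pair of samples.

    `encode` is a hypothetical stand-in for any encoder; a BOCU-1
    encoder could be passed here if one is available.
    """
    for x, y in combinations(list(samples), 2):
        # Python compares str by code point and bytes by unsigned byte value.
        if (x < y) != (encode(x) < encode(y)):
            return False
    return True


# Illustrative samples: ASCII, a prefix pair, Latin-1, a high BMP
# character, a supplementary character, and the empty string.
samples = ["a", "b", "ab", "\u00e9", "\uff5e", "\U00010000", ""]

utf8_ok = preserves_binary_order(lambda s: s.encode("utf-8"), samples)
utf16_ok = preserves_binary_order(lambda s: s.encode("utf-16-be"), samples)
```

With these samples, U+FF5E sorts before U+10000 by code point, but its UTF-16BE bytes (FF 5E) sort after the surrogate pair (D8 00 DC 00), so the UTF-16BE check fails while the UTF-8 check passes. Passing on a finite sample does not prove the property for an encoding; it can only find counterexamples.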

Mark
___
mark.davis@us.ibm.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

Harald Tveit Alvestrand <harald@alvestrand.no>
2002.07.24 19:47

To: Markus Scherer <markus.scherer@jtcsv.com>, charsets <ietf-charsets@iana.org>
cc:
Subject: Re: Registration of new charset BOCU-1




--On 23 July 2002 15:17 -0700 Markus Scherer <markus.scherer@jtcsv.com>
wrote:

> BOCU-1 was not created and proposed for registration to then be
> discouraged, but to encourage users to use a Unicode encoding when they
> would otherwise choose a legacy encoding just for its compactness (aside
> from database applications).
>
> As I said before, the SCSU registration has a similar list of features,
> and no one thought it unwise then. It is much easier for someone to
> figure out if a charset is appropriate for some use if one need not
> follow a URL.
>
> I made this argument two weeks ago and there was no response at all, so I
> assumed that this was all acceptable.
>
> I would like to ask again, What do others think?
> What does the approver think?

1) The approver (that's me) agrees with RFC 2278:

3.5. Usage and Implementation Requirements

Use of a large number of charsets in a given protocol may hamper
interoperability. However, the use of a large number of undocumented
and/or unlabelled charsets hampers interoperability even more.

A charset should therefore be registered ONLY if it adds significant
functionality that is valuable to a large community, OR if it
documents existing practice in a large community. Note that charsets
registered for the second reason should be explicitly marked as being
of limited or specialized use and should only be used in Internet
messages with prior bilateral agreement.

The approver has a hard time seeing that the added value over SCSU and
UTF-8 is enough to be "significant". Are there applications (not toolkits)
today that have committed to converting to use of BOCU-1? Is the Unicode
Consortium considering adding BOCU-1 to its specifications?

2) The approver agrees with the submitter that putting the usability
information into the registration is probably a Good Thing.

Harald


