[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
I approve the Registration of new charset BOCU-1
- To: iana@iana.org
- Subject: I approve the Registration of new charset BOCU-1
- From: Harald Tveit Alvestrand <harald@alvestrand.no>
- Date: Thu, 19 Sep 2002 19:15:46 -0700
- Cc: ietf-charsets@iana.org
- Original-recipient: rfc822;ned+ietf-charsets@mrochek.com
- Spam-test: False ; 2.8 / 5.2
IANA,
please insert the enclosed registration into the charset registry.
Harald Alvestrand
---------- Forwarded Message ----------
Date: 16. september 2002 09:31 -0700
From: Markus Scherer <markus.scherer@jtcsv.com>
To: charsets <ietf-charsets@iana.org>
Subject: Registration of new charset BOCU-1 refreshed 2
Harald Tveit Alvestrand wrote:
> thanks - please send the revised BOCU-1 registration with the warning in
> it, so that I can pass it to IANA and say "I approve".
>
> I think this has been discussed enough now.
Here it is, the previous refresh from 2002-aug-22 with
your (Harald's) reformulation of Martin's warning about UTF-8 being the
preferred charset.
Thank you very much,
markus
---- 8< ----
Charset name: BOCU-1
Charset aliases: (none, except for the implicit csBOCU-1)
Suitability for use in MIME text: Yes
Published specifications:
Specification of BOCU-1 with sample code for conversion to/from
Unicode: http://www.unicode.org/notes/tn6/
Description of the general "BOCU" algorithm,
with a link to the BOCU-1 specification:
http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_
unicode.html
A converter implementation that is conformant to this specification is
available in ICU (http://oss.software.ibm.com/icu/), an open-source
library. The BOCU-1 converter C source code is in
icu/source/common/ucnvbocu.c:
http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/common/ucnvbocu.c
CCS & CES: The BOCU-1 charset is a combination of the
Unicode/ISO 10646 Coded Character Set (CCS) with
the Character Encoding Scheme (CES) specified in
the above document. It covers exactly the
UTF-16-reachable subset of ISO 10646.
ISO 10646 equivalency table:
Algorithmic, see published specification and sample code.
Additional information:
BOCU-1 is an encoding (CES/TES) of Unicode/ISO 10646
for the storage and exchange of text data.
It is stateful and provides a good byte/code point ratio while
being directly usable in SMTP emails, database fields and other
contexts.
BOCU-1 combines the wide applicability of UTF-8 with the compactness
of SCSU. It is useful for short strings and maintains code point order.
BOCU-1 does not encode most ASCII characters with US-ASCII byte values.
BOCU-1 is intended for limited use in special situations
where the use of this charset can be preconfigured or negotiated.
The preferred and most widely supported encoding for
Unicode/ISO 10646 on the Internet is UTF-8.
There is a Unicode signature byte sequence defined
(FB EE 28, see specification).
BOCU-1 is suitable for
- databases: maintains Unicode code point order
- emails: directly suitable for MIME text
- CVS and similar: deterministic and resets at CR and LF
BOCU-1 is not suitable for
- efficient internal processing (convert to UTF-8/16/32)
- contexts where encoding declarations _in_ documents _must_ be
ASCII-readable
Person & email address to contact for further information:
Markus W. Scherer
IBM Globalization Center of Competency
5600 Cottle Road
Mail Stop: 50-2/B11
San Jose, CA 95193
USA
markus.scherer@jtcsv.com
markus.scherer@us.ibm.com
Intended usage: LIMITED USE
----
Suggested MIBenum value: 1020
(first available in Unicode/ISO 10646 range; like SCSU [which is 1011])
Thank you for your consideration,
markus
---------- End Forwarded Message ----------