[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Registration of new charset BOCU-1 refreshed 2
looks okay to me. Regards, Martin.
At 09:31 02/09/16 -0700, Markus Scherer wrote:
>Harald Tveit Alvestrand wrote:
>
> > thanks - please send the revised BOCU-1 registration with the warning in
> > it, so that I can pass it to IANA and say "I approve".
> >
> > I think this has been discussed enough now.
>
>Here it is, the previous refresh from 2002-aug-22 with
>your (Harald's) reformulation of Martin's warning about UTF-8 being the
>preferred charset.
>
>Thank you very much,
>markus
>
>---- 8< ----
>Charset name: BOCU-1
>
>Charset aliases: (none, except for the implicit csBOCU-1)
>
>Suitability for use in MIME text: Yes
>
>Published specifications:
> Specification of BOCU-1 with sample code for conversion to/from Unicode:
> http://www.unicode.org/notes/tn6/
>
> Description of the general "BOCU" algorithm,
> with a link to the BOCU-1 specification:
>
>http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_
>unicode.html
>
> A converter implementation that is conformant to this specification is
> available in ICU (http://oss.software.ibm.com/icu/), an open-source
> library.
> The BOCU-1 converter C source code is in icu/source/common/ucnvbocu.c:
>
>http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/common/ucnvbocu.c
>
> CCS & CES: The BOCU-1 charset is a combination of the
> Unicode/ISO 10646 Coded Character Set (CCS) with
> the Character Encoding Scheme (CES) specified in
> the above document. It covers exactly the
> UTF-16-reachable subset of ISO 10646.
>
>ISO 10646 equivalency table:
> Algorithmic, see published specification and sample code.
>
>Additional information:
> BOCU-1 is an encoding (CES/TES) of Unicode/ISO 10646
> for the storage and exchange of text data.
> It is stateful and provides a good byte/code point ratio while
> being directly usable in SMTP emails, database fields and other contexts.
>
> BOCU-1 combines the wide applicability of UTF-8 with the compactness
> of SCSU.
> It is useful for short strings and maintains code point order.
> BOCU-1 does not encode most ASCII characters with US-ASCII byte values.
>
> BOCU-1 is intended for limited use in special situations
> where the use of this charset can be preconfigured or negotiated.
> The preferred and most widely supported encoding for
> Unicode/ISO 10646 on the Internet is UTF-8.
>
> There is a Unicode signature byte sequence defined
> (FB EE 28, see specification).
>
> BOCU-1 is suitable for
> - databases: maintains Unicode code point order
> - emails: directly suitable for MIME text
> - CVS and similar: deterministic and resets at CR and LF
>
> BOCU-1 is not suitable for
> - efficient internal processing (convert to UTF-8/16/32)
> - contexts where encoding declarations _in_ documents _must_ be
> ASCII-readable
>
>Person & email address to contact for further information:
> Markus W. Scherer
> IBM Globalization Center of Competency
> 5600 Cottle Road
> Mail Stop: 50-2/B11
> San Jose, CA 95193
> USA
>
> markus.scherer@jtcsv.com
> markus.scherer@us.ibm.com
>
>Intended usage: LIMITED USE
>
>----
>Suggested MIBenum value: 1020
> (first available in Unicode/ISO 10646 range; like SCSU [which is 1011])
>
>
>Thank you for your consideration,
>
>markus
>