Re: Registration of new charset BOCU-1



Hello Markus,

Two comments:

At 10:34 02/07/09 -0700, Markus Scherer wrote:
>(This is a proposal for a registration; I am using the template from
>RFC 2978.)
>
>Charset name: BOCU-1
>
>Charset aliases: (none, except for the implicit csBOCU-1)
>
>Suitability for use in MIME text: Yes
>
>Published specifications:

>     CCS & CES: The BOCU-1 charset is a combination of the
>     Unicode and ISO 10646 Coded Character Set (CCS)

'Combination' sounds very strange here. Unicode and ISO 10646 share
the same CCS by design, but 'combination' suggests that BOCU-1 is
what brought them together.


>with
>     the Character Encoding Scheme (CES) specified in
>     the above document. It covers exactly the
>     UTF-16-reachable subset of ISO 10646.
>
>ISO 10646 equivalency table:
>     Algorithmic, see published specification and sample code.
>
>Additional information:

Given that you (correctly, in my view) say "Intended usage: LIMITED USE",
I would just cut the whole "Additional information" text, because there is
no need for marketing. I assume it's all documented in the spec that you
have already cited.


Regards,     Martin.

>     BOCU-1 is an encoding (CES/TES) of Unicode/ISO 10646
>     for the storage and exchange of text data.
>     It is stateful and provides a good byte/code point ratio while
>     being directly usable in SMTP emails, database fields and other contexts.
>
>     BOCU-1 combines the wide applicability of UTF-8 with the compactness
>     of SCSU.
>     It is useful for short strings and maintains code point order.
>
>     BOCU-1 does not encode most ASCII characters with US-ASCII byte values.
>
>     There is a Unicode signature byte sequence defined
>     (FB EE 28, see specification).
>
>     BOCU-1 is suitable for
>     - databases: maintains Unicode code point order
>     - emails: directly suitable for MIME text
>     - CVS and similar: deterministic and resets at CR and LF
>
>     BOCU-1 is not suitable for
>     - efficient internal processing (convert to UTF-8/16/32)
>     - contexts where encoding declarations _in_ documents _must_ be
>       ASCII-readable
>
>Person & email address to contact for further information:
>     Markus W. Scherer
>     IBM Globalization Center of Competency
>     5600 Cottle Road
>     Mail Stop: 50-2/B11
>     San Jose, CA 95193
>     USA
>
>     markus.scherer@jtcsv.com
>     markus.scherer@us.ibm.com
>
>Intended usage: LIMITED USE
>
>----
>Suggested MIBenum value: 1020
>     (first available in Unicode/ISO 10646 range; like SCSU [which is 1011])
>
>
>Thank you for your consideration,
>
>markus
>
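
The quoted registration mentions a Unicode signature byte sequence
(FB EE 28). As a minimal sketch, assuming only those three bytes as given
in the registration text, a check for the signature could look like the
following. The function names are hypothetical and are not taken from the
specification or its sample code, and nothing here decodes BOCU-1 itself.

```python
# Signature bytes as quoted in the registration: FB EE 28.
BOCU1_SIGNATURE = bytes([0xFB, 0xEE, 0x28])


def has_bocu1_signature(data: bytes) -> bool:
    """Return True if data begins with the BOCU-1 signature bytes."""
    return data.startswith(BOCU1_SIGNATURE)


def strip_bocu1_signature(data: bytes) -> bytes:
    """Return data without a leading BOCU-1 signature, if one is present.

    Hypothetical helper: it only removes the three signature bytes and
    leaves the remaining (still BOCU-1 encoded) bytes untouched.
    """
    if has_bocu1_signature(data):
        return data[len(BOCU1_SIGNATURE):]
    return data
```

A leading signature only hints at the encoding; an explicit charset label
(for example in a MIME header) would still be the more reliable indicator.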