
ASN.1 debates about character set encoding



I enclose the following for your interest.

This E-mail originates from Bancroft Scott <baos@oss.oss.com>, who is
the editor of the ASN.1 standards.
ASN.1 currently has about 3 ways of encoding ISO 10646:

- As a CHARACTER STRING, with an identifier saying which type it is
- As a BMPString, with 2 bytes per character (UCS-2)
- As a GeneralString, with escape sequences designating and invoking one
  of the 5-7 registered ISO character sets for ISO 10646.
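
To make the second option concrete, here is a minimal sketch of how a
BMPString might be put on the wire under BER. It assumes the primitive
form with the universal tag 30 (0x1E) and UCS-2 in big-endian order; the
helper names ber_length and encode_bmpstring are made up for
illustration and do not come from any ASN.1 toolkit.

```python
BMP_STRING_TAG = 0x1E  # UNIVERSAL 30, primitive encoding


def ber_length(n: int) -> bytes:
    """Encode a BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_bmpstring(text: str) -> bytes:
    """Return tag + length + contents, with two octets per character (UCS-2)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the Basic Multilingual Plane")
    # For BMP-only text, big-endian UTF-16 is identical to UCS-2.
    content = text.encode("utf-16-be")
    return bytes([BMP_STRING_TAG]) + ber_length(len(content)) + content


if __name__ == "__main__":
    print(encode_bmpstring("A\u00e9").hex())  # 1e04004100e9
```

Every character costs two octets here, including plain ASCII, which is
the overhead the 1 octet discussion below is about.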

The proposed character set encoding method is not UTF-1 or UTF-2.

                 Harald T. Alvestrand


Outcome of discussions on ASN.1 support for 1 byte encoding of ISO 10646
strings at the joint ITU-T | ISO ASN.1 editing meeting in Suresnes,
France.

1. The character set experts at the meeting maintained that RFC 1502 does
   meet the need for better MIME support for European languages in the
   short term.  More precisely, they see a need for the likes
   of RFC 1502, and they maintain that what it proposes is compatible with
   how ASN.1 and BER should be used.

2. The sentiment was that RFC 1502 should consider use of the CHARACTER 
   STRING type instead of embedding the encoded material in an OCTET STRING.

3. There was much sentiment for the usefulness of being able to
   encode an ISO 10646 character in a single octet, but no National Body
   except the U.S. was willing to push forward with this idea (the U.S.
   proposal gives all languages equal access in selecting which row is
   used for the 1 octet encoding).  The stated reason for not wanting
   a single octet ISO 10646 encoding was that this is an SC2 matter and
   they did not wish to be party to, in effect, undoing an SC2 resolution.
   At the end of the day all National Bodies except the U.S. and France 
   abstained in voting on this matter (they maintained that it was a U.S. 
   / France fight). France maintained its NO vote on the idea of having a 
   single octet per character encoding for ISO 10646 strings, and the U.S. 
   voted YES. Since there was a deadlock and the text was already in the 
   base document, the vote was resolved by remaining with the status quo -
   so there is now no BER support for encoding ISO 10646 character
   strings one octet per character.

4. All National Bodies at the meeting agreed that the Packed Encoding
   Rules of ASN.1 should continue to support the encoding of ISO 10646
   characters in the fewest number of bits needed to preserve the 
   semantics of each character.
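
As a rough illustration of the "fewest number of bits" idea in point 4,
the sketch below packs a string whose characters are restricted to a
known alphabet. It is not a conforming PER encoder: it assumes each
character is replaced by its index in the sorted alphabet and written in
ceil(log2(N)) bits with no length prefix, and the names bits_per_char
and pack are invented for this example.

```python
def bits_per_char(alphabet: str) -> int:
    """Smallest bit width that can distinguish every character in the alphabet."""
    n = len(set(alphabet))
    return (n - 1).bit_length()


def pack(text: str, alphabet: str) -> bytes:
    """Pack text as fixed-width alphabet indices, zero-padded to a whole octet."""
    symbols = sorted(set(alphabet))
    index = {ch: i for i, ch in enumerate(symbols)}
    width = bits_per_char(alphabet)
    acc = 0
    for ch in text:
        # A character outside the alphabet raises KeyError.
        acc = (acc << width) | index[ch]
    total_bits = width * len(text)
    pad = (-total_bits) % 8
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


if __name__ == "__main__":
    digits = "0123456789"
    print(bits_per_char(digits))       # 4
    print(pack("2024", digits).hex())  # 2024
```

With a ten-digit alphabet the four characters of "2024" fit in two
octets, against eight octets at two octets per character.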