ASN.1 debates about character set encoding
I enclose the following for your interest.
This E-mail originates from Bancroft Scott <baos@oss.oss.com>, who is
the editor of the ASN.1 standards.
ASN.1 currently has about three ways of encoding ISO 10646:
- As a CHARACTER STRING, with an identifier saying which type it is
- As a BMPString, with 2 bytes per character (UCS-2)
- As a GeneralString, with escape sequences designating and invoking one
of the 5-7 registered ISO character sets for ISO 10646.
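To make the second option concrete, here is a minimal sketch of how a BMPString value might be laid out under BER, assuming the primitive form with a definite length. The function names ber_length and ber_bmpstring are made up for this illustration and do not come from any ASN.1 toolkit.

```python
def ber_length(n: int) -> bytes:
    """Encode a BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_bmpstring(text: str) -> bytes:
    """Primitive BER encoding of a BMPString (hypothetical helper).

    Layout: tag 0x1E (UNIVERSAL 30), definite length, then two octets per
    character in big-endian order (UCS-2).
    """
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("BMPString only covers the Basic Multilingual Plane")
    # For characters inside the BMP, UTF-16BE and UCS-2 produce the same octets.
    content = text.encode("utf-16-be")
    return bytes([0x1E]) + ber_length(len(content)) + content


if __name__ == "__main__":
    print(ber_bmpstring("Aé").hex(" "))
```

Every character costs two octets here, even plain ASCII, which is the overhead that the single-octet discussion below is about.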
The proposed character set encoding method is not UTF-1 or UTF-2.
Harald T. Alvestrand
Outcome of discussions on ASN.1 support for 1 byte encoding of ISO 10646
strings at the joint ITU-T | ISO ASN.1 editing meeting in Suresnes,
France.
1. The character set experts at the meeting maintained that RFC 1502 does
meet the need for better MIME support for European languages in the
short term. Or to be more precise, they see a need for the likes
of RFC 1502, and they maintain that what it proposes is compatible with
how ASN.1 and BER should be used.
2. The sentiment was that RFC 1502 should consider use of the CHARACTER
STRING type instead of embedding the encoded material in an OCTET STRING.
3. There was much sentiment for the usefulness of being able to
encode an ISO 10646 character in a single octet, but no National Body
except the U.S. was willing to push forward with this idea (the U.S.
proposal gives all languages equal access in selecting which row is
used for the 1 octet encoding). The stated reason for not wanting
a single octet ISO 10646 encoding was that this is an SC2 matter and
they did not wish to be party to in effect undoing an SC2 resolution.
At the end of the day all National Bodies except the U.S. and France
abstained from voting on this matter (they maintained that it was a U.S.
/ France fight). France maintained its NO vote on the idea of having a
single octet per character encoding for ISO 10646 strings, and the U.S.
voted YES. Since there was a deadlock and the text was already in the
base document, the vote was resolved by remaining with the status quo,
so there is now no BER support for encoding ISO 10646 character
strings one octet per character.
4. All National Bodies at the meeting agreed that the Packed Encoding
Rules of ASN.1 should continue to support the encoding of ISO 10646
characters in the fewest number of bits needed to preserve the
semantics of each character.
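
As a rough illustration of the "fewest number of bits" idea in item 4, the sketch below packs each character as an index into a known permitted alphabet, using just enough bits to tell the alphabet's members apart. It is loosely modeled on how PER can exploit a permitted-alphabet constraint, but it is an assumption-laden simplification: it leaves out length determinants, alignment, and the other details of the real rules, and the names bits_per_char and pack_bits are invented for this example.

```python
def bits_per_char(alphabet: str) -> int:
    """Smallest bit width that can distinguish every character in the alphabet."""
    size = len(set(alphabet))
    if size == 0:
        raise ValueError("alphabet must not be empty")
    return (size - 1).bit_length()


def pack_bits(text: str, alphabet: str) -> str:
    """Return the text as a string of '0'/'1' digits (hypothetical helper).

    Each character is replaced by its index in the sorted alphabet, written
    with bits_per_char(alphabet) bits. A one-character alphabet needs no bits.
    """
    symbols = sorted(set(alphabet))
    index = {ch: i for i, ch in enumerate(symbols)}
    width = bits_per_char(alphabet)
    indices = [index[ch] for ch in text]  # KeyError if text leaves the alphabet
    if width == 0:
        return ""
    return "".join(format(i, "0{}b".format(width)) for i in indices)


if __name__ == "__main__":
    greek = "αβγδεζηθικλμνξοπρστυφχψω"  # 24 letters, so 5 bits each
    print(bits_per_char(greek), pack_bits("λογος".replace("ς", "σ"), greek))
```

With a 24-letter alphabet each character takes 5 bits instead of the 16 that a two-octet encoding would spend, regardless of where those letters sit in ISO 10646.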