ASN.1 debates about character set encoding
> They have currently about 3 ways of encoding ISO 10646:
>
> - As a CHARACTER STRING, with an identifier saying which type it is
> - As a BMPString, with 2 bytes per character (UCS-2)
> - As a GeneralString, with escape sequences designating and invoking one
> of the 5-7 registered ISO character sets for ISO 10646.
The above is not quite correct. It should read:
The new issue of ASN.1 and BER has 2 ways of encoding ISO
10646 strings:
- As a UniversalString, with 4 bytes per character (UCS-4)
- As a BMPString, with 2 bytes per character (UCS-2)
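To make the difference between the two types concrete, here is a rough
sketch in Python of how each might be laid out as a primitive,
definite-length BER value. The function names are made up for this
illustration. The tag octets (0x1C for UniversalString, universal tag 28,
and 0x1E for BMPString, universal tag 30) are the universal tag numbers
as I understand them from the ASN.1 standard; check them against the
text before relying on this.

```python
# Illustrative sketch only: helper names are hypothetical, not from any library.

UNIVERSAL_STRING_TAG = 0x1C  # UNIVERSAL 28, primitive encoding
BMP_STRING_TAG = 0x1E        # UNIVERSAL 30, primitive encoding


def ber_length(n: int) -> bytes:
    """Encode a definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_universal_string(text: str) -> bytes:
    """UniversalString: 4 octets per character (UCS-4), big-endian."""
    content = text.encode("utf-32-be")
    return bytes([UNIVERSAL_STRING_TAG]) + ber_length(len(content)) + content


def encode_bmp_string(text: str) -> bytes:
    """BMPString: 2 octets per character (UCS-2), big-endian, BMP only."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("BMPString cannot carry characters outside the BMP")
    content = text.encode("utf-16-be")
    return bytes([BMP_STRING_TAG]) + ber_length(len(content)) + content


if __name__ == "__main__":
    print(encode_universal_string("A").hex())  # tag, length 4, then 00 00 00 41
    print(encode_bmp_string("A").hex())        # tag, length 2, then 00 41
```

The same character costs four content octets in one type and two in the
other. A character outside the Basic Multilingual Plane can only be
carried by UniversalString.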
The CHARACTER STRING type is a general mechanism for carrying values
of any character set, including ISO 10646. For instance, one could
use the approach documented in RFC 1502 to encode an ISO 10646 string
as a GeneralString, with escape sequences designating and invoking
one of the 5-7 registered ISO character sets for ISO 10646. That
string could then be embedded in a CHARACTER STRING type, which in
BER typically adds a one-octet identifier prefix.
Bancroft Scott