Re: GSM 03.38 substitution character?



On 11/2/06, Kent Karlsson <kent.karlsson14@comhem.se> wrote:
> > There are a couple of places where the unicode.org table seems to
> > differ from this text (0338-720.doc): (In addition to the ç mentioned
> > in the unicode.org table.)
> >
> > 1. A 0x1B alone should be a space (one-way to-Unicode) but the Unicode
> > GSM table has it round-trip to U+00A0 NBSP. (The text says "This code
> > is an escape to an extension of the 7 bit default alphabet table. A
> > receiving entity which does not understand the meaning of this escape
> > mechanism shall display it as a space character.")
>
> An NBSP *displays* as a space... And does not cause spurious line
> breaks.

True, but the standard only _says_ "space". Even if we choose NBSP for
the mapping, I don't think it's warranted to list 0x1B=U+00A0 as a
regular (round-trip) mapping. From my reading of the text, it should
be a one-way, to-Unicode mapping. Right?

> > 2. The pair 0x1B+0x1B should also map to a space (one-way to-Unicode)
> > but the Unicode GSM table does not include this combination. (The text
> > says "This code value is reserved for the extension to another
> > extension table. On receipt of this code, a receiving entity shall
> > display a space until another extension table is defined.")
>
> In my "original" table I had (as comments):
>
> #0x1B       # DBCS LEAD BYTE (may be two in a row)
> #0x1B1B     # Double lead byte (which may lead to a secondary 7-bit extension)
>
> I think the first one was turned into a mapping to NBSP and the second
> one just removed in order to maintain roundtripability for the mapping
> (apart from the non-linebreakness of NBSP).

Yes, that's what is in the table on unicode.org. I don't think
roundtrippability applies here because the standard just talks about
clients interpreting byte streams with 0x1B+0x1B if they don't know
about a sub-extension -- that is, only in conversion to Unicode as a
fallback. It seems like 0x1B+0x1B would want a one-way mapping to the
same code point as 0x1B alone (be that SP or NBSP).
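
To make the distinction concrete, here is a rough sketch of what I mean
by one-way, to-Unicode handling. It is not a real converter: the tables
are a tiny illustrative subset of GSM 03.38, the names (BASE, EXT,
FALLBACK, gsm_to_unicode, unicode_to_gsm) are made up for this example,
and it assumes one septet per byte (unpacked). It also treats an escape
followed by a septet outside the sketch's extension subset like a lone
escape, which is a simplification.

```python
ESC = 0x1B
FALLBACK = "\u0020"  # SP; use "\u00A0" instead if NBSP is the chosen target

# Round-trip subset: space, digits and letters share their ASCII values.
BASE = {b: chr(b) for b in range(0x20, 0x7B)
        if b == 0x20 or chr(b).isalnum()}
BASE[0x00] = "@"

# Round-trip subset of the extension table (septet that follows 0x1B).
EXT = {0x65: "\u20AC", 0x28: "{", 0x29: "}"}


def gsm_to_unicode(septets):
    """Decode unpacked septets; a lone 0x1B and 0x1B+0x1B both fall back."""
    out = []
    i = 0
    n = len(septets)
    while i < n:
        b = septets[i]
        if b != ESC:
            out.append(BASE[b])
            i += 1
        elif i + 1 < n and septets[i + 1] in EXT:
            out.append(EXT[septets[i + 1]])
            i += 2
        elif i + 1 < n and septets[i + 1] == ESC:
            # 0x1B+0x1B: reserved for a further extension table.
            out.append(FALLBACK)
            i += 2
        else:
            # Lone (or, in this sketch, unrecognized) escape.
            out.append(FALLBACK)
            i += 1
    return "".join(out)


# The from-Unicode table is built only from the round-trip entries, so
# the fallback target never encodes back to 0x1B or 0x1B+0x1B.
TO_GSM = {ch: bytes([b]) for b, ch in BASE.items()}
TO_GSM.update({ch: bytes([ESC, b]) for b, ch in EXT.items()})


def unicode_to_gsm(text):
    return b"".join(TO_GSM[ch] for ch in text)
```

With this, both 0x1B and 0x1B+0x1B come out as the fallback character,
but converting that character back gives 0x20, never an escape. That is
the sense in which neither mapping is round-trip.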

markus
-- 
Opinions expressed here may not reflect my company's positions unless
otherwise noted.