
RE: GSM 03.38 substitution character?





> -----Original Message-----
> From: Markus Scherer [mailto:markus.icu@gmail.com] 
> Sent: Friday, November 03, 2006 1:01 AM
> To: Frank Ellermann
> Cc: ietf-charsets@mail.apps.ietf.org
> Subject: Re: GSM 03.38 substitution character?
> 
> 
> On 11/2/06, Frank Ellermann <nobody@xyzzy.claranet.de> wrote:
> > http://www.3gpp.org/ftp/Specs/archive/03_series/03.38/0338-720.zip
> > I can't really read it, it's a *.doc, but it might be what you want.
> 
> Thanks, this is great!
> 
> I can't find any reference to a substitution character (searching for
> "subs" and "repl" and reading portions of the text). I guess the
> intention is for a sender to switch to UCS-2 if the text can't be
> represented in the "default alphabet". In which case any value should
> do for a conversion table because the conversion should really stop on
> unmappable characters. If anyone can find something that I am not
> seeing, please let me know.
> 
> There are a couple of places where the unicode.org table seems to
> differ from this text (0338-720.doc): (In addition to the ç mentioned
> in the unicode.org table.)
> 
> 1. A 0x1B alone should be a space (one-way to-Unicode) but the Unicode
> GSM table has it round-trip to U+00A0 NBSP. (The text says "This code
> is an escape to an extension of the 7 bit default alphabet table. A
> receiving entity which does not understand the meaning of this escape
> mechanism shall display it as a space character.")

An NBSP *displays* as a space, and it does not cause spurious line
breaks.
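
To illustrate the point, here is a small Python sketch (not part of the
GSM text or of the unicode.org table). It relies only on the standard
unicodedata and textwrap modules; the helper name wrap_narrow is made up
for this example. It shows that U+00A0 is a space separator whose
compatibility decomposition is a no-break variant of U+0020, and that a
simple line wrapper will break at a plain space but not at an NBSP.

```python
import textwrap
import unicodedata

NBSP = "\u00a0"

# General category and compatibility decomposition of U+00A0.
category = unicodedata.category(NBSP)
decomposition = unicodedata.decomposition(NBSP)


def wrap_narrow(text):
    """Wrap to a very narrow width without splitting inside words.

    Hypothetical helper for this example; textwrap only treats ASCII
    whitespace as a break opportunity, so an NBSP keeps words together.
    """
    return textwrap.wrap(text, width=4, break_long_words=False)


if __name__ == "__main__":
    print(category, decomposition)
    print(wrap_narrow("aaa bbb"))
    print(wrap_narrow("aaa" + NBSP + "bbb"))
```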

> 2. The pair 0x1B+0x1B should also map to a space (one-way to-Unicode)
> but the Unicode GSM table does not include this combination. (The text
> says "This code value is reserved for the extension to another
> extension table. On receipt of this code, a receiving entity shall
> display a space until another extension table is defined.")

In my "original" table I had (as comments):

#0x1B       # DBCS LEAD BYTE (may be two in a row)
#0x1B1B     # Double lead byte (which may lead to a secondary 7-bit extension)

I think the first one was turned into a mapping to NBSP, and the second
one was simply removed, in order to keep the mapping round-trippable
(apart from the non-breaking behaviour of NBSP).
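
For what it is worth, the to-Unicode behaviour described in the quoted
text could be sketched roughly as below, in Python. This is only an
illustration under stated assumptions, not the unicode.org table or any
real converter: the names BASE, EXT and decode_septets are made up, BASE
is deliberately partial (only the code values that coincide with ASCII
letters, digits and space), and the handling of an escape followed by a
code that is not in the extension table (show a space, then decode the
next code from the base table) is my assumption. A lone 0x1B and the
pair 0x1B+0x1B both come out as U+0020, one-way, as the text asks.

```python
SPACE = "\u0020"
ESC = 0x1B

# Partial base table (assumption: only the entries that coincide with
# ASCII letters, digits and space; the real table has 128 entries).
BASE = {code: chr(code) for code in range(0x41, 0x5B)}       # A-Z
BASE.update({code: chr(code) for code in range(0x61, 0x7B)})  # a-z
BASE.update({code: chr(code) for code in range(0x30, 0x3A)})  # 0-9
BASE[0x20] = SPACE

# Printable entries of the 7-bit extension table, reached via 0x1B.
EXT = {
    0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\", 0x3C: "[",
    0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "\u20ac",
}


def decode_septets(septets):
    """Decode a sequence of unpacked 7-bit code values to a string.

    Raises KeyError for codes outside the partial BASE table above.
    """
    out = []
    i = 0
    while i < len(septets):
        code = septets[i]
        if code != ESC:
            out.append(BASE[code])
            i += 1
            continue
        nxt = septets[i + 1] if i + 1 < len(septets) else None
        if nxt == ESC:
            # 0x1B 0x1B: reserved for a further extension table.
            out.append(SPACE)
            i += 2
        elif nxt in EXT:
            out.append(EXT[nxt])
            i += 2
        else:
            # Lone escape, or escape before an unknown code (assumption):
            # show a space and let the next code decode normally.
            out.append(SPACE)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(repr(decode_septets([0x41, ESC, 0x65, 0x42])))
    print(repr(decode_septets([ESC, ESC, 0x41])))
```

Mapping these cases to U+0020 instead of U+00A0 makes them one-way, which
is exactly the roundtrip trade-off mentioned above.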

	/kent k


> Thanks,
> markus
>