
RE: internationalization/ISO10646 question



Hello Marcin,

You mentioned that the WAP spec said UTF-16, MIBenum 1000
(the latter being ISO-10646-UCS-2).

Why not assume that the MIBenum was a mistake, and use charset=utf-16?
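
To make that suggestion concrete, here is a minimal Python sketch of how such
a body part could be built. It is only an illustration under my own
assumptions: the helper name build_utf16_part is hypothetical, the media type
is the placeholder from your mail, and I chose big-endian data with a leading
BOM.

```python
import base64
import codecs
from email.message import Message


def build_utf16_part(text: str) -> Message:
    """Build a MIME part labelled charset=utf-16 (hypothetical helper)."""
    part = Message()
    # Placeholder media type taken from the original question.
    part["Content-Type"] = 'application/x-my-text-subtype; charset="utf-16"'
    part["Content-Transfer-Encoding"] = "base64"
    # Assumption: big-endian octets preceded by an explicit BOM (FE FF).
    raw = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    part.set_payload(base64.b64encode(raw).decode("ascii"))
    return part


if __name__ == "__main__":
    print(build_utf16_part("Hi").as_string())
```

With the BOM in front, a receiver does not have to guess the byte order.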

There is an RFC for utf-16 (RFC 2781), which contains very clear and
detailed rules about the BOM.

For iso-10646-ucs-2, the entry in
http://www.iana.org/assignments/character-sets has:

Name: ISO-10646-UCS-2
MIBenum: 1000
Source: the 2-octet Basic Multilingual Plane, aka Unicode
         this needs to specify network byte order: the standard
         does not specify (it is a 16-bit integer space)
Alias: csUnicode

It sounds like this is heavily underspecified. There are
other registrations that have similar problems. In general,
using UTF-16 (or UTF-16BE/UTF-16LE) is much better, because
it's up to date, covers the whole range of Unicode, and
is very well defined.
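
As a rough sketch of what the utf-16 rules mean for a receiver (my own
illustration in Python, not text from any spec; the function name
decode_utf16_payload is hypothetical): look at the first two octets, honour a
BOM if one is there, and otherwise treat the data as big-endian, which is what
the UTF-16 RFC says a receiver SHOULD do.

```python
import codecs


def decode_utf16_payload(data: bytes) -> str:
    """Decode octets labelled charset=utf-16 (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):
        # FE FF: big-endian; the BOM is a signature, not part of the text.
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # FF FE: little-endian.
        return data[2:].decode("utf-16-le")
    # No BOM: interpret as big-endian (network byte order).
    return data.decode("utf-16-be")


if __name__ == "__main__":
    print(decode_utf16_payload(b"\xfe\xff\x00H\x00i"))
    print(decode_utf16_payload(b"\xff\xfeH\x00i\x00"))
    print(decode_utf16_payload(b"\x00H\x00i"))
```

The explicit check is deliberate: Python's generic "utf-16" codec falls back
to the machine's native byte order when no BOM is present, which is not what
you want for data received over the network.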

Regards,    Martin.



At 16:22 02/12/05 +0100, Marcin Hanclik wrote:
>Hi, Ned!
>
>Thanks a lot for the mail exchange. I have learned a lot.
>
>I would like to sum it up since I need a conclusion.
>
>I am trying to incorporate what you and Martin wrote in your emails.
>The situation then looks like this:
>I have to send UCS-2 encoded data. The headers will look like:
>
>Content-Type: application/x-my-text-subtype; charset="iso-10646-ucs-2"
>Content-Transfer-Encoding: base64
>
>data
>
>My question was:
>Can the data marked as "iso-10646-ucs-2" contain BOM?
>
>Your answer was:
> > > I don't know if there are specific rules for handling revisions to
> > > iso-10646-ucs-2 or not. I suspect not. However, the general rule is that
> > > additions to a charset repertoire are expected and allowed. See RFC 2279
> > > section 3. However, the BOM is something of a special case.
> > > ....
> > > For material that isn't labelled with a top level content type of text I
> > > don't think the situation is clear, but the intent has always been to allow
> > > additions to charsets subsequent to registration. So I think BOM should be
> > > supported in this context.
>
>What is wrong in this whole case is the top-level content type, and also
>that the WAP/MMS standards have produced a bug in their specs. But we have
>to live with them.
>
>Since Your answer is NOT CLEAR to me (I hope you agree that it can be...) I
>have to derive an answer from the above suggestions.
>But this is still not what I wanted. I would like to have:
>"New standard overrides the old one"
>   or
>"BOM was not defined in ISO10646:1993 and although new versions of ISO10646
>support BOM in UCS-2, data marked as iso-10646-ucs-2 cannot contain BOM"
>   instead of
>"BOM should be supported in this context"
>
>Is there any authoritative standards body that has specified a rule for this case?
>Or can you help me further?
>
>Kind regards,
>Marcin