Re: internationalization/ISO10646 question



> Dear Sirs,

> I am writing to you as to the experts in internationalization and ISO-10646
> issues.

> I would be very grateful if you could help me with the following issue
> described below.

> Generally the question refers to MIME encoding of text part.
> Particularly to the following case:
> Content-Type: text/plain; charset="iso-10646-ucs-2"
> Content-Transfer-Encoding: ...

This, I'm afraid, is an illegal combination of elements. Specifically, any
material with a top-level media type of "text" has to represent carriage
return/line feed as the literal octet sequence 0x0D 0x0A. iso-10646-ucs-2
clearly does not do this, since it encodes every character, CR and LF
included, as two octets. As such it is a charset that's not suited for use
with MIME text.

This requirement is spelled out in RFC 2046 section 4.1.1.
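
To make the line-break point concrete, here is a small Python sketch. It
assumes Python's "utf-16-be" codec as a stand-in for big-endian
iso-10646-ucs-2 (the two agree for characters in the Basic Multilingual
Plane); the variable names are purely illustrative.

```python
# "utf-16-be" stands in for big-endian UCS-2 here (an assumption; the two
# agree for Basic Multilingual Plane characters).
CRLF = b"\r\n"  # the octets 0x0D 0x0A that MIME text requires

line = "fe\r\n"
as_ascii = line.encode("us-ascii")
as_ucs2 = line.encode("utf-16-be")

print(as_ascii.hex(" "))   # 66 65 0d 0a
print(as_ucs2.hex(" "))    # 00 66 00 65 00 0d 00 0a
print(CRLF in as_ascii)    # True
print(CRLF in as_ucs2)     # False: CR and LF are separated by a 0x00 octet

# The reverse problem: U+0D0A is a single character, yet its big-endian
# encoding is exactly the octets 0x0D 0x0A.
print("\u0d0a".encode("utf-16-be") == CRLF)  # True
```

So a transport that looks for 0x0D 0x0A would miss the real line break in
the UCS-2 data and could see a false one inside an unrelated character.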

> Data

> Data after decoding: 0xFF 0xFE 0x66 0x00 0x65 0x00

> Outlook Express decodes it to "fe" string. But there are people, who say
> that this is robustness of Outlook Express and that the string is not
> properly encoded, because in the time when <charset="iso-10646-ucs-2"> was
> specified/assigned with IANA the byte order mark (BOM) did not exist.

I don't know if there are specific rules for handling revisions to
iso-10646-ucs-2 or not. I suspect not. However, the general rule is that
additions to a charset repertoire are expected and allowed. See RFC 2279
section 3. That said, the BOM is something of a special case.

But given the far more egregious violation going on here I really don't
think this is particularly important in the overall scheme of things.
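
For what it's worth, the two readings of the six octets quoted above can be
sketched in Python as follows. The helper name decode_ucs2 and the choice of
big-endian when no BOM is present are my own assumptions for illustration,
and "utf-16-be"/"utf-16-le" again stand in for UCS-2.

```python
import codecs

# The decoded octets from the original question.
data = bytes([0xFF, 0xFE, 0x66, 0x00, 0x65, 0x00])

def decode_ucs2(raw: bytes) -> str:
    """Hypothetical helper: honor a leading BOM, else assume big-endian."""
    if raw.startswith(codecs.BOM_UTF16_LE):   # octets FF FE
        return raw[2:].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):   # octets FE FF
        return raw[2:].decode("utf-16-be")
    return raw.decode("utf-16-be")            # assumed default byte order

# A BOM-aware reader sees FF FE, switches to little-endian, and gets "fe".
with_bom = decode_ucs2(data)

# A reader that ignores the BOM and assumes big-endian gets three
# unrelated code points: U+FFFE, U+6600, U+6500.
without_bom = data.decode("utf-16-be")

print(with_bom)                                   # fe
print([f"U+{ord(c):04X}" for c in without_bom])   # ['U+FFFE', 'U+6600', 'U+6500']
```

The first reading matches what Outlook Express displays; the second is what
a reader with no notion of the BOM would produce from the same octets.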

				Ned