internationalization/ISO10646 question
- To: ietf-charsets@iana.org
- Subject: internationalization/ISO10646 question
- From: Marcin Hanclik <mhanclik@poczta.onet.pl>
- Date: Fri, 22 Nov 2002 12:06:20 +0100
- Importance: Normal
- Original-recipient: rfc822;ned+ietf-charsets@mrochek.com
- Spam-test: False ; 0.7 / 5.2
Dear Sirs,
I am writing to you as experts in internationalization and ISO-10646
issues.
I would be very grateful if you could help me with the issue described
below.
The question concerns the MIME encoding of a text part, in particular
the following case:
Content-Type: text/plain; charset="iso-10646-ucs-2"
Content-Transfer-Encoding: ...
Data
Data after decoding: 0xFF 0xFE 0x66 0x00 0x65 0x00
Outlook Express decodes this to the string "fe". Some people say, however,
that this is merely robustness on the part of Outlook Express and that the
string is not properly encoded, because the byte order mark (BOM) did not
exist when charset="iso-10646-ucs-2" was registered with IANA.
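
To make the two readings concrete, here is a minimal Python sketch. It assumes Python's built-in "utf-16" and "utf-16-be" codecs as stand-ins for a BOM-aware reader and a strict network-byte-order reader; it does not claim to reproduce what Outlook Express actually does internally.

```python
# The six bytes obtained after undoing the Content-Transfer-Encoding.
data = bytes([0xFF, 0xFE, 0x66, 0x00, 0x65, 0x00])

# Reading 1: a BOM-aware decoder. 0xFF 0xFE is taken as a little-endian
# byte order mark and is consumed, leaving the two characters "f" and "e".
bom_aware = data.decode("utf-16")

# Reading 2: strict big-endian (network byte order) with no BOM handling.
# Every byte pair is a character, including the first one.
strict_be = data.decode("utf-16-be")

print(repr(bom_aware))                           # 'fe'
print([f"U+{ord(c):04X}" for c in strict_be])    # ['U+FFFE', 'U+6600', 'U+6500']
```

Under the strict big-endian reading the first pair becomes U+FFFE, which is not an assigned character, and the remaining pairs become unrelated CJK code points. This is why the two interpretations cannot both be correct for the same data.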
The details follow.
My current understanding of character encoding:
character set  | transport (charset=, MIBenum)
---------------+-----------------------------------------------
UCS-2          | ISO-10646-UCS-2, 1000 (network byte order)
(Unicode 1.1)  | (BOM does not exist)
---------------+-----------------------------------------------
UCS-2, UCS-4   | UTF-8, 106 (endian independent)
               | (BOM is not necessary but U+FEFF is acceptable)
---------------+-----------------------------------------------
UCS-2, UCS-4   | UTF-16, 1015 (BOM or big endian)
---------------+-----------------------------------------------
UCS-2, UCS-4   | UTF-16BE, 1013 (big endian)
               | (BOM is not necessary but U+FEFF == 0xFE,0xFF
               | is acceptable)
---------------+-----------------------------------------------
UCS-2, UCS-4   | UTF-16LE, 1014 (little endian)
               | (BOM is not necessary but U+FEFF == 0xFF,0xFE
               | is acceptable)
---------------+-----------------------------------------------
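
The byte-order rules in the table can be sketched as a small Python function. This is only an illustration of my understanding as tabulated above, not a statement of what any standard requires. The name decode_per_table is hypothetical, and Python's UTF-16 codecs are used as an approximation of UCS-2 (they additionally accept surrogate pairs).

```python
import codecs

def decode_per_table(label: str, data: bytes) -> str:
    """Decode data according to the byte-order rule tabulated for label.

    Hypothetical helper: it mirrors the table, nothing more.
    """
    label = label.lower()
    if label == "iso-10646-ucs-2":
        # Table row 1: network byte order, no BOM concept at all.
        return data.decode("utf-16-be")
    if label == "utf-8":
        # Table row 2: endian independent.
        return data.decode("utf-8")
    if label == "utf-16":
        # Table row 3: a BOM selects the byte order; otherwise big endian.
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        return data.decode("utf-16-be")
    if label == "utf-16be":
        # Table row 4: fixed big endian; a leading U+FEFF is left in place.
        return data.decode("utf-16-be")
    if label == "utf-16le":
        # Table row 5: fixed little endian; a leading U+FEFF is left in place.
        return data.decode("utf-16-le")
    raise ValueError(f"label not covered by the table: {label!r}")

sample = bytes([0xFF, 0xFE, 0x66, 0x00, 0x65, 0x00])
print(repr(decode_per_table("UTF-16", sample)))
print(repr(decode_per_table("ISO-10646-UCS-2", sample)))
```

With the sample bytes from the case above, the UTF-16 row yields "fe", while the ISO-10646-UCS-2 row yields three unrelated code points. The explicit BOM check is written out because Python's own "utf-16" codec falls back to the platform's native byte order when no BOM is present, which differs from the big-endian default listed in the table.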
Annex H of ISO-10646:2000 specifies a UCS signature that can be used to
identify the data.
As far as I know, charset=ISO-10646-UCS-2 (MIBenum 1000) was defined for
ISO-10646-1:1993 (Unicode 1.1), in which the BOM did not appear.
MY QUESTIONS:
1. Can I use the charset=ISO-10646-UCS-2 parameter to describe data in
ISO-10646:2000 format with a BOM?
2. Does charset=ISO-10646-UCS-2 now cover UCS-2 as defined in both
ISO-10646:2000 and ISO-10646:1993?
Thank you in advance for an answer.
Kind regards,
Marcin Hanclik