Re: ISO-10646-UCS-x aliases

To: charsets <ietf-charsets@iana.org>
Subject: Re: ISO-10646-UCS-x aliases
From: Markus Scherer <markus.scherer@jtcsv.com>
Date: Thu, 13 Nov 2003 10:23:22 -0800
In-reply-to: <F7D4BDA0E5A1D14B99D32C022AEB73660EB418@alis-2k.alis.domain>
Organization: IBM
References: <F7D4BDA0E5A1D14B99D32C022AEB73660EB418@alis-2k.alis.domain>

Francois Yergeau wrote:
> Respecting UTF-16 vs ISO-10646-UCS-2 however, there is a real difference,
> the latter being restricted to U+FFFF.

Yes, there is a real difference. However, more often than not, "UCS-2" just means "byte
serialization of the internal 16-bit Unicode/ISO 10646 form", and once the generating software
is upgraded to handle surrogate pairs, the text really is UTF-16. Also, most receiving software will
deserialize a "UCS-2" byte stream into 16-bit Unicode code units and, if it handles surrogate pairs,
interpret them as UTF-16 anyway. In other words, in practice the difference between UCS-2 and UTF-16
lies in how the text is processed, not in how it is encoded or converted.
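A minimal Python sketch of that point, assuming big-endian byte order with no BOM. The function names (code_units, as_ucs2, as_utf16) are hypothetical and chosen for illustration. The byte-level deserialization into 16-bit units is identical in both cases; only the later interpretation differs. A strict UCS-2 reader treats every unit as one code point (so nothing above U+FFFF), while a UTF-16 reader combines a high/low surrogate pair into one supplementary code point.

```python
import struct


def code_units(data):
    """Deserialize a big-endian byte stream into 16-bit code units.

    This step is the same whether the stream is labeled UCS-2 or UTF-16.
    """
    if len(data) % 2:
        raise ValueError("odd byte count; not a 16-bit stream")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def as_ucs2(units):
    """Strict UCS-2 view: each 16-bit unit is one code point (max U+FFFF).

    Surrogate values are passed through unpaired, so a supplementary
    character shows up as two separate (meaningless) units.
    """
    return list(units)


def as_utf16(units):
    """UTF-16 view: a high surrogate followed by a low surrogate
    forms one supplementary code point (U+10000..U+10FFFF)."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP unit, or an unpaired surrogate passed through as-is
            out.append(u)
            i += 1
    return out


if __name__ == "__main__":
    # "A" followed by U+1F600, serialized as 00 41 D8 3D DE 00
    data = "A\U0001F600".encode("utf-16-be")
    units = code_units(data)
    print([hex(u) for u in as_ucs2(units)])   # ['0x41', '0xd83d', '0xde00']
    print([hex(c) for c in as_utf16(units)])  # ['0x41', '0x1f600']
```

Here the same six bytes yield three "characters" under the strict UCS-2 reading and two under the UTF-16 reading, which is why a receiver that handles surrogate pairs effectively treats "UCS-2" input as UTF-16.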

markus

-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.
