Re: ISO-10646-UCS-x aliases
- To: charsets <[email protected]>
- Subject: Re: ISO-10646-UCS-x aliases
- From: Markus Scherer <[email protected]>
- Date: Thu, 13 Nov 2003 10:23:22 -0800
- In-reply-to: <[email protected]>
- Organization: IBM
- References: <[email protected]>
Francois Yergeau wrote:
> Respecting UTF-16 vs ISO-10646-UCS-2 however, there is a real difference,
> the latter being restricted to U+FFFF.
Yes, there is a real difference. However, more often than not, "UCS-2" just means "byte
serialization of the internal 16-bit Unicode/ISO 10646 form", and once the generating software is
upgraded to handle surrogate pairs, the text really is UTF-16. Also, most receiving software will
deserialize a "UCS-2" byte stream into 16-bit Unicode code units and, if it handles surrogate pairs,
interpret them as UTF-16 anyway. In other words, in practice, the difference between UCS-2 and UTF-16
lies in how the text is processed, not in how it is encoded or converted.
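
To illustrate the point, here is a minimal Python sketch, not taken from any particular converter. It assumes big-endian byte order with no BOM, and the helper names decode_16bit_units and combine_surrogates are made up for this example. The first step (bytes to 16-bit units) is the same whether the stream is labeled UCS-2 or UTF-16; only the second step, which a surrogate-aware receiver adds, treats the units as UTF-16.

```python
import struct


def decode_16bit_units(data):
    """Deserialize big-endian bytes into 16-bit code units.

    This step is identical for a stream labeled "UCS-2" or "UTF-16".
    """
    if len(data) % 2 != 0:
        raise ValueError("byte stream length must be even")
    count = len(data) // 2
    return list(struct.unpack(">%dH" % count, data))


def combine_surrogates(units):
    """Interpret 16-bit units as UTF-16 by pairing surrogates.

    Unpaired surrogates are passed through unchanged (an assumption
    made for this sketch; real software may reject or replace them).
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points


# 'A', U+00E9, and U+1D11E (outside the BMP, so it needs a surrogate pair)
text = "A\u00e9\U0001D11E"
data = text.encode("utf-16-be")

units = decode_16bit_units(data)          # the "UCS-2" view: four 16-bit units
code_points = combine_surrogates(units)   # the UTF-16 view: three code points
```

A receiver that stops after decode_16bit_units sees U+1D11E as two separate 16-bit values; one that also runs combine_surrogates sees a single supplementary code point. The bytes on the wire are the same in both cases.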
markus
--
Opinions expressed here may not reflect my company's positions unless otherwise noted.