Re: Registration of new charset "UTF-16"
Larry Masinter wrote:
> I sent out a poll to the HTTP working group: are there two independent
> interoperable implementations of the HTTP 'exception' that send and
> process text types that don't use CR, LF, or CRLF for end of line?
> If we can't find two independent interoperable implementations, we
> may have to remove the 'feature' before we can progress HTTP/1.1 to
> Draft Standard.
As others have said, the Netscape and Alis clients support UCS-2, and MSIE
supports it to some degree too. (At least as far as end-of-line issues are
concerned, which are relatively trivial.)
Netscape looks for the HTTP charset parameter, and recognizes the following
UCS-2-related charset names:
ISO-10646-UCS-2
csUnicode11
ISO-10646-UCS-BASIC
csUnicodeASCII
ISO-10646-Unicode-Latin1
csUnicodeLatin1
ISO-10646
ISO-10646-J-1
The first one is the "main" one. Do Alis and MS use these names too?
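
For illustration only, here is a rough sketch of how a client might check the
HTTP charset parameter against the names above. This is not Netscape's code;
the function names are made up for the example, and comparing names without
regard to case is an assumption.

```python
# Charset names copied from the list in this message, lowercased for lookup.
# Case-insensitive matching is an assumption of this sketch.
UCS2_CHARSET_NAMES = {
    "iso-10646-ucs-2",
    "csunicode11",
    "iso-10646-ucs-basic",
    "csunicodeascii",
    "iso-10646-unicode-latin1",
    "csunicodelatin1",
    "iso-10646",
    "iso-10646-j-1",
}


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    # Skip the media type itself and scan the ";"-separated parameters.
    for param in content_type.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip('"')
    return None


def is_ucs2_charset(content_type):
    """True if the Content-Type names one of the UCS-2-related charsets."""
    charset = charset_from_content_type(content_type)
    return charset is not None and charset.lower() in UCS2_CHARSET_NAMES
```

For example, is_ucs2_charset("text/html; charset=ISO-10646-UCS-2") is true,
while a Content-Type with no charset parameter falls through to detection.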
If there is no HTTP charset, we try to detect UCS-2 by looking for a leading
0xFEFF (the byte order mark) or 0xFFFE (the same mark byte-swapped, which
indicates little-endian data). An early implementation looked for zero bytes, but
this was unreliable since some people (Gopher, if I remember correctly)
actually use zero bytes in non-UCS-2 text.
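
A minimal sketch of the byte-order-mark check described above, assuming the
mark is looked for only in the first two bytes of the body. The function names
are hypothetical, and using Python's UTF-16 codecs to decode UCS-2 is an
assumption that holds only for text without surrogate pairs.

```python
def detect_ucs2_byte_order(data):
    """Return "big", "little", or None based on the first two bytes."""
    prefix = data[:2]
    if prefix == b"\xfe\xff":
        return "big"       # 0xFEFF read in order: big-endian
    if prefix == b"\xff\xfe":
        return "little"    # bytes swapped: little-endian
    return None            # no mark found; not detected as UCS-2


def decode_ucs2(data):
    """Decode data as UCS-2 if a byte order mark is present, else None."""
    order = detect_ucs2_byte_order(data)
    if order is None:
        return None
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    # Drop the two-byte mark before decoding the rest.
    return data[2:].decode(codec)
```

Unlike the zero-byte heuristic, this check does not fire on non-UCS-2 text that
merely happens to contain zero bytes.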
It might be a good idea to do some more extensive UCS-2 interoperability
testing, including charset name testing, and end-of-line testing. Sounds like
Makoto has already done some testing.
Erik