RE: Registering a charset alias



I'd encourage moving toward UTF-8. Assuming the existing mechanisms work OK, adding "better" ways to pass non-Unicode data around doesn't accomplish much. Anyone working to adopt the "better" method could just as easily (or more easily) move to UTF-8.

-Shawn

-----Original Message-----
From: Erik van der Poel [mailto:erikv@google.com]
Sent: Friday, August 14, 2009 16:18
To: Markus Scherer
Cc: Shawn Steele; Ira McDonald; Anne van Kesteren; ietf-charsets@iana.org
Subject: Re: Registering a charset alias

No, I don't think we should recommend behavior that is more lenient
than what the major browsers currently do. (I believe the major
browsers don't strip "x-"?)

So I don't think the following spec from HTML 5, section 2.7 is very
good either:

"When comparing a string specifying a character encoding with the name
or alias of a character encoding to determine if they are equal, user
agents must use the Charset Alias Matching rules defined in Unicode
Technical Standard #22. [UTS22]

For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent names."
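
To make the quoted rule concrete, here is a minimal Python sketch of the UTS #22 alias matching steps as I understand them: drop everything except ASCII letters and digits, lowercase, then delete each 0 that is not preceded by a digit. The function name `uts22_key` is hypothetical and only for illustration; check the details against UTS #22 itself before relying on them.

```python
import re

def uts22_key(name: str) -> str:
    """Reduce a charset label to a comparison key (sketch of UTS #22 matching)."""
    # Steps 1 and 2: keep only ASCII letters and digits, then lowercase.
    s = re.sub(r"[^A-Za-z0-9]", "", name).lower()
    # Step 3: scanning left to right, drop each '0' not preceded by a digit.
    out = []
    prev_is_digit = False
    for ch in s:
        if ch == "0" and not prev_is_digit:
            continue  # leading zero of a number: deleted, state unchanged
        out.append(ch)
        prev_is_digit = ch.isdigit()
    return "".join(out)

# The two labels from the HTML 5 text reduce to the same key.
print(uts22_key("GB_2312-80") == uts22_key("g.b.2312(80)"))
# An "x-" prefix is not stripped by these rules, so the keys differ.
print(uts22_key("x-big5") == uts22_key("big5"))
```

Under these rules punctuation and case are ignored entirely, which is the degree of lenience being objected to here.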

The general approach should be: As lenient as the major browsers, but
not more lenient. Lenience leads to a proliferation of garbage.

I chose the name x-x-big5 for an internal X-Windows-only encoding of
Big5 that had only 2-byte characters and no ASCII. That name was
intended to be used only internally, within Netscape, but our X
resource file was written in plain ASCII, and Microsoft picked that
up, assuming that x-x-big5 was ordinary big5. I shouldn't have exposed
that name in a plain text file, nor should I have put that name in the
same namespace as ordinary charsets.

Erik

On Fri, Aug 14, 2009 at 4:03 PM, Markus Scherer<markus.icu@gmail.com> wrote:
> How about a general rule (maybe in HTML 5) that if x-abc is not recognized
> then the implementation should strip the x- and try abc instead. Apparently,
> this may need to be done multiple times, to deal with x-x-big5. (Whoever
> came up with *that*?)
>
> markus
>
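
For reference, the fallback proposed in the quoted message could be sketched as below. This is only an illustration of the suggested rule, not what any browser is known to do; the `KNOWN` set and the `resolve` function are hypothetical stand-ins for a real charset registry lookup.

```python
# Hypothetical set of labels an implementation recognizes.
KNOWN = {"big5", "utf-8", "x-sjis"}

def resolve(label, known=KNOWN):
    """Return a recognized label, stripping leading "x-" prefixes as a fallback."""
    name = label.strip().lower()
    while True:
        if name in known:
            return name
        if not name.startswith("x-"):
            return None  # nothing left to strip; the label is unrecognized
        name = name[2:]  # drop one "x-" and try again

print(resolve("x-x-big5"))
print(resolve("x-sjis"))
print(resolve("x-unknown"))
```

Note that with this rule x-x-big5 resolves to plain big5, which is the same misreading described in the reply above: the internal 2-byte-only encoding would be treated as ordinary Big5.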