
Re: Registering a charset alias



No, I don't think we should recommend behavior that is more lenient
than what the major browsers currently do. (I believe the major
browsers don't strip "x-"?)

So I don't think the following requirement from the HTML 5 spec
(section 2.7) is very good either:

"When comparing a string specifying a character encoding with the name
or alias of a character encoding to determine if they are equal, user
agents must use the Charset Alias Matching rules defined in Unicode
Technical Standard #22. [UTS22]

For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent names."
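To show how lenient that rule is, here is a minimal sketch of the UTS #22 Charset Alias Matching steps as I understand them: delete everything except ASCII letters and digits, lowercase, then delete each "0" that is not preceded by a digit. The function names are made up for illustration; this is not the HTML 5 or ICU code.

```python
import re


def uts22_normalize(name: str) -> str:
    """Normalize a charset name per the UTS #22 alias-matching steps (sketch)."""
    # Step 1: delete every character except ASCII letters and digits.
    kept = re.sub(r"[^A-Za-z0-9]", "", name)
    # Step 2: map uppercase A-Z to lowercase a-z.
    kept = kept.lower()
    # Step 3: left to right, delete each "0" not preceded by a digit.
    # "Preceded" is checked against the output built so far, so a run of
    # leading zeros such as the "00" in "u.t.f-008" is removed entirely.
    out = []
    for ch in kept:
        if ch == "0" and (not out or not out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


def charset_names_match(a: str, b: str) -> bool:
    """True if two charset names are equivalent under the sketch above."""
    return uts22_normalize(a) == uts22_normalize(b)


if __name__ == "__main__":
    print(charset_names_match("GB_2312-80", "g.b.2312(80)"))  # True
    print(charset_names_match("UTF-8", "u.t.f-008"))          # True
    print(charset_names_match("UTF-8", "utf-80"))             # False
```

So a string like "u.t.f-008", which no real page should ever send, would be accepted as UTF-8. That is the kind of garbage lenient matching invites.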

The general approach should be: As lenient as the major browsers, but
not more lenient. Lenience leads to a proliferation of garbage.

I chose the name x-x-big5 for an internal X-Windows-only encoding for
Big5 that had only 2-byte characters and no ASCII characters. That name was
intended to be used only internally, within Netscape, but our X
resource file was written in plain ASCII, and Microsoft picked that
up, assuming that x-x-big5 was ordinary big5. I shouldn't have exposed
that name in a plain text file, nor should I have put that name in the
same namespace as ordinary charsets.

Erik

On Fri, Aug 14, 2009 at 4:03 PM, Markus Scherer <markus.icu@gmail.com> wrote:
> How about a general rule (maybe in HTML 5) that if x-abc is not recognized
> then the implementation should strip the x- and try abc instead. Apparently,
> this may need to be done multiple times, to deal with x-x-big5. (Whoever
> came up with *that*?)
>
> markus
>
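For comparison, Markus's proposed fallback could be sketched as below. The KNOWN table and the function name are hypothetical (real browsers use much larger label tables), and labels are lowercased on the assumption that charset names are compared case-insensitively.

```python
from typing import Mapping, Optional

# Hypothetical table of recognized labels (lowercase) -> canonical names.
KNOWN = {"big5": "Big5", "utf-8": "UTF-8", "shift_jis": "Shift_JIS"}


def resolve_with_x_stripping(label: str,
                             known: Mapping[str, str] = KNOWN) -> Optional[str]:
    """Look up a charset label, repeatedly stripping a leading "x-" if unknown."""
    name = label.lower()
    while name not in known:
        if not name.startswith("x-"):
            return None          # nothing left to strip: unrecognized
        name = name[2:]          # drop one "x-" prefix and retry
    return known[name]


if __name__ == "__main__":
    print(resolve_with_x_stripping("x-x-big5"))  # Big5
    print(resolve_with_x_stripping("x-foo"))     # None
```

Note that this maps x-x-big5 straight to ordinary Big5, the same conflation described above for the 2-byte-only X encoding.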