
Re: Registering a charset alias



On Sat, 15 Aug 2009 01:17:30 +0200, Erik van der Poel <erikv@google.com> wrote:
> No, I don't think we should recommend behavior that is more lenient
> than what the major browsers currently do. (I believe the major
> browsers don't strip "x-"?)

Indeed, as far as I know the major browsers do not strip "x-". I agree we should keep leniency to a minimum.


> So I don't think the following spec from HTML 5, section 2.7 is very
> good either:
>
> "When comparing a string specifying a character encoding with the name
> or alias of a character encoding to determine if they are equal, user
> agents must use the Charset Alias Matching rules defined in Unicode
> Technical Standard #22. [UTS22]
>
> For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent  
> names."

Indeed. We experimented with this and it caused some compatibility issues.
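
For reference, here is a minimal sketch of the UTS #22 matching as I read it: delete everything except ASCII letters and digits, lowercase the rest, then delete each "0" that is not preceded by a digit. The function name uts22_key is made up for illustration and this is not any browser's actual code.

```python
import re

def uts22_key(label: str) -> str:
    """Reduce a charset label to a comparison key (sketch of UTS #22 matching)."""
    # Step 1: delete all characters except a-z, A-Z and 0-9.
    s = re.sub(r"[^A-Za-z0-9]", "", label)
    # Step 2: map uppercase letters to lowercase.
    s = s.lower()
    # Step 3: left to right, delete each "0" that is not preceded by a digit.
    out = []
    prev_is_digit = False
    for ch in s:
        if ch == "0" and not prev_is_digit:
            continue  # dropped; the preceding kept character is still a non-digit
        out.append(ch)
        prev_is_digit = ch.isdigit()
    return "".join(out)

def uts22_equal(a: str, b: str) -> bool:
    """Two labels match when their keys are identical."""
    return uts22_key(a) == uts22_key(b)
```

With this, uts22_equal("GB_2312-80", "g.b.2312(80)") is true because both reduce to "gb231280". That shows how much punctuation noise the rules accept. Note that the rules do not strip a leading "x-" either; they only drop the hyphen.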

What would help, I think, is clear documentation of how browsers (Chrome, Safari, Firefox, Internet Explorer, Opera) map a given label from set A to a final label from set B. Set A is near-infinite; set B should be finite and essentially consist of the list of supported encodings. With that, we should be able to propose a better algorithm than the one Unicode currently defines, which HTML5 can then use.
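
To make the set A to set B idea concrete, a stricter alternative could be a plain table lookup. The following is only a sketch under assumptions: the table entries, the resolve_label name, and the choice to ignore only surrounding whitespace and ASCII case are all illustrative, not a description of what any browser does today.

```python
from typing import Optional

# Hypothetical, deliberately tiny table: known labels (set A, lowercased)
# mapped to the supported encodings (set B). A real table would come from
# the documented browser behaviour asked for above.
LABEL_TABLE = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "iso-8859-1": "ISO-8859-1",
    "latin1": "ISO-8859-1",
}

def resolve_label(label: str) -> Optional[str]:
    """Return the supported encoding for a label, or None if it is unknown."""
    # Assumption: only surrounding whitespace and ASCII case are ignored;
    # punctuation inside the label is significant.
    key = label.strip(" \t\r\n\f").lower()
    return LABEL_TABLE.get(key)
```

Here resolve_label(" UTF8 ") yields "UTF-8", while a label such as "u.t.f-008" is simply unknown, so leniency is limited to what the table explicitly lists.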

Opera currently uses the Unicode Charset Alias Matching rules, but they are not perfect and we want to move away from them again. I'll look into getting a definitive list of the encodings we support.


> The general approach should be: As lenient as the major browsers, but
> not more lenient. Lenience leads to a proliferation of garbage.

Agreed, unless being more lenient greatly increases simplicity at no other cost.


-- 
Anne van Kesteren
http://annevankesteren.nl/