
Re: Registration of new charset CP50220



On Tue, 31 Aug 2010 04:53:55 +0200, Martin J. Dürst  
<duerst@it.aoyama.ac.jp> wrote:
> - If what we need (for HTML5, as far as I understand) isn't exactly
>    what Windows software is doing, then we should not use the name
>    CP50220 for the registration, but should come up with some other
>    name. But the origin of strange provisions such as "treat content
>    labeled as iso-8859-1 as if it were windows-1252" in HTML5 are
>    "because IE did so". So the browsers might as well follow IE exactly,
>    not just almost, in which case, we could use the name CP50220.

To be clear: we do not need it for HTML5 specifically. Browsers need this
kind of information in general so that they can all handle legacy content
the same way. New browsers entering the market also need this information
so that they do not have to reverse-engineer the market leader (as we are
currently doing).


> - The charset registry currently has no way to express "On creation
>    (encoding), limited to 'foo', but on interpretation (decoding), also
>    take into account 'bar'.". RFC 2978 defines a 'charset' as "a method
>    of converting a sequence of octets into a sequence of characters".
>    We may be able to deal with this by adding comments, but maybe in the
>    long term, this could be a change needed in an update to the RFC.

It is my understanding that it can also apply to creation. Consider
submitting a form: I believe browsers treat, e.g., ISO-8859-1 as
Windows-1252 there too. Yui mentioned this case specifically when
submitting this registration.
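To illustrate the point, here is a rough Python sketch (not how any particular browser implements it): if a label such as ISO-8859-1 is mapped to windows-1252, the substitution applies both when decoding a document and when encoding form data. The LABEL_OVERRIDES table and the helper names are hypothetical. Python's cp1252 codec also raises an error on the few bytes windows-1252 leaves undefined (such as 0x81), which browsers may handle differently; the sketch ignores that.

```python
# Hypothetical sketch: map a declared charset label to the codec actually used.
# The table is illustrative and covers only the ISO-8859-1 -> windows-1252 case.
LABEL_OVERRIDES = {
    "iso-8859-1": "cp1252",  # Python's codec name for windows-1252
}


def effective_codec(label: str) -> str:
    """Return the Python codec name to use for a declared charset label."""
    key = label.strip().lower()
    return LABEL_OVERRIDES.get(key, key)


def decode_document(data: bytes, label: str) -> str:
    # Interpretation (decoding): bytes 0x80-0x9F become windows-1252
    # punctuation instead of C1 control characters.
    return data.decode(effective_codec(label))


def encode_form_value(value: str, label: str) -> bytes:
    # Creation (encoding), e.g. form submission: characters ISO-8859-1 lacks,
    # such as U+20AC EURO SIGN, can still be encoded as windows-1252 bytes.
    # Characters outside windows-1252 still raise UnicodeEncodeError here.
    return value.encode(effective_codec(label))
```

For example, decode_document(b"\x93quoted\x94", "ISO-8859-1") yields the text wrapped in curly quotes, whereas a strict ISO-8859-1 decode would yield C1 control characters; encode_form_value("\u20ac5", "ISO-8859-1") yields b"\x805".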


-- 
Anne van Kesteren
http://annevankesteren.nl/