
Re: Indicating charset variants (was: RE: windows 936)



On 9/22/07, Martin Duerst <duerst@it.aoyama.ac.jp> wrote:
> At 22:52 07/09/21, Erik van der Poel wrote:
> >I don't think it's such a good idea. The Web has come a long way in
> >terms of labelling charsets. In the early days, very few people
> >bothered to insert the HTML <meta> with charset, and even fewer people
> >inserted the HTTP charset. Nowadays, around 74% of the documents in
> >Google's index have the meta charset.
>
> Even if some percentage of these is wrong (do you have any idea?),
> that's definitely a lot of progress.

I believe there are shades of gray between "wrong" and "right", partly
because of the variant issue and partly because the major browsers are
not 100% consistent with each other, with the IANA registry or with
the official and semi-official Unicode mapping tables.

I suspect that a large percentage of the meta and HTTP charsets are
"correct", at least to the extent of displaying the more common
characters correctly in the major browsers.

One way to estimate that percentage might be to compare the HTTP or
meta charset with an encoding detector's result.
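
A rough sketch of that comparison, in Python, just to make the idea
concrete. Everything here is an assumption for illustration: the
function names are made up, toy_detect is a trivial stand-in for a real
encoding detector, and Python's codec aliases are used only as a
convenient way to treat labels like "latin1" and "iso-8859-1" as the
same charset.

```python
import codecs

def canonical(label):
    """Map a charset label to Python's canonical codec name, or None if unknown."""
    try:
        return codecs.lookup(label.strip()).name
    except LookupError:
        return None

def toy_detect(data):
    # Stand-in for a real detector (assumption): bytes that decode as
    # UTF-8 are called UTF-8, anything else is guessed to be Latin-1.
    # Note that pure-ASCII input is therefore reported as UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"

def agreement_rate(docs, detect=toy_detect):
    """docs: iterable of (declared_label, raw_bytes) pairs.

    Returns the fraction of documents whose declared charset matches
    the detector's guess after alias normalization. Unknown declared
    labels count as disagreement.
    """
    docs = list(docs)
    if not docs:
        return 0.0
    agree = 0
    for label, data in docs:
        declared = canonical(label)
        if declared is not None and declared == canonical(detect(data)):
            agree += 1
    return agree / len(docs)
```

A disagreement here does not prove the label is wrong, since the
detector can be wrong too, and the variant issue is invisible to a
comparison at this level.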

> >The commonly used characters are currently being conveyed correctly
> >from human to human by using the common charset names on the wire.
> >If/when you start to introduce charset variant names that are not
> >understood by the clients, even the commonly used characters cannot be
> >viewed, let alone the rare characters supposedly enabled by these
> >variant names.
> >
> >Of course, if we get all the clients to upgrade first, we won't have
> >this problem. But are these minor variants worth all that trouble?
>
> It's definitely a good question. For some applications, the answer
> is clearly 'no'. But for others, it may easily be 'yes'.

Which apps do you have in mind here?

> Please note that the first step towards supporting these variant
> tags would be that recipients check if they support a full variant
> tag, and if not, they look for '--' in the tag, cut off the variant
> part, and try again. That's the main advantage and purpose of a
> special separator. Of course even that requires an update to
> receivers.

Understood.
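
For concreteness, the receiver-side fallback described above could look
roughly like this in Python. This is only a sketch under assumptions:
resolve_charset and the supported set are hypothetical names, the tag
"gbk--vendorx" is an invented example, and matching is done
case-insensitively because charset names are case-insensitive.

```python
def resolve_charset(tag, supported):
    """Return the label to decode with, or None if nothing matches.

    tag:       charset label from the wire, possibly "base--variant"
    supported: set of lowercase labels this receiver understands
               (hypothetical; a real client would consult its own tables)
    """
    label = tag.strip().lower()

    # Step 1: try the full tag, variant part included.
    if label in supported:
        return label

    # Step 2: look for the '--' separator, cut off the variant part,
    # and try again with the base name alone.
    base, sep, _variant = label.partition("--")
    if sep and base in supported:
        return base

    return None
```

For example, with supported = {"gbk", "utf-8"}, the invented tag
"GBK--vendorX" falls back to "gbk", while a client that does not run
this logic would fail on the whole tag.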

Erik