Re: Indicating charset variants (was: RE: windows 936)



At 22:38 07/09/22, Erik van der Poel wrote:
>On 9/22/07, Martin Duerst <duerst@it.aoyama.ac.jp> wrote:

>> Even if some percentage of these is wrong (do you have any idea?),
>> that's definitely a lot of progress.
>
>I believe there are shades of gray between "wrong" and right, partly
>because of the variant issue and partly because the major browsers are
>not 100% consistent with each other, with the IANA registry or with
>the official and semi-official Unicode mapping tables.
>
>I suspect that a large percentage of the meta and http charsets is
>"correct", at least to the extent of displaying the more common
>characters correctly on the major browsers.

That's all we can ask for at the moment, I guess. As long as we don't
provide a way to label variants, we can't expect pages to be labeled
with the correct variant.

>One way to estimate that percentage might be to compare the http or
>meta charset with an encoding detector's result.

Yes. Do you have a way to do that? (Of course, it would only be done
on a carefully chosen sample.)
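
As a rough illustration of the comparison suggested above, here is a
minimal Python sketch. It is only a sketch under stated assumptions:
the candidate list, the function names (crude_detect, same_codec,
agreement_rate) and the "first codec that decodes cleanly" heuristic
are all hypothetical stand-ins, and a real encoding detector would use
statistical models rather than trial decoding.

```python
import codecs

# Hypothetical candidate list, tried in order. A real detector would
# rank candidates statistically instead of taking the first clean decode.
CANDIDATES = ["ascii", "utf-8", "shift_jis", "euc_jp", "gb2312", "big5"]


def crude_detect(data):
    """Return the first candidate codec that decodes data without error."""
    for name in CANDIDATES:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


def same_codec(label_a, label_b):
    """Compare two charset labels after resolving aliases (e.g. us-ascii)."""
    try:
        return codecs.lookup(label_a).name == codecs.lookup(label_b).name
    except LookupError:
        # Unknown label: treat as a mismatch.
        return False


def agreement_rate(samples):
    """samples: list of (declared_charset, raw_bytes) pairs.

    Returns the fraction of samples whose declared charset matches
    the crude detector's guess.
    """
    if not samples:
        return 0.0
    hits = 0
    for declared, data in samples:
        guess = crude_detect(data)
        if guess is not None and same_codec(declared, guess):
            hits += 1
    return hits / len(samples)
```

A charset variant would show up here only as a label mismatch, so a
sketch like this estimates gross mislabeling, not the "shades of gray"
between variants.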

>> >The commonly used characters are currently being conveyed correctly
>> >from human to human by using the common charset names on the wire.
>> >If/when you start to introduce charset variant names that are not
>> >understood by the clients, even the commonly used characters cannot be
>> >viewed, let alone the rare characters supposedly enabled by these
>> >variant names.
>> >
>> >Of course, if we get all the clients to upgrade first, we won't have
>> >this problem. But are these minor variants worth all that trouble?
>>
>> It's definitely a good question. For some applications, the answer
>> is clearly 'no'. But for others, it may easily be 'yes'.
>
>Which apps do you have in mind here?

A very typical case would be XML Signature. According to the spec,
you can sign e.g. an XML document in Shift_JIS, but the signature is
computed after conversion to UTF-8. If the conversion isn't the same
when the signature is checked, the signature won't match anymore,
even if the signature is actually correct.
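
To make the mismatch concrete, here is a small Python sketch. It is
not an XML Signature implementation: there is no canonicalization, and
SHA-256 over the converted bytes merely stands in for the digest step.
The helper name digest_after_conversion is hypothetical. The sketch
relies on Python's "shift_jis" and "cp932" codecs, which map the byte
pair 0x81 0x60 to different Unicode characters.

```python
import hashlib

# One double-byte character as it might appear in a document
# labeled Shift_JIS.
raw = b"\x81\x60"

# Two variants of "the same" charset disagree on this byte pair.
as_sjis = raw.decode("shift_jis")   # U+301C WAVE DASH
as_cp932 = raw.decode("cp932")      # U+FF5E FULLWIDTH TILDE


def digest_after_conversion(data, charset):
    """Convert to UTF-8 via the given charset, then hash the result.

    Stand-in for the digest step of a signature; hypothetical helper.
    """
    utf8_bytes = data.decode(charset).encode("utf-8")
    return hashlib.sha256(utf8_bytes).hexdigest()


signer_digest = digest_after_conversion(raw, "shift_jis")
verifier_digest = digest_after_conversion(raw, "cp932")

# Same input bytes, different conversion tables, different digests.
print(signer_digest == verifier_digest)
```

If the signer and the verifier pick different variants, the digests
differ even though not a single byte of the document has changed.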

Cases like these are quite different from the usual browsing case,
where an odd wrong character may just be overlooked.

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp