Re: Encoding Standard (mostly complete)
On Thu, 19 Apr 2012 20:00:53 +0200, Shawn Steele
<Shawn.Steele@microsoft.com> wrote:
>> Entries for euc-kr, gb_2312-80, ... are similarly not helpful. euc-kr does
>> not mention that you need to support Unified Hangul Code, as Internet
>> Explorer does, in order to work with Korean content, and gb_2312-80 does
>> not mention that you should really use your gbk decoder/encoder instead.
>
> Use Unicode.
You keep saying that in the context of my trying to explain why a browser
handling *legacy* pages has a hard time knowing what to implement. It is
starting to get annoying. If those pages used Unicode, we would not
continue to get bug reports.
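To make the quoted euc-kr and gb_2312-80 point concrete, here is a minimal Python sketch, assuming Python's built-in codecs stand in for browser decoders: cp949 for Unified Hangul Code and gbk for GBK. The label table and the decode_legacy helper are hypothetical illustrations covering only a few labels, not the Encoding Standard's actual label registry. A syllable such as 똠 lies outside the KS X 1001 repertoire, so a page containing it decodes with cp949 but makes a strict EUC-KR decoder fail.

```python
# Hypothetical table for illustration only: a few legacy labels mapped to the
# superset codec (Python codec names) a browser would in practice need.
SUPERSET_DECODERS = {
    "euc-kr": "cp949",          # Unified Hangul Code, as Internet Explorer uses
    "ks_c_5601-1987": "cp949",
    "gb2312": "gbk",
    "gb_2312-80": "gbk",        # GBK is a superset of GB 2312
}

# Whitespace stripped from the ends of a label: tab, LF, FF, CR, space.
ASCII_WHITESPACE = "\t\n\x0c\r "


def decode_legacy(data: bytes, label: str) -> str:
    """Decode data with the superset decoder for label (hypothetical helper).

    Unknown labels are passed straight to Python's codec lookup, which raises
    LookupError if Python does not know them either.
    """
    key = label.strip(ASCII_WHITESPACE).lower()
    codec = SUPERSET_DECODERS.get(key, key)
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return data.decode(codec, errors="replace")


if __name__ == "__main__":
    # 똠 is not among the 2,350 KS X 1001 syllables, so only UHC encodes it.
    page = "똠방각하".encode("cp949")
    try:
        page.decode("euc_kr")  # strict EUC-KR decoding
    except UnicodeDecodeError:
        print("strict EUC-KR decoder rejects the page")
    print(decode_legacy(page, " EUC-KR ") == "똠방각하")  # True
```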
>> No, they are not well understood. I do not know about Internet Explorer,
>> but browsers other than Internet Explorer continue to hit compatibility
>> issues in this part of their code and continue to make changes because
>> of it, without clear guidance thus far as to what the end goal ought to
>> be and what everyone else is aiming for.
>
> Use Unicode. Even if you figure out exactly what every browser is
> doing, you still have no idea what browser/version the page was
> targeting. Even if you created a perfect version of the ABC encoding
> (placeholder for your favorite encoding), and convinced all of the
> browsers to adopt the perfect ABC encoding, you'll continue to have
> encoding problems because there are millions of pages implemented with
> the existing variations of ABC encoding.
Yes, that is why we perform content analysis to figure out what the best
way to decode data would be. See e.g.
http://lists.w3.org/Archives/Public/public-html-ig-zh/2012Apr/
> If you want to convince them to update their pages to the "correct" ABC,
> then it'd be far better to get them to move to UTF-8. For that matter,
> getting them to correctly tag their existing data would solve most of
> the most egregious problems.
The assumption is that neither of those is going to happen for data we
still want to read in, say, a hundred years' time.
> IMO I would MUCH rather see this much effort put into encouraging
> Unicode than into pinning down the existing rat's nest and accidentally
> encouraging people to continue the bad practice of using legacy encodings.
This effort is not aimed at content authors.
Speaking of which, I've been a tireless advocate of UTF-8 since before I
knew how it worked. I wrote, e.g.:
http://annevankesteren.nl/2004/06/utf-8
http://annevankesteren.nl/2009/09/utf-8-reasons
And last night, while you were writing your email, I presented on the
topic at a local developer meetup:
http://annevankesteren.nl/presentations/1F4A9.html
This is not about that. This is about handling existing *legacy* content
that is unlikely to change.
--
Anne van Kesteren
http://annevankesteren.nl/