
Re: Encoding Standard (mostly complete)



On Thu, 19 Apr 2012 20:00:53 +0200, Shawn Steele  
<Shawn.Steele@microsoft.com> wrote:
>> Entries for euc-kr, gb_2312-80, ... are similarly not helpful. euc-kr does
>> not mention you need to support Unified Hangul Code as Internet Explorer
>> does in order to work with Korean content and gb_2312-80 does not mention
>> you should really use your gbk decoder/encoder instead.
>
> Use Unicode.

You keep saying that in the context of me trying to explain why a browser
handling *legacy* pages has a hard time knowing what to implement. It is
starting to get annoying. If those pages used Unicode we would not
continue to get bug reports.
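
To make the label problem concrete, here is a minimal sketch in Python of
what "use your gbk decoder for gb_2312-80" and "support Unified Hangul Code
for euc-kr" could look like. The table and the function name are
hypothetical, chosen for illustration only; they are not taken from any
browser or from the draft. The codec names "cp949" and "gbk" are the ones
Python's standard library uses for Unified Hangul Code and GBK.

```python
# Hypothetical table: each legacy label is routed to the superset decoder
# that real-world content labelled that way tends to need.
LEGACY_LABEL_TO_CODEC = {
    "euc-kr": "cp949",     # Unified Hangul Code, a superset of EUC-KR
    "gb_2312-80": "gbk",   # GBK, a superset of GB 2312
    "gb2312": "gbk",
}


def decode_legacy(data: bytes, label: str) -> str:
    """Decode data using the superset decoder for the given label.

    Labels not in the table are passed to Python's codec lookup unchanged.
    Undecodable bytes become U+FFFD instead of raising an exception.
    """
    codec = LEGACY_LABEL_TO_CODEC.get(label.strip().lower(), label)
    return data.decode(codec, errors="replace")
```

A page labelled euc-kr that contains a Hangul syllable outside the basic
EUC-KR repertoire, such as U+B620, still decodes correctly this way, because
the bytes are handed to the Unified Hangul Code decoder.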


>> No, they are not well understood. I do not know about Internet Explorer,
>> but browsers other than Internet Explorer continue to hit compatibility
>> issues in this part of their code and continue to make changes because
>> of it, without clear guidance thus far as to what the end goal ought to
>> be and what everyone else is aiming for.
>
> Use Unicode.  Even if you figure out exactly what every browser is  
> doing, you still have no idea what browser/version the page was  
> targeting.  Even if you created a perfect version of the ABC encoding  
> (placeholder for your favorite encoding), and convinced all of the  
> browsers to adopt the perfect ABC encoding, you'll continue to have  
> encoding problems because there are millions of pages implemented with  
> the existing variations of ABC encoding.

Yes, that is why we perform content analysis to figure out what the best  
way to decode data would be. See e.g.  
http://lists.w3.org/Archives/Public/public-html-ig-zh/2012Apr/
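
As a rough illustration of what such content analysis might involve, the
sketch below tries several candidate decoders and keeps the one that yields
the fewest U+FFFD replacement characters. This is an assumed, deliberately
naive heuristic with a hypothetical function name; it is not the analysis
discussed in the linked archive, and it is not how any particular browser
detects encodings.

```python
def pick_decoder(data: bytes, candidates: list[str]) -> str:
    """Return the candidate codec that decodes data with the fewest errors.

    Each candidate is a Python codec name. Errors are counted as the number
    of U+FFFD characters produced when decoding with errors="replace".
    On a tie, the candidate listed first wins.
    """
    def error_count(codec: str) -> int:
        return data.decode(codec, errors="replace").count("\ufffd")

    return min(candidates, key=error_count)
```

For example, bytes that are valid GBK but fall outside GB 2312 would score
worse under "gb2312" than under "gbk", so `pick_decoder(data, ["gb2312",
"gbk"])` would select "gbk" for that data.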


> If you want to convince them to update their pages to the "correct" ABC,  
> then it'd be far better to get them to move to UTF-8.  For that matter,  
> getting them to correctly tag their existing data would solve most of  
> the most egregious problems.

The assumption is that neither of those is going to happen for data we
still want to read in, say, a hundred years' time.


> IMO I would MUCH rather see this much effort put into encouraging
> Unicode than into pinning down the existing rat's nest and accidentally
> encouraging people to continue with the bad practice of using encodings.

This effort is not aimed at content authors.

Speaking of which, I've been a tireless advocate of utf-8 since before I  
knew how it worked. I wrote e.g.

http://annevankesteren.nl/2004/06/utf-8
http://annevankesteren.nl/2009/09/utf-8-reasons

And last night while you wrote your email I presented on the topic at a  
local developer meetup:

http://annevankesteren.nl/presentations/1F4A9.html

This is not about that. This is about handling existing *legacy* content  
that is unlikely to change.


-- 
Anne van Kesteren
http://annevankesteren.nl/