Re: Encoding Standard (mostly complete)

To: ietf-charsets <ietf-charsets@iana.org>, Doug Ewell <doug@ewellic.org>,Shawn Steele <Shawn.Steele@microsoft.com>
Subject: Re: Encoding Standard (mostly complete)
From: Anne van Kesteren <annevk@opera.com>
Date: Fri, 20 Apr 2012 08:47:54 +0200
Organization: Opera Software

On Thu, 19 Apr 2012 20:00:53 +0200, Shawn Steele  
<Shawn.Steele@microsoft.com> wrote:
>> Entries for euc-kr, gb_2312-80, ... are similarly not helpful. euc-kr does
>> not mention you need to support Unified Hangul Code as Internet Explorer
>> does in order to work with Korean content and gb_2312-80 does not mention
>> you should really use your gbk decoder/encoder instead.
>
> Use Unicode.

You keep saying this in the context of me trying to explain why a browser  
handling *legacy* pages has a hard time knowing what to implement. It is  
starting to get annoying. If those pages used Unicode we would not  
continue to get bug reports.
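
To make the euc-kr and gb_2312-80 point above concrete, here is a minimal  
sketch of what "use the superset decoder for the legacy label" could look  
like. The table and function names are hypothetical, and Python's built-in  
`cp949` and `gbk` codecs are only stand-ins for Unified Hangul Code and  
gbk; they are not claimed to match any browser's tables exactly.

```python
# Hypothetical label table: maps a declared legacy label to the wider
# decoder that real-world content tends to need. Illustrative only.
LEGACY_DECODER_FOR_LABEL = {
    "euc-kr": "cp949",    # Unified Hangul Code, a superset of EUC-KR
    "gb_2312-80": "gbk",  # GBK extends GB 2312
}


def decode_legacy(data: bytes, label: str) -> str:
    """Decode bytes using the superset decoder for a declared label."""
    normalized = label.strip().lower()
    # Labels missing from the table are passed to Python unchanged, which
    # raises LookupError if Python does not know the codec name.
    codec = LEGACY_DECODER_FOR_LABEL.get(normalized, normalized)
    # Undecodable bytes become U+FFFD rather than raising an exception.
    return data.decode(codec, errors="replace")
```

With this, a page labelled euc-kr that contains a Hangul syllable outside  
the original EUC-KR repertoire still decodes, because the bytes are handed  
to the wider decoder.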

>> No they are not well understood. I do not know about Internet Explorer,
>> but browsers other than Internet Explorer continue to hit compatibility
>> issues in this part of their code and continue to make changes because
>> of it, without clear guidance thus far as what the end goal ought to be
>> and what everyone else is aiming for.
>
> Use Unicode.  Even if you figure out exactly what every browser is  
> doing, you still have no idea what browser/version the page was  
> targeting.  Even if you created a perfect version of the ABC encoding  
> (placeholder for your favorite encoding), and convinced all of the  
> browsers to adopt the perfect ABC encoding, you'll continue to have  
> encoding problems because there are millions of pages implemented with  
> the existing variations of ABC encoding.

Yes, that is why we perform content analysis to figure out what the best  
way to decode data would be. See e.g.  
http://lists.w3.org/Archives/Public/public-html-ig-zh/2012Apr/
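
As a rough illustration of the idea only, the crudest form of content  
analysis is to try a list of candidate decoders in order and keep the  
first one that accepts the bytes without error. This is an assumption-laden  
sketch, not what any browser actually ships: the function name and the  
candidate list are hypothetical, and real analysis looks at much more than  
whether decoding succeeds.

```python
from typing import Optional, Sequence


def guess_decoder(
    data: bytes, candidates: Sequence[str] = ("utf-8", "gbk")
) -> Optional[str]:
    """Return the first candidate codec that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)  # strict mode: raises on invalid sequences
        except UnicodeDecodeError:
            continue
        return name
    # No candidate accepted the bytes.
    return None
```

Order matters here: utf-8 is tried first because arbitrary legacy bytes  
rarely happen to form valid utf-8, whereas many legacy decoders accept  
almost any input.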

> If you want to convince them to update their pages to the "correct" ABC,  
> then it'd be far better to get them to move to UTF-8.  For that matter,  
> getting them to correctly tag their existing data would solve most of  
> the most egregious problems.

The assumption is that neither of those is going to happen for data we  
still want to read in, say, a hundred years' time.

> IMO I would MUCH rather see this much effort put into encouraging  
> Unicode, than to pin down the existing rats nest and accidentally  
> encouraging people to continue with the bad practice of using encodings.

This effort is not aimed at content authors.

Speaking of which, I've been a tireless advocate of utf-8 since before I  
knew how it worked. I wrote e.g.

http://annevankesteren.nl/2004/06/utf-8
http://annevankesteren.nl/2009/09/utf-8-reasons

And last night while you wrote your email I presented on the topic at a  
local developer meetup:

http://annevankesteren.nl/presentations/1F4A9.html

This is not about that. This is about handling existing *legacy* content  
that is unlikely to change.

-- 
Anne van Kesteren
http://annevankesteren.nl/
