Re: Registering a charset alias



On Wed, 19 Aug 2009 22:35:43 +0200, Shawn Steele <Shawn.Steele@microsoft.com> wrote:
> I'm not sure they're easy to find, I stuck a list of aliases that .Net  
> uses at  
> http://blogs.msdn.com/shawnste/archive/2009/08/18/alternate-encoding-names-recognized-by-net-ie.aspx
>
> http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx  
> has a list of the names that .Net calls the various encodings (webname)

Very cool, thanks!

So if I understand this data correctly, IE does not treat ISO-8859-1 and Windows-1252 the same? That is not my experience, but maybe I do not understand the code page concept well enough.
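
For reference, the two encodings can be compared byte by byte. Below is a minimal sketch in Python, assuming that Python's built-in `iso-8859-1` and `cp1252` codecs are a fair stand-in for the published tables. It says nothing about what IE or .Net actually do, and `differing_bytes` is a name made up for this illustration.

```python
def differing_bytes(enc_a, enc_b):
    """Return the byte values that two single-byte encodings decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" turns bytes a codec leaves undefined into U+FFFD
        # instead of raising UnicodeDecodeError.
        decoded_a = raw.decode(enc_a, errors="replace")
        decoded_b = raw.decode(enc_b, errors="replace")
        if decoded_a != decoded_b:
            diffs.append(value)
    return diffs


diffs = differing_bytes("iso-8859-1", "cp1252")
print([hex(value) for value in diffs])

# One concrete case: the same byte, decoded under each label.
print(repr(b"\x80".decode("iso-8859-1")), repr(b"\x80".decode("cp1252")))
```

With these codecs the differences are confined to the range 0x80 to 0x9F, where ISO-8859-1 has C1 control characters and Windows-1252 has printable characters such as the euro sign. Whether a client treats the two labels as the same encoding is therefore only observable on content that uses those bytes.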


> Note that IE's code page detection is pretty fixed and we're suggesting  
> use of UTF-8 for new content, it's unlikely that any additional aliases  
> would be added or changed in many significant ways.

Understood.


> I think most of our encodings don't lend themselves to the superset  
> concept.  There're probably variations for individual code points even  
> in closely related code pages.  GB18030 might be an exception there.
>
> I'd much rather have the community push for UTF encodings rather than  
> trying to do perfect detection of imperfect code pages.

I agree that we should get everyone to use UTF-8.

This effort is not about new content, however. It is about dealing with the vast amount of legacy data out there and allowing new (and existing) clients to handle that content properly without having to reverse engineer the market leader.
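
To make concrete what a documented alias list would give a client, here is a rough sketch in Python of label resolution with a small override table. The table `LEGACY_OVERRIDES` and the function `resolve_label` are hypothetical and exist only for this illustration. The override entry reflects the assumption, based on my experience above, that clients decode content labelled ISO-8859-1 as Windows-1252; it is not taken from any published list.

```python
import codecs

# Hypothetical override table: labels a client might deliberately decode
# with a different (compatible) encoding than the label names.
LEGACY_OVERRIDES = {
    "iso-8859-1": "cp1252",
    "latin1": "cp1252",
}


def resolve_label(label):
    """Map a charset label from legacy content to a Python codec name.

    Returns None when the label is unknown, so the caller can fall back
    to a default or to detection.
    """
    key = label.strip().lower()
    target = LEGACY_OVERRIDES.get(key, key)
    try:
        # codecs.lookup applies Python's own alias table and raises
        # LookupError for names it does not recognise.
        return codecs.lookup(target).name
    except LookupError:
        return None


print(resolve_label(" ISO-8859-1 "))
print(resolve_label("UTF-8"))
print(resolve_label("x-not-a-charset"))
```

The point of the sketch is only that the override table is data, not logic. If the aliases and overrides the market leader uses were published, a new client could fill in such a table directly instead of discovering the entries by testing.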


> Even when names  
> are identical there are still unique quirks of different systems with  
> various code pages.  Sometimes it's just a code point difference, other  
> times it's a bigger problem.

I do think it would help a lot if these differences were publicly documented.


-- 
Anne van Kesteren
http://annevankesteren.nl/