RE: Registering a charset alias




> So if I understand this data correctly IE does not treat ISO-8859-1
> and Windows-1252 the same? That is not my experience, but maybe I do
> not understand the code pages concept good enough.

I'm not an IE dude, I'm the code page guy :).  IE does code page autodetection in some cases; when and how depends on the IE version.  Newer content is more likely to be correctly tagged, and I think newer versions of IE are more likely to trust the declarations.  (Again, I'm not an IE guy, so I could be completely wrong.)

> This effort is not about new content however, it is about dealing
> with the vast amount of legacy data around and allowing new clients
> (and existing) to properly handle the content without having to reverse
> engineer the market leader.

IE uses MLang for code page detection and aliases.  Competitors could do the same, but handling of untagged or incorrectly tagged content is probably problematic.

> > Even when names
> > are identical there are still unique quirks of different systems with
> > various code pages.  Sometimes it's just a code point difference, other
> > times it's a bigger problem.

> I do think it would help a lot if this was publicly documented.

I didn't mean IE, I meant the code page itself.  It's really, really, really, really hard to figure out if a glyph on one machine ends up looking like the same glyph on a different OS.

A huge problem with the existing web content is that it isn't perfectly tagged.  Specifically, some earlier servers declared "ISO" code pages since they were Unix boxes, but the content was often written on a Windows box in a "Windows" code page and then stuck on the server.  Since the same Windows box was used to view it, it seemed to work.  Not only IE, but Netscape and others tried to make sense out of the mess.

Even with "correct" tagging, I see content merged onto a page that was obviously entered somewhere else and has messed up rich quotes or other data, so even now code page confusion is happening.
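To make the rich-quote case concrete, here is a minimal Python sketch.  It is only an illustration using Python's built-in "cp1252" and "iso-8859-1" codecs, not anything IE or MLang does, and the sample text is made up.  It shows text written as Windows-1252 and then decoded according to an ISO-8859-1 tag.

```python
# Hypothetical sample text; the curly quotes are the "rich quotes" in question.
text = "\u201cquoted\u201d"

# The author saves the page on a Windows box, so the bytes are Windows-1252.
raw = text.encode("cp1252")

# A client that trusts an ISO-8859-1 tag turns 0x93/0x94 into C1 control characters.
as_tagged = raw.decode("iso-8859-1")

# A client that reads the same bytes as Windows-1252 recovers the quotes.
as_written = raw.decode("cp1252")

print(raw)
print(repr(as_tagged))
print(repr(as_written))
```

The bytes never change; only the interpretation does, which is why the page looked fine on the machine it was written on.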

If some browser can't render a page tagged as ISO-8859-1 that looks OK in some version of IE, then I would think the problem isn't that IE treats ISO-8859-1 data as Windows-1252, since that would be gibberish as well.  The problem is that the page must've been mistagged as ISO-8859-1 when it really wasn't :(
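For anyone who wants to see where the two code pages actually disagree, the following is a rough sketch that walks all 256 byte values and compares Python's "iso-8859-1" and "cp1252" codecs.  The function name is my own, and Python's cp1252 table is just one implementation: it leaves a few bytes undefined, which other systems may map differently.

```python
def differing_bytes():
    """Return the byte values that decode differently in the two code pages."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")  # defined for every byte value
        try:
            win = raw.decode("cp1252")
        except UnicodeDecodeError:
            win = None  # undefined in Python's cp1252 table
        if win != latin1:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print(f"{len(diffs)} bytes differ: 0x{diffs[0]:02X} through 0x{diffs[-1]:02X}")
```

Everything outside that range decodes the same way in both, so a page that only uses those bytes renders identically whichever label it carries.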

-Shawn