
RE: Encoding Standard (mostly complete)



> You keep saying in this context of me trying to explain why a browser handling *legacy*
> pages has a hard time knowing what to implement. It is starting to get annoying. 
> If those pages used Unicode we would not continue to get bug reports.

Exactly.  Opera isn't the only one that gets those bug reports.  I get bug reports too.

User A visits web site W that was written to match JIS's latest and greatest EXACTLY.  That fails on IE because we don't know everything about JIS.
User B visits web site X that was written based on an older Linux interpretation.  That fails on IE because the Linux version differed from ours.
User C visits web site Y that wasn't tagged perfectly with an exact variation, but it succeeds on IE because it happens to match our behavior.
User D visits web site Z that is tagged perfectly, but it fails in IE because we don't recognize the name.  The developer didn't bother with IE so didn't realize the problem.

These cases are mutually exclusive.  If I "fix" user A, then I risk breaking the rest, etc.  If I break user C & web site Y, they get really mad because they used to work, even though it wasn't "right."  The only way to resolve this is to update all (or at least many) of the documents (which will never happen).  The only encoding which is (reasonably) consistent across Windows/.Net/Linux/Unix/OS X/IE/notepad/Office/Firefox/Chrome/Opera/etc. is Unicode.  (Even then, there are still oddities with the PUA and stuff, but at least it constrains the problem.)
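To make the conflict concrete, here is a minimal sketch (Python is assumed purely for illustration; none of this code is from the original mail) of how the very same bytes decode to different characters depending on which interpretation of "Shift_JIS" a product implements:

  data = b"\x81\x60"                 # one two-byte sequence, two "correct" answers
  print(data.decode("shift_jis"))    # U+301C WAVE DASH (standard JIS mapping)
  print(data.decode("cp932"))        # U+FF5E FULLWIDTH TILDE (Windows code page 932)

  data = b"\x81\x7c"
  print(data.decode("shift_jis"))    # U+2212 MINUS SIGN
  print(data.decode("cp932"))        # U+FF0D FULLWIDTH HYPHEN-MINUS

Whichever mapping a browser picks, documents authored against the other one show the "wrong" character, so pleasing user A and user B at the same time isn't possible without re-tagging or converting the documents.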

We support operating systems for many years, and data that originated long before that.  I have a billion machines in use, and a lot of those aren't going to upgrade until they die.  I have no idea how to figure out how many documents there are.  If I touch a code page in a way that breaks those documents and those machines, then I get lots of colorful feedback to broaden my horizons.

I've tried to help update the IETF lists to point to our current behaviors in the interest of compatibility, and I think all of the code page data is published.

> The assumption is that neither of those is going to happen for data we still want to read in say a hundred years time.
> This is not about that. This is about handling existing *legacy* content that is unlikely to change.

But if that content isn't tagged with variation 123 of ABC standard, how can you read it?  Like all those files generated by notepad?  The author undoubtedly was able to read the document on her machine with her configuration.  It's when that file crosses machines that things most often get confused, and we don't have the precision to distinguish between them.
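As a rough sketch of the "file crosses machines" problem (again Python, and the sample string is just an illustration, not from the original mail): the same untagged bytes are perfectly readable under the code page they were written with, and mojibake under a different machine's default:

  data = "日本語".encode("cp932")      # written on a machine whose default is code page 932
  print(data.decode("cp932"))          # 日本語 -- readable where it was authored
  print(data.decode("windows-1252"))   # “ú–{Œê -- what a Western-default machine sees

Nothing in the bytes themselves says which reading is intended; only the missing tag could have told us.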

IMO, trying to guess variations heuristically makes things worse if you want cross-everything compatibility, because everyone's heuristics differ.  (Indeed, we got a lot of push-back about IE's old code page autodetection/autocorrection behavior.)
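For what it's worth, a detector can only return a guess plus a confidence score, and different detectors embody different heuristics.  A tiny sketch using the third-party chardet module (chardet is my choice for illustration; it isn't referenced above):

  import chardet

  data = "日本語のテキスト".encode("cp932")
  guess = chardet.detect(data)     # returns {'encoding': ..., 'confidence': ..., 'language': ...}
  print(guess["encoding"], guess["confidence"])
  # Whatever this prints is still a guess; another detector (or this one, on a
  # shorter or more ambiguous sample) can legitimately disagree.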

-Shawn