
RE: Encoding Standard (mostly complete)



> My experience is that by defining a feature in detail and writing a test  
> suite implementations will converge over time.

Our implementation of encodings WILL NOT change.  Ever.  I also don't have the resources to validate that your standard matches our behavior.

I'm not at all trying to say that our implementations are "perfect".  (On the contrary, I've blogged quite often about how many variations there are in the wild.  We even have slight differences among our various SDKs.)  However, millions of our customers depend on our current behavior.  If that behavior changes even slightly, it will "corrupt" their data.

Our handling of shift_jis is probably the most severe example: the standard has evolved beyond what we support, but we can't change without breaking existing customers.

> Entries for euc-kr, gb_2312-80, ... are similarly not helpful. euc-kr does  
> not mention you need to support Unified Hangul Code as Internet Explorer  
> does in order to work with Korean content and gb_2312-80 does not mention  
> you should really use your gbk decoder/encoder instead.

Use Unicode.  

> No they are not well understood. I do not know about Internet Explorer,  
> but browsers other than Internet Explorer continue to hit compatibility  
> issues in this part of their code and continue to make changes because of  
> it, without clear guidance thus far as what the end goal ought to be and  
> what everyone else is aiming for.

Use Unicode.  Even if you figure out exactly what every browser is doing, you still have no idea which browser/version the page was targeting.  Even if you created a perfect version of the ABC encoding (placeholder for your favorite encoding) and convinced all of the browsers to adopt it, you'd continue to have encoding problems, because there are millions of pages written against the existing variations of ABC.  If you want to convince authors to update their pages to the "correct" ABC, then it'd be far better to get them to move to UTF-8.  For that matter, getting them to correctly tag their existing data would solve most of the egregious problems.
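
To make the tagging point concrete, here is a minimal Python sketch.  The sample string, the helper name decode_as, and the choice of "latin-1" as the wrong label are all assumptions picked for illustration; they are not taken from any real page or product.  It only shows that the same stored bytes read correctly or as garbage depending on the label attached to them.

```python
# Illustrative only: the sample text, the helper name, and the choice of
# "latin-1" as the wrong label are assumptions made for this demo.

def decode_as(data: bytes, label: str) -> str:
    """Decode raw bytes using the encoding named by `label`."""
    return data.decode(label)

original = "café"
stored = original.encode("utf-8")  # the bytes as they sit on disk or on the wire

correctly_tagged = decode_as(stored, "utf-8")   # label matches the bytes
mistagged = decode_as(stored, "latin-1")        # same bytes, wrong label

print(correctly_tagged)
print(mistagged)
```

The bytes never change between the two calls; only the label does.  That is why fixing the tag repairs the text without touching the stored data.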

IMO I would MUCH rather see this much effort put into encouraging Unicode than into pinning down the existing rat's nest and accidentally encouraging people to continue with the bad practice of using legacy (non-Unicode) encodings.

-Shawn