
RE: Registration of some code pages



I don't think it can be restricted to the web.  For various reasons this difference could show up in other places too (MIME, for example).

And "Big5" doesn't always mean "windows 950".  It often does on a Windows box, but it might actually mean "Big5" on other machines.  (I was going to provide examples, but there's a lot of variety across vendors; Wikipedia lists many variations.)
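To make the gap between the two labels concrete, here is a rough sketch that compares two decoders byte pair by byte pair. It assumes Python's built-in "big5" and "cp950" codecs as stand-ins for "plain Big5" and "windows 950"; other vendors' tables will differ again. The function names are made up for illustration.

```python
def decode_or_none(raw, codec):
    """Strictly decode raw with codec; return None if the bytes are unmapped."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def label_differences(codec_a="big5", codec_b="cp950"):
    """List the two-byte sequences that the two codecs decode differently.

    Each entry is (raw_bytes, result_a, result_b); None means "not mapped".
    """
    diffs = []
    for lead in range(0x81, 0xFF):        # candidate lead bytes
        for trail in range(0x40, 0xFF):   # candidate trail bytes
            raw = bytes([lead, trail])
            a = decode_or_none(raw, codec_a)
            b = decode_or_none(raw, codec_b)
            if a != b:
                diffs.append((raw, a, b))
    return diffs


if __name__ == "__main__":
    diffs = label_differences()
    print(len(diffs), "two-byte sequences decode differently")
    for raw, a, b in diffs[:10]:
        print(raw.hex(), repr(a), repr(b))
```

The output only describes this one pair of tables; it says nothing about which variant a given "big5" page was actually authored in.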

HTML5 seems to be providing "clear guidance" to web developers, so if we had this kind of annotation, then HTML5 could point to that.  However, even in HTML (especially in HTML?), big5 doesn't necessarily always mean Windows 950; it could actually be a different variant from whatever authoring system originated it.

Seems to me that, in practice, these labels narrow down the behavior, but there are variations: sometimes only a couple of code points, sometimes more.  A "wise" application would allow for imperfect code page identification (like the user drop-down to change code pages).

Ironically, IE9 betas are getting stricter about the page declarations, and I've started seeing the opposite problem (Arabic sites tagged with an Arabic code page that are actually UTF-8, etc.), which is, I guess, why people have been doing autodetection.
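As a minimal sketch of that kind of tolerance (not what IE or any other browser actually does), an application could treat the declared label as a hint and check for UTF-8 first. The heuristic is an assumption: non-ASCII text in a legacy code page rarely happens to form valid UTF-8. The name decode_page is hypothetical.

```python
def decode_page(data, declared):
    """Return (text, label_used), treating the declared label as a hint.

    Assumed heuristic: try strict UTF-8 first, then the declared label.
    If neither works (or the label is unknown), decode lossily as UTF-8.
    """
    for label in ("utf-8", declared):
        try:
            return data.decode(label), label
        except (UnicodeDecodeError, LookupError):
            continue
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


if __name__ == "__main__":
    text = "\u0645\u0631\u062d\u0628\u0627"  # Arabic sample text
    # Page tagged as windows-1256 but actually UTF-8.
    print(decode_page(text.encode("utf-8"), "windows-1256"))
    # Page that really is windows-1256.
    print(decode_page(text.encode("cp1256"), "windows-1256"))
```

This catches the mis-tagged UTF-8 case but not a page in one legacy code page tagged as another, which is where the user drop-down still earns its keep.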

Anyway, I'd like to annotate a couple records as "this is sometimes called xxx" without making it an alias, especially when the "alias" is already defined as something else.

-Shawn

 
http://blogs.msdn.com/shawnste


________________________________________
From: Anne van Kesteren [annevk@opera.com]
Sent: Friday, September 03, 2010 3:18 AM
To: 'ietf-charsets@iana.org'; Shawn Steele
Subject: Re: Registration of some code pages

On Thu, 02 Sep 2010 01:48:21 +0200, Shawn Steele
<Shawn.Steele@microsoft.com> wrote:
> I’ve been asked to register a few code pages, but some of them are
> problematic because the names don’t match our behavior exactly.  This is
> similar to the problem facing HTML5 when they are trying to map from the
> names used to the actual behavior that IE (and therefore others) use.
>
> Specifically, I’m wondering about registering “windows 950”, but somehow
> annotating it that Microsoft typically redirects “big5” to that
> behavior.  So an alias isn’t really appropriate.
>
> Similarly, is there something we could add to the Windows-31J
> registration to recognize that Microsoft uses shift_jis to point to
> Windows-31J instead of using the registered form?
>
> IMO there’s not much point in me registering “new” names to help people
> understand existing compatibility issues since the new names won’t be
> recognized by the existing implementations.

Ideally the registry (or maybe a new one specific to the Web) reflects
what implementations do. If we all agree that "big5" means "windows 950"
then "big5" should just mean that, even though originally the intention
might have been different. After all, that is what the running code is
doing. That would give the clearest guidance to new implementors and
allow existing implementors to converge.


--
Anne van Kesteren
http://annevankesteren.nl/