
Re: Are charset names supposed to be case sensitive?



* Leif Halvard Silli wrote:
>Well, the registry is supposed to say whether the label should be seen as 
>obsolete, of limited use or 'normal'. These judgements are not simply a 
>question of 'you have label and you have bytes, this is how you get 
>characters'. The reasons why products *may* need to have some kind of 
>support for 'unicode' and 'unicodeFFFE' are the same as why they 
>probably should be considered 'obsolete' or 'of limited use': They 
>interfere in a negative way on the stability of 'utf-16', 'utf-16le' 
>and 'utf-16be'. And these negativities need to be reflected somewhere. 
>It is also in order to try to get a picture of those issues that I have 
>focused on what happens if so and so.

Reasons why a label is problematic should be part of the registry;
information on how certain browsers handle a certain name in the <meta>
element in the process of detecting the encoding of an HTML document
should not be. Right now I have trouble telling how to implement the
two encodings you would like to register. What I would probably do is
use my http://search.cpan.org/dist/Win32-MultiLanguage/ module to
convert from the encodings to UTF-8 and look at the results, such as
whether a "BOM" matters, how surrogates are handled, and so on. With
test data you could then say this is how stuff works independently of
HTML. If there are any issues with that, say things are different from
how you handle UTF-16/LE/BE, that would be useful as well.
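
To illustrate the kind of test I mean, here is a rough sketch in Python
rather than with the Win32 module. It only covers the three registered
UTF-16 labels as Python's own codecs implement them; the sample names
and the probe() and run_matrix() helpers are made up for this example,
and 'unicode' and 'unicodeFFFE' are not included because Python has no
codecs under those names.

```python
# Hypothetical test vectors: a short name mapped to raw bytes.
SAMPLES = {
    "le_with_bom": b"\xff\xfeA\x00",        # FF FE, then 'A' little-endian
    "be_with_bom": b"\xfe\xff\x00A",        # FE FF, then 'A' big-endian
    "le_lone_surrogate": b"\x00\xd8A\x00",  # U+D800 not followed by a low surrogate
}

# Python's codec names for the three registered labels.
LABELS = ["utf-16", "utf-16-le", "utf-16-be"]


def probe(label, data):
    """Decode data under label and report what came out.

    Returns ("ok", [code points]) or ("error", reason).
    """
    try:
        text = data.decode(label)
    except UnicodeDecodeError as exc:
        return ("error", exc.reason)
    return ("ok", ["U+%04X" % ord(ch) for ch in text])


def run_matrix():
    """Run every sample through every label."""
    return {
        (label, name): probe(label, data)
        for label in LABELS
        for name, data in SAMPLES.items()
    }


if __name__ == "__main__":
    for (label, name), result in sorted(run_matrix().items()):
        print(label, name, result)
```

In this implementation 'utf-16' consumes the BOM and uses it to pick
the byte order, while 'utf-16-le' and 'utf-16-be' keep it as U+FEFF in
the output. A table like that, produced for the two proposed labels,
is what would let someone say how they work independently of HTML.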

How HTML implementations might treat the labels, or whether someone may
or may not want to implement the encoding, and other things like that,
are secondary and should be looked at once the definition of the
encoding, or perhaps the difficulties in defining the label, are clear.

>Meanwhile, perhaps my new version of the 'unicode' registration looks 
>better?

You lost me at

      The 'unicode' spec defines 'utf-16' as its alias, but this of
      course contradicts with 'utf-16' as defined in the IANA registry.

already. I can't tell, for instance, whether this would still be true if
the label were registered as you propose.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/