
Re: Are charset names supposed to be case sensitive?



Bjoern Hoehrmann, Mon, 19 Dec 2011 22:31:04 +0100:
> * Doug Ewell wrote:
>> I guess I would like to see some sort of table breaking down the various 
>> flavors of UTF-16 and/or UCS-2 that would need to be tagged separately:
>> 
>> * big-endian or little-endian by default
>> * accepts BOM
>> * requires BOM
>> * supports all 17 planes or just BMP
>> * etc.
> 
> I think it would be helpful to start with separating what the encodings
> are and what the particular behavior of "HTML implementations" is.

Agreed.

> The
> registry is not really meant to cover the encoding detection rules for
> "HTML when served over HTTP" with handling of <meta> elements and such,
> it's more for "you have a label and you have bytes, this is how you get
> characters", where the definition of the label, and not the data format
> tells you how you get the characters.

Well, the registry is supposed to say whether the label should be seen 
as obsolete, of limited use or 'normal'. These judgements are not simply 
a question of 'you have a label and you have bytes, this is how you get 
characters'. The reasons why products *may* need to have some kind of 
support for 'unicode' and 'unicodeFFFE' are the same as the reasons why 
they probably should be considered 'obsolete' or 'of limited use': they 
undermine the stability of 'utf-16', 'utf-16le' and 'utf-16be'. And 
these negative effects need to be reflected somewhere. It is also in 
order to get a picture of those issues that I have focused on what 
happens in particular scenarios.
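
To make the 'label plus bytes' point concrete, here is a minimal sketch 
of how one implementation (Python's codecs module, which is only an 
example and not what the registry specifies) treats the same bytes under 
the three labels. The sample data and the helper name decode_with_label 
are made up for illustration.

```python
import codecs

# Hypothetical sample: "hi" as little-endian UTF-16, preceded by a BOM.
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")


def decode_with_label(label: str, raw: bytes) -> str:
    """Decode raw bytes using only the charset label (illustrative helper)."""
    # codecs.lookup() normalizes the label, so its case does not matter here.
    return codecs.lookup(label).decode(raw)[0]


# Same bytes, three labels: only the label decides how the BOM is treated.
results = {
    label: decode_with_label(label, data)
    for label in ("UTF-16", "utf-16le", "utf-16be")
}

for label, text in results.items():
    print(f"{label:9} -> {text!r}")
```

In this implementation, 'utf-16' uses the BOM to pick the byte order and 
then drops it, while 'utf-16le' and 'utf-16be' keep the first two bytes 
as a character. A label such as 'unicode' would need to state which of 
these behaviours it means.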

Meanwhile, perhaps my new version of the 'unicode' registration looks 
better?
-- 
Leif H Silli