
Re: Are charset names supposed to be case sensitive?



Hi,

I appreciate Leif's efforts - but...

I'm very uncomfortable about registering (in whatever case)
'unicode' as a deprecated limited use charset name for some
flavor of UTF-16 (just the BMP?).

This has great potential to add confusion to an already confused
situation, IMHO.

It's audibly (in a screen reader) indistinguishable from the *real*
Unicode - technically aligned w/ ISO 10646 and the name for the
whole enchilada.

It's too bad that users don't know what UTF-8 means and that it
is NOT just another alternative to UTF-16 - but in fact the strongly
preferred alternative when handling message catalogs, streams
of text (marked up or not), and generally IETF and other protocol
elements.

Cheers,
- Ira


Ira McDonald (Musician / Software Architect)
Chair - Linux Foundation Open Printing WG
Secretary - IEEE-ISTO Printer Working Group
Co-Chair - IEEE-ISTO PWG IPP WG
Co-Chair - TCG Trusted Mobility Solutions WG
Chair - TCG Embedded Systems Hardcopy SG
IETF Designated Expert - IPP & Printer MIB
Blue Roof Music/High North Inc
http://sites.google.com/site/blueroofmusic
http://sites.google.com/site/highnorthinc
mailto:blueroofmusic@gmail.com
Winter  579 Park Place  Saline, MI  48176  734-944-0094
Summer  PO Box 221  Grand Marais, MI 49839  906-494-2434



On Mon, Dec 19, 2011 at 7:56 PM, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> wrote:
Bjoern Hoehrmann, Mon, 19 Dec 2011 22:31:04 +0100:
> * Doug Ewell wrote:
>> I guess I would like to see some sort of table breaking down the various
>> flavors of UTF-16 and/or UCS-2 that would need to be tagged separately:
>>
>> * big-endian or little-endian by default
>> * accepts BOM
>> * requires BOM
>> * supports all 17 planes or just BMP
>> * etc.
>
> I think it would be helpful to start with separating what the encodings
> are and what the particular behavior of "HTML implementations" is.

Agreed.

> The
> registry is not really meant to cover the encoding detection rules for
> "HTML when served over HTTP" with handling of <meta> elements and such,
> it's more for "you have a label and you have bytes, this is how you get
> characters", where the definition of the label, and not the data format
> tells you how you get the characters.

Well, the registry is supposed to say whether the label should be seen as
obsolete, of limited use or 'normal'. These judgements are not simply a
question of 'you have a label and you have bytes, this is how you get
characters'. The reasons why products *may* need to have some kind of
support for 'unicode' and 'unicodeFFFE' are the same as the reasons why
they probably should be considered 'obsolete' or 'of limited use': they
interfere negatively with the stability of 'utf-16', 'utf-16le'
and 'utf-16be'. And these negative effects need to be reflected somewhere.
It is also in order to get a picture of those issues that I have
focused on what happens in specific cases.
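
To make the 'label plus bytes' point concrete, here is a minimal sketch using Python's standard codecs module. It only shows how one implementation treats the labels 'utf-16', 'utf-16le' and 'utf-16be'; it is not a statement about what the registry requires, and the sample string and helper names (canonical, code_units) are made up for illustration.

```python
import codecs

# "A" is in the BMP; U+1F600 lies outside it and needs a surrogate pair.
text = "A\U0001F600"

be_bytes = text.encode("utf-16-be")   # big-endian, no BOM written
le_bytes = text.encode("utf-16-le")   # little-endian, no BOM written
be_with_bom = codecs.BOM_UTF16_BE + be_bytes


def canonical(label: str) -> str:
    """Return the codec name that Python resolves a label to."""
    return codecs.lookup(label).name


def code_units(data: bytes) -> int:
    """Count 16-bit code units in a UTF-16 byte sequence."""
    return len(data) // 2


# Label matching in this implementation ignores case.
same_codec = canonical("UTF-16LE") == canonical("utf-16le")

# 'utf-16' reads the BOM to pick the byte order and then drops it.
bom_consumed = be_with_bom.decode("utf-16")

# 'utf-16-be' fixes the byte order, so the BOM survives as U+FEFF.
bom_kept = be_with_bom.decode("utf-16-be")

# Same bytes, wrong label: 0x00 0x41 read little-endian is U+4100, not "A".
wrong_order = be_bytes[:2].decode("utf-16-le")

print(same_codec, repr(bom_consumed), repr(bom_kept), repr(wrong_order))
print("code units:", code_units(be_bytes))
```

The two characters take three code units, which is where a BMP-only reading of the same bytes would diverge from full UTF-16. Decoding BOM-less data with the plain 'utf-16' label is left out on purpose, since the result there depends on the implementation's default byte order.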

Meanwhile, perhaps my new version of the 'unicode' registration looks
better?
--
Leif H Silli