
Re: Are charset names supposed to be case sensitive?

To: Leif Halvard Silli <[email protected]>
Subject: Re: Are charset names supposed to be case sensitive?
From: Ira McDonald <[email protected]>
Date: Mon, 19 Dec 2011 21:11:15 -0500
Cc: Bjoern Hoehrmann <[email protected]>, Doug Ewell <[email protected]>, [email protected], Ira McDonald <[email protected]>
List-Id: <ietf-charsets.mail.apps.ietf.org>

Hi,

I appreciate Leif's efforts - but...

I'm very uncomfortable about registering (in whatever case)
'unicode' as a deprecated limited use charset name for some
flavor of UTF-16 (just the BMP?).

This has great potential to add confusion to an already confused
situation, IMHO.

The label is audibly (in a screen reader) indistinguishable from the
*real* Unicode, which is technically aligned with ISO 10646 and is the
name for the whole enchilada.

It's too bad that users don't know what UTF-8 means, or that it
is NOT just another alternative to UTF-16 but in fact the strongly
preferred choice for message catalogs, streams of text (marked up
or not), and, generally, IETF and other protocol elements.

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Chair - Linux Foundation Open Printing WG
Secretary - IEEE-ISTO Printer Working Group
Co-Chair - IEEE-ISTO PWG IPP WG
Co-Chair - TCG Trusted Mobility Solutions WG
Chair - TCG Embedded Systems Hardcopy SG
IETF Designated Expert - IPP & Printer MIB
Blue Roof Music/High North Inc
http://sites.google.com/site/blueroofmusic
http://sites.google.com/site/highnorthinc
mailto:[email protected]
Winter: 579 Park Place, Saline, MI 48176, 734-944-0094
Summer: PO Box 221, Grand Marais, MI 49839, 906-494-2434

On Mon, Dec 19, 2011 at 7:56 PM, Leif Halvard Silli <[email protected]> wrote:

Bjoern Hoehrmann, Mon, 19 Dec 2011 22:31:04 +0100:

> * Doug Ewell wrote:
>> I guess I would like to see some sort of table breaking down the various
>> flavors of UTF-16 and/or UCS-2 that would need to be tagged separately:
>>
>> * big-endian or little-endian by default
>> * accepts BOM
>> * requires BOM
>> * supports all 17 planes or just BMP
>> * etc.
>
> I think it would be helpful to start with separating what the encodings
> are and what the particular behavior of "HTML implementations" is.

Agreed.

> The
> registry is not really meant to cover the encoding detection rules for
> "HTML when served over HTTP" with handling of <meta> elements and such,
> it's more for "you have a label and you have bytes, this is how you get
> characters", where the definition of the label, and not the data format
> tells you how you get the characters.
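Bjoern's "you have a label and you have bytes" model and Doug's list of per-flavor properties can be sketched together in Python. This is a hypothetical illustration, not any registry's or product's implementation: `Utf16Flavor`, `FLAVORS`, `decode` and the label `x-hypothetical-ucs2` are invented for the example. The `utf-16`, `utf-16be` and `utf-16le` rows assume the RFC 2781 rules (a BOM is honored only under `utf-16`, and big-endian is assumed when it is absent); 'unicode' and 'unicodeFFFE' are left out because their exact semantics are what is under discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utf16Flavor:
    """One row of a hypothetical table of UTF-16 flavors."""
    big_endian_default: bool  # byte order used when no BOM is consumed
    bom_is_signature: bool    # leading FE FF / FF FE is read as a byte-order mark
    bom_required: bool        # reject input that does not start with a BOM
    bmp_only: bool            # reject code points above U+FFFF


# Keys are stored in lower case so lookups can ignore ASCII case.
FLAVORS = {
    "utf-16":   Utf16Flavor(True,  True,  False, False),
    "utf-16be": Utf16Flavor(True,  False, False, False),
    "utf-16le": Utf16Flavor(False, False, False, False),
    # Hypothetical label, NOT a registered charset: shows the "just BMP" axis.
    "x-hypothetical-ucs2": Utf16Flavor(True, True, False, True),
}


def decode(label: str, data: bytes) -> str:
    """Label + bytes -> characters, driven only by the label's definition."""
    flavor = FLAVORS[label.lower()]  # case-insensitive label match
    big_endian = flavor.big_endian_default
    if flavor.bom_is_signature and data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        big_endian = data[:2] == b"\xfe\xff"
        data = data[2:]  # the BOM is a signature, not part of the text
    elif flavor.bom_required:
        raise ValueError(f"{label}: byte-order mark required")
    # The explicit-endian codecs never strip a leading U+FEFF themselves.
    text = data.decode("utf-16-be" if big_endian else "utf-16-le")
    if flavor.bmp_only and any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError(f"{label}: character outside the BMP")
    return text


if __name__ == "__main__":
    print(repr(decode("UTF-16", b"\xff\xfeA\x00")))    # BOM consumed
    print(repr(decode("utf-16le", b"\xff\xfeA\x00")))  # FF FE kept as U+FEFF
```

Under these assumptions the same leading bytes FF FE are a byte-order mark when labeled `utf-16` but a ZERO WIDTH NO-BREAK SPACE when labeled `utf-16le`, which is the kind of per-label distinction a registry entry has to pin down.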

Well, the registry is supposed to say whether the label should be seen as
obsolete, of limited use or 'normal'. These judgements are not simply a
question of 'you have a label and you have bytes, this is how you get
characters'. The reasons why products *may* need some kind of
support for 'unicode' and 'unicodeFFFE' are the same reasons why they
should probably be considered 'obsolete' or 'of limited use': they
negatively affect the stability of 'utf-16', 'utf-16le' and 'utf-16be',
and those negative effects need to be reflected somewhere.
Also, in order to get a picture of those issues, I have focused on
what happens in this or that case.

Meanwhile, perhaps my new version of the 'unicode' registration looks
better?
--
Leif H Silli
