
Re: Are charset names supposed to be case sensitive?

To: Doug Ewell <doug@ewellic.org>
Subject: Re: Are charset names supposed to be case sensitive?
From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Tue, 20 Dec 2011 04:46:35 +0100
Cc: ietf-charsets@iana.org
Organization: Målform.no

Doug Ewell, Mon, 19 Dec 2011 08:36:10 -0700:

> It seems Leif might be trying to tag the incomplete or erroneous 
> behavior of individual applications, even if they don't correspond to 
> documented behavior, or to tag mis-documented behavior that may not 
> actually be implemented (like "unicode" meaning "BMP only").

* BMP: The only reason the registrations say 'BMP' is that the written 
spec says so and that the registration template asked for such data.

* Products: References to products are made in order to document that 
the 'unicode'/'unicodeFFFE' specs actually are implemented. In that 
regard, the possible 'BMP'-incorrectness seems far less important 
w.r.t. practical 'real' problems than the endianness issues.

* Actually implemented: That 'unicode' and 'utf-16' (in the Microsoft 
spec) are names for little-endian UTF-16, while 'unicodeFFFE' is the 
name for big-endian UTF-16, is a fact. To verify, try the following web 
page in Chrome, Safari or IE - the clue being that the page is 
'utf-16be'-encoded while HTTP says 'utf-16': 
http://malform.no/testing/utf/html/16be/http.utf16
   For reference, an identical, but little-endian encoded page:
http://malform.no/testing/utf/html/16le/http.utf16
   If IE and Safari/Chrome implemented the official UTF-16 
specification, the first page should work fine, while the second would 
not necessarily need to work. Instead, we see the opposite: The first 
page fails in the mentioned browsers.

* 'Actually implemented' has reached Web standards: HTML5 specifies: 
«The requirement to default UTF-16 to little-endian rather than 
big-endian is a willful violation of RFC 2781, motivated by a desire 
for compatibility with legacy content. [RFC2781]» 
<http://dev.w3.org/html5/spec/parsing.html#character-encodings-0> 
Whether it is 'legacy content' - as HTML5 claims - or implementation 
of the Microsoft spec - or both - that makes HTML5 say this is perhaps 
an open question.
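
To make the endianness point above concrete, here is a minimal Python 
sketch of the two fallback rules for BOM-less data labelled 'utf-16'. 
The function name decode_utf16 and its 'default' parameter are 
hypothetical, chosen only for illustration; they are not taken from any 
browser or from the Microsoft spec. It assumes that a BOM, when 
present, always decides the byte order.

```python
import codecs

def decode_utf16(data: bytes, default: str = "be") -> str:
    """Decode bytes labelled 'utf-16'.

    A leading BOM decides the byte order. Without a BOM, fall back to
    `default`: "be" mirrors the RFC 2781 recommendation, "le" mirrors
    the little-endian default that HTML5 describes.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    codec = "utf-16-be" if default == "be" else "utf-16-le"
    return data.decode(codec)

# Big-endian bytes without a BOM, like the first test page.
sample = "Hi".encode("utf-16-be")             # b'\x00H\x00i'

rfc = decode_utf16(sample, default="be")      # 'Hi'
legacy = decode_utf16(sample, default="le")   # '\u4800\u6900' (mojibake)
```

With the big-endian fallback the text survives; with the little-endian 
fallback every pair of bytes is swapped, which is the kind of failure 
the first test page shows in the mentioned browsers.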

> I'm not sure that's a goal of registering charsets. 

The goal of these registrations is to comply with section 2.5. In 
particular, this seemed relevant: «the use of a large number of 
undocumented and/or unlabeled charsets hampers interoperability even 
more.»
<http://tools.ietf.org/html/bcp19#section-2.5>

> It also seemed to 
> me—though I assume I'm wrong here—that he was trying to call 
> particular attention to errors in Microsoft implementations, but I'm 
> sure Shawn and others can speak to that.

It is not only Microsoft products: WebKit is backed by Apple, 
Google, HTML5 ...

But given Microsoft's positive attitude towards Unicode, including 
UTF-16, it seems reasonable to ask: Is it certain that Microsoft - and 
the community at large - is aware that they operate with a shadow spec 
that contradicts UTF-16, and of the impact of this? Perhaps, with a 
little attention to this, they will update or fine-tune? Here is 
hoping. 
-- 
Leif Halvard Silli

