
RE: Are charset names supposed to be case sensitive?



Hi Shawn and all,

Magic vs. semantics: I don't attach magic to the casing. But I do 
recognize that semantics are nevertheless attached to the casing, by 
people other than myself. Thus I've tried to be consistent with the casing 
found in the IANA registry (UPPERCASE) and in the Microsoft listing 
(lowercase and mixedCASE). This consistency has the following benefits:

* It makes it easier to separate the semantics of the utf-16 alias
  in the Microsoft listing from the semantics of the UTF-16 name in
  the IANA registry.
* It separates 'unicode' from the trademarked/registered 'UNICODE'.
* The casing 'unicodeFFFE' is more readable than 'unicodefffe' or
  'UNICODEFFFE'.
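Björn's reply further down notes that RFC 2978 makes charset names case-insensitive, so casing like the above can only be a display convention; software still has to match names without regard to case. A minimal Python sketch of that split, assuming a small hypothetical lookup table (CANONICAL_LABELS and canonical_label are illustrative names, and the entries are examples, not a complete registry):

```python
from typing import Optional

# Hypothetical lookup table: keys are casefolded for matching, values keep
# the casing used by the respective listing. Illustrative entries only.
CANONICAL_LABELS = {
    "utf-16": "UTF-16",            # IANA registry spelling (uppercase)
    "unicode": "unicode",          # Microsoft listing (lowercase)
    "unicodefffe": "unicodeFFFE",  # Microsoft listing (mixed case)
}


def canonical_label(name: str) -> Optional[str]:
    """Match a charset name case-insensitively; return its display spelling."""
    # casefold() is a stronger lower() meant for caseless comparison;
    # strip() tolerates whitespace around a header or attribute value.
    return CANONICAL_LABELS.get(name.strip().casefold())


print(canonical_label("UNICODEFFFE"))  # unicodeFFFE
print(canonical_label("Utf-16"))       # UTF-16
print(canonical_label("latin1"))       # None
```

With this approach 'utf-16' and 'UTF-16' necessarily resolve to the same entry, so case alone cannot carry the IANA-vs-Microsoft distinction inside a parser.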

BTW: Here are 3 new, preliminary findings from the test suite I work on:

* Microsoft's products add the BOM *and* the <meta> charset 
declaration. However, my new tests show that <meta> charset 
declarations inside UTF-16 flavor files are actually IE-*incompatible*: 
for a file without a BOM or HTTP charset info, the <meta> charset 
declaration, regardless of its value, causes IE not to sniff the 
encoding; if one deletes the <meta> charset, however, *then* it sniffs 
it. This underlines that 'unicode' and 'unicodeFFFE' 
require the BOM, as it would actually be unsafe to say charset=unicode 
unless there is a BOM.

* HORROR: IE is not alone in treating 'UTF-16/utf-16' as an alias for 
'unicode': WebKit (Safari, Chrome) behaves the same way. Thus, if HTTP 
announces 'UTF-16' for a file without the BOM, then instead of starting 
to sniff, WebKit, just like IE, defaults to LE, resulting in 
mojibake in both IE and WebKit. 

* Contrary to the impression I got from the HTML5 process, IE has zero 
problems guessing the encoding and the endianness of a BOMless UTF-16 
file that doesn't get any encoding info from the HTTP Content-Type or a 
<meta> element. Safari/Chrome, however, *they* default incorrectly in 
such cases.
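The three findings all turn on what a decoder should do with UTF-16 bytes that carry no BOM. A minimal Python sketch, assuming a BOM is authoritative and otherwise falling back to a zero-byte heuristic; sniff_utf16 and the heuristic itself are hypothetical illustrations, not the logic IE or WebKit actually implement. The last line of the demo shows why defaulting to LE, as reported above when HTTP says 'UTF-16', produces mojibake for big-endian input.

```python
from typing import Optional


def sniff_utf16(data: bytes) -> Optional[str]:
    """Guess the byte order of UTF-16 data, or return None if unsure.

    A BOM is treated as authoritative. Without one, this uses a hypothetical
    zero-byte heuristic (not IE's or WebKit's actual algorithm): in
    ASCII-heavy markup, big-endian text puts the NUL high bytes at even
    offsets and little-endian text puts them at odd offsets.
    UTF-32 BOMs are not considered here.
    """
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"

    sample = data[:1024]           # look only at the start of the file
    pairs = len(sample) // 2       # number of complete 16-bit code units
    if pairs == 0:
        return None
    even_zeros = sample[0::2].count(0)
    odd_zeros = sample[1::2].count(0)

    # Demand a clear majority of NULs on one side before guessing.
    if even_zeros > pairs // 2 and even_zeros > 2 * odd_zeros:
        return "utf-16-be"
    if odd_zeros > pairs // 2 and odd_zeros > 2 * even_zeros:
        return "utf-16-le"
    return None


if __name__ == "__main__":
    text = "<html><body>Hi</body></html>"
    be_bytes = text.encode("utf-16-be")  # big-endian, no BOM
    print(sniff_utf16(be_bytes))                  # utf-16-be
    print(sniff_utf16(text.encode("utf-16-le")))  # utf-16-le
    # Defaulting to LE instead of sniffing garbles BOMless BE text:
    print(be_bytes.decode("utf-16-le") == text)   # False
```

The sketch deliberately ignores HTTP and <meta> declarations, whose interaction with sniffing is what the findings above test.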

Leif H Silli


Shawn Steele, Thu, 15 Dec 2011 19:00:06 +0000:
> That's what I thought, it was unclear to me if Leif's proposal was 
> making a distinction between utf16 and UTF16 :)
> 
> -----Original Message-----
> From: Bjoern Hoehrmann [mailto:derhoermi@gmx.net] 
> Sent: Thursday, December 15, 2011 10:59 AM
> To: Shawn Steele
> Cc: ietf-charsets@iana.org
> Subject: Re: Are charset names supposed to be case sensitive?
> 
> * Shawn Steele wrote:
>> Are charset names supposed to be case sensitive?
> 
> No, RFC 2978 implies they are case-insensitive and so they are pretty 
> much everywhere.
> --
> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
> 
>