Re: Are charset names supposed to be case sensitive?
On 2011/12/17 17:01, Leif Halvard Silli wrote:
> Hi Shawn and all,
>
> Magic vs semantics: I don't attach magic to the casing. But I do
> recognize that there nevertheless is semantics attached to the casing -
> by others than myself.
Who exactly? Does any product (Microsoft or otherwise) produce different
results when it sees different case in the same place?
If yes, which product, and what are the differences (and what happens
with mixed-case labels)?
If not, I would strongly suggest not using case differences to refer to
different usages of the same label, because this may cause a lot of
confusion. (It already had Shawn confused, and me, too.)
> Thus I've tried to be consistent with the casing
> found in the IANA registry (UPPERCASE) and in the Microsoft listing
> (lowercase and mixedCASE).
The fact that the IANA registry lists the charset labels with uppercase
characters is no more than an arbitrary convention, and this may also be
so for the Microsoft listings. Please use a single case version unless
case is really significant in the sense that one and the same product, in
one and the same protocol slot, reacts differently to different case forms.
> This consistency has the following benefits:
>
> * It makes it easier to separate the semantics of the utf-16 alias
> in the Microsoft listing from the semantics of the UTF-16 name in
> the IANA registry.
For all intents and purposes, these are one and the same charset label.
If you want to distinguish them, please do so with additional words, not
with case.
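To illustrate the point, here is a minimal sketch of a case-insensitive label comparison. The helper name same_charset_label is hypothetical, and the sketch assumes labels are plain ASCII strings; it does not resolve aliases, it only ignores case.

```python
def same_charset_label(a: str, b: str) -> bool:
    """Return True if two charset labels differ at most in letter case.

    Hypothetical helper for illustration only. It assumes ASCII labels
    and does not map aliases onto each other.
    """
    # Strip surrounding whitespace, then compare the lowercased forms.
    return a.strip().lower() == b.strip().lower()


if __name__ == "__main__":
    print(same_charset_label("UTF-16", "utf-16"))
    print(same_charset_label("unicodeFFFE", "UNICODEFFFE"))
    print(same_charset_label("utf-16", "unicode"))
```

Under such a comparison 'UTF-16' and 'utf-16' cannot carry different meanings, which is why any distinction between them has to be spelled out in words.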
> * It separates 'unicode' from the trademarked/registered 'UNICODE'.
> * The casing 'unicodeFFFE' is more readable than 'unicodefffe' or
> 'UNICODEFFFE'.
For these two, you have only used a single casing, so there's no
confusion. So these should be fine.
Regards, Martin.
> BTW: Here are 3 new, preliminary findings from the test suite I work on:
>
> * Microsoft's products add the BOM *and* the <meta> charset
> declaration. However, my new tests show that <meta> charset
> declarations inside UTF-16 flavor files are actually IE-*incompatible*:
> For a file without BOM or HTTP charset info, the <meta> charset
> declaration, regardless of its value, causes IE to not sniff the
> encoding - if one deletes the <meta> charset, however, *then* it sniffs
> it. This fact serves to underline that 'unicode' and 'unicodeFFFE'
> require the BOM, as it would actually be unsafe to say charset=unicode
> unless there is a BOM.
>
> * HORROR: IE is not alone in treating 'UTF-16/utf-16' as an alias for
> 'unicode': Webkit (Safari, Chrome) behave the same way. Thus, if HTTP
> announces 'UTF-16' for a file without the BOM, then instead of starting
> to sniff, Webkit - just like IE - defaults to LE, resulting in
> mojibake in both IE and Webkit.
>
> * Contrary to my HTML5 process based impression, IE has zero problems
> with guessing the encoding and the endianness of a BOMless UTF-16 file
> that doesn't get any encoding info from the HTTP Content-Type or a
> <meta> element. Safari/Chrome, however, *they* default incorrectly in
> such cases.
>
> Leif H Silli
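The role of the BOM in the findings above can be illustrated with a minimal sketch that reads the byte order from the first two bytes of a file. The function name utf16_byte_order_from_bom is hypothetical, and this is not a description of how IE or WebKit sniff; it only shows that without a BOM the bytes themselves carry no explicit byte-order signal.

```python
import codecs


def utf16_byte_order_from_bom(data: bytes):
    """Return 'little-endian', 'big-endian', or None if no UTF-16 BOM.

    Hypothetical helper for illustration. Note that a UTF-32 LE BOM
    (FF FE 00 00) also begins with FF FE; this sketch does not try to
    tell the two apart.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    # No BOM: the caller has to rely on external information or guess.
    return None


if __name__ == "__main__":
    print(utf16_byte_order_from_bom(b"\xff\xfeh\x00"))
    print(utf16_byte_order_from_bom(b"\xfe\xff\x00h"))
    print(utf16_byte_order_from_bom(b"h\x00i\x00"))
```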
>
>
> Shawn Steele, Thu, 15 Dec 2011 19:00:06 +0000:
>> That's what I thought, it was unclear to me if Leif's proposal was
>> making a distinction between utf16 and UTF16 :)
>>
>> -----Original Message-----
>> From: Bjoern Hoehrmann [mailto:derhoermi@gmx.net]
>> Sent: Thursday, December 15, 2011 10:59 AM
>> To: Shawn Steele
>> Cc: ietf-charsets@iana.org
>> Subject: Re: Are charset names supposed to be case sensitive?
>>
>> * Shawn Steele wrote:
>>> Are charset names supposed to be case sensitive?
>>
>> No, RFC 2978 implies they are case-insensitive and so they are pretty
>> much everywhere.
>> --
>> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
>> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
>> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
>>