Re: Are charset names supposed to be case sensitive?



Hello Leif,

On 2011/12/18 7:02, Leif Halvard Silli wrote:
> "Martin J. Dürst", Sat, 17 Dec 2011 22:09:31 +0900:
>> On 2011/12/17 17:01, Leif Halvard Silli wrote:
>
>>> Magic vs semantics: I don't attach magic to the casing. But I do
>>> recognize that there nevertheless is semantics attached to the casing -
>>> by others than myself.
>    ...
>> If not,
>
> We have a 'if not' situation.

Thanks for the confirmation.

>> I would strongly suggest not to use case differences to refer
>> to different usages of the same label, because this may cause a lot
>> of confusion. (It already had Shawn confused, and me, too.)
>
> Looking at my registration letter for 'unicode', I think it isn't the
> very casing, but the language I use about the casing that is possibly
> confusing:
>
> ''' NB! Alias: At the time of this registration, the spec upon which
>    the registration of the 'unicode' and the 'unicodeFFFE' charset is
>    based, defines 'utf-16' (lowercase) as alias for 'unicode'.[2] '''
>
> If I remove the '(lowercase)', then the above should be clear enough,
> no? Also: In the same letter I say that 'utf-16' lowercase *cannot* be
> registered as alias for 'unicode' due to the fact that 'UTF-16'
> uppercase is already registered as a charset name. So there is material
> enough to at least avoid jumping to conclusions ...

Can you go over your templates, fix these and similar places, and 
resend them?

>>> Thus I've tried to be consistent with the casing
>>> found in the IANA registry (UPPERCASE) and in the Microsoft listing
>>> (lowercase and mixedCASE).
>>
>> The fact that the IANA registry lists the charset labels with
>> uppercase characters isn't more than a random convention, and this
>> may also be so for the Microsoft listings.
>
> The registry says: ''However, no distinction is made between use of
> upper and lower case letters.'' I read this to mean that products are
> not supposed to make distinctions based on the casing.

Yes indeed.

> Hence I assumed
> this mailing list would read *me* the same way ...

Well, in general, it would. But you were so consistent in your case 
distinctions and were talking about all kinds of edge cases, and that 
made at least Shawn and me, and probably others, think that there really 
was a case distinction.

> Am I wrong w.r.t
> the registry? Does the casing in the registry matter, except for
> registry conventions or sorts?

No, casing doesn't matter for charset labels, neither in the registry, 
nor, as Shawn and you have fortunately confirmed, in any implementations 
we know of.
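
As a minimal sketch of what "casing doesn't matter" means for code that 
compares labels (Python; the helper name same_charset_label is made up 
for illustration, and it assumes labels are plain ASCII, as registered 
charset names are):

```python
import codecs


def same_charset_label(a: str, b: str) -> bool:
    """Return True if two charset labels differ at most in letter case.

    Hypothetical helper; assumes both labels are ASCII strings.
    """
    return a.strip().lower() == b.strip().lower()


# Labels that differ only in case count as one and the same label.
print(same_charset_label("UTF-16", "utf-16"))
print(same_charset_label("unicodeFFFE", "UNICODEFFFE"))

# Python's own codec lookup also ignores the case of the label.
print(codecs.lookup("UTF-16").name == codecs.lookup("utf-16").name)
```

Anything that needs to tell "UTF-16 according to the registry" from 
"UTF-16 according to some vendor listing" has to carry that information 
separately; it cannot be read off the label's case.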

>> Please use a single case
>> version unless case is really significant in the sense that one and
>> the same product, in one and the same protocol slot, reacts differently
>> to different case forms.
>
> When I quote Microsoft or the IANA registry, I must of course use the
> casing used in those documents.

If you quote a whole sentence, then probably yes. But not when just 
quoting a label, or when just using information from these places.

> But - OK - else: Until I eventually
> discover a case where the casing matters, I will try to use a
> single casing.

Great, thanks.

>>> This consistency has the following benefits:
>>>
>>> * It makes it easier to separate the semantics of the utf-16 alias
>>>     in the Microsoft listing from the semantics of the UTF-16 name in
>>>     the IANA registry.
>>
>> For all intents and purposes, these are one and the same charset
>> label. If you want to distinguish them, please do so with additional
>> words, not with case.
>
> I did that: I said 'utf-16 *alias*' versus 'UTF-16 *name*' ...  But the
> casing apparently nullified the effect ...

Yes, and "alias" vs. "name" isn't very direct either. I'd personally use 
"UTF-16 according to RFC..." vs. "UTF-16 according to Microsoft" or some 
such.

Regards,   Martin.

>>> * It separates 'unicode' from the trademarked/registered 'UNICODE'.
>>> * The casing 'unicodeFFFE' is more readable than 'unicodefffe' or
>>>     'UNICODEFFFE'.
>>
>> For these two, you have only used a single casing, so there's no
>> confusion. So these should be fine.
>
> Good.