
Re: Indicating charset variants (was: RE: windows 936)



Martin Duerst wrote:
 
> XML spoils things.

That's bad.  There are quite a lot of registrations with names
in the form IBM00858 or IBM01140.  I'd guess that nobody
will use these names with leading zeros, and that everybody
sticks to the whatever+euro alias, e.g. pc-multilingual-850+euro.

On the platforms where this charset is used, its local name
is "codepage 850", so the preferred MIME name should include
850, not the obscure 00858 or 858.

>| EncName    ::=          [A-Za-z] ([A-Za-z0-9._] | '-')*

Ugh, that's really bad.
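
To make the problem concrete, here is a minimal sketch in Python that
checks a few names against the EncName production quoted above.  The
helper name is_valid_encname is made up for illustration, and
pc-multilingual-850--euro is a hypothetical alias following the '--'
idea, not a registered name.

```python
import re

# EncName production as quoted above:
#   EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENC_NAME = re.compile(r"[A-Za-z]([A-Za-z0-9._]|-)*")


def is_valid_encname(name: str) -> bool:
    """Return True if the whole name matches the EncName production."""
    return ENC_NAME.fullmatch(name) is not None


# The last entry is a hypothetical '--' alias, not a registered name.
candidates = [
    "IBM00858",
    "pc-multilingual-850+euro",
    "pc-multilingual-850--euro",
]

for name in candidates:
    verdict = "ok" if is_valid_encname(name) else "not an EncName"
    print(f"{name}: {verdict}")
```

The '+' is rejected because it is not in the allowed character set, so
the whatever+euro aliases cannot appear in an XML encoding declaration
as they stand.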
 
> Now there would be three ways ahead:
> - Ignore XML. I don't think we want to go there.

True.

> - Try to change XML. A few years ago, that would have been
>   easy with an erratum, but I don't think this will be met
>   with cheers these days.

Interoperability is more important.  They have already fixed
xml:lang to allow empty values, they should fix SystemLiteral
to be a valid URI, and while they're at it, maybe updating
EncName is a better option than...

> - Choose a separator different from '+'. After quite a bit of
>   thinking, I have reached the conclusion that the obvious
>   thing to do would be to use something like '--'.

...registering new aliases as preferred MIME names for the
various existing whatever+euro entries.

> What does everybody think?

Either fix XML or the registry for cases like ...850+euro.

 Frank