
Re: Registering a charset alias



I agree with you, but I think we're talking past each other.

We should recommend UTF-8 for *output* (i.e. authors, servers, etc),
but, in order to make it easier for new implementations, we should
also document the names that are accepted for *input*. This is what
Anne was referring to (the burden of reverse-engineering).
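
To make the input/output split concrete, here is a rough sketch in
Python. The label table is purely illustrative (a real one would have to
be derived from what deployed implementations accept), and the function
names are made up for this example. Matching is assumed to be
case-insensitive, ignoring surrounding whitespace.

```python
# Hypothetical table of labels accepted on *input*. The entries are only
# examples; a documented list would come from surveying real implementations.
ACCEPTED_INPUT_LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "iso-8859-1": "iso-8859-1",
    "latin1": "iso-8859-1",
    "big5": "big5",
}

# The single name recommended for *output*.
OUTPUT_CHARSET = "utf-8"


def resolve_input_label(label):
    """Return the canonical name for a documented input label, else None."""
    return ACCEPTED_INPUT_LABELS.get(label.strip().lower())


def transcode_to_output(data, label):
    """Decode bytes using a documented input label and re-encode as UTF-8."""
    canonical = resolve_input_label(label)
    if canonical is None:
        raise LookupError("undocumented charset label: %r" % label)
    return data.decode(canonical).encode(OUTPUT_CHARSET)
```

The point of the sketch is only that the set of names accepted for
reading can be larger than the one name recommended for writing.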

(We don't even need to try very hard to recommend UTF-8, because the
world is already moving in that direction, at least on the Web.)

Erik

On Fri, Aug 14, 2009 at 4:49 PM, Shawn Steele<Shawn.Steele@microsoft.com> wrote:
> I'd encourage moving toward UTF-8.  Assuming that existing stuff works OK, then adding "better" ways for non-Unicode data to be passed around doesn't accomplish much.  Anyone working to use the "better" method could just as easily (or more easily) just move to UTF-8.
>
> -Shawn
>
> -----Original Message-----
> From: Erik van der Poel [mailto:erikv@google.com]
> Sent: Friday, August 14, 2009 16:18
> To: Markus Scherer
> Cc: Shawn Steele; Ira McDonald; Anne van Kesteren; ietf-charsets@iana.org
> Subject: Re: Registering a charset alias
>
> No, I don't think we should recommend behavior that is more lenient
> than what the major browsers currently do. (I believe the major
> browsers don't strip "x-"?)
>
> So I don't think the following spec from HTML 5, section 2.7 is very
> good either:
>
> "When comparing a string specifying a character encoding with the name
> or alias of a character encoding to determine if they are equal, user
> agents must use the Charset Alias Matching rules defined in Unicode
> Technical Standard #22. [UTS22]
>
> For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent names."
>
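For readers who have not seen the UTS #22 matching rules, the following
Python sketch shows why the two names quoted above compare equal. It
reflects my reading of the rules (drop everything except ASCII letters
and digits, lowercase, then drop each zero that is not preceded by a
digit); the function name uts22_key is made up for this example.

```python
import string

_ASCII_ALNUM = set(string.ascii_letters + string.digits)


def uts22_key(name):
    """Reduce a charset name to a comparison key, UTS #22 style (a sketch)."""
    kept = []
    prev_is_digit = False
    for ch in name:
        if ch not in _ASCII_ALNUM:
            continue  # drop punctuation, spaces, and non-ASCII characters
        ch = ch.lower()  # compare case-insensitively
        if ch == "0" and not prev_is_digit:
            continue  # drop a zero that does not follow a kept digit
        kept.append(ch)
        prev_is_digit = ch.isdigit()
    return "".join(kept)


def names_match(a, b):
    """True if two charset names are equivalent under the sketch above."""
    return uts22_key(a) == uts22_key(b)
```

With this, names_match("GB_2312-80", "g.b.2312(80)") is true because
both reduce to "gb231280", which illustrates how lenient the rule is.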
> The general approach should be: As lenient as the major browsers, but
> not more lenient. Lenience leads to a proliferation of garbage.
>
> I chose the name x-x-big5 for an internal X-Windows-only encoding for
> Big5 that had only 2-byte characters, no ASCII. That name was
> intended to be used only internally, within Netscape, but our X
> resource file was written in plain ASCII, and Microsoft picked that
> up, assuming that x-x-big5 was ordinary big5. I shouldn't have exposed
> that name in a plain text file, nor should I have put that name in the
> same namespace as ordinary charsets.
>
> Erik
>
> On Fri, Aug 14, 2009 at 4:03 PM, Markus Scherer<markus.icu@gmail.com> wrote:
>> How about a general rule (maybe in HTML 5) that if x-abc is not recognized
>> then the implementation should strip the x- and try abc instead. Apparently,
>> this may need to be done multiple times, to deal with x-x-big5. (Whoever
>> came up with *that*?)
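
The rule proposed here could be sketched roughly as follows in Python.
This only illustrates the proposal, not something any browser is known
to do; the set of known names and the function name are hypothetical.

```python
# Hypothetical set of names an implementation recognizes.
KNOWN_CHARSETS = {"big5", "utf-8", "x-sjis"}


def lookup_with_x_stripping(label, known=KNOWN_CHARSETS):
    """Try the label as given, then keep stripping a leading "x-" and retrying.

    Returns the recognized name, or None if nothing matches.
    """
    candidate = label.strip().lower()
    while True:
        if candidate in known:
            return candidate
        if not candidate.startswith("x-"):
            return None
        candidate = candidate[2:]  # strip one "x-" and try again
```

Note that a name that is itself registered with an "x-" prefix is
matched before any stripping happens, while "x-x-big5" needs two rounds
of stripping before it resolves to "big5".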
>>
>> markus
>>
>
>