
Re: Registering a charset alias

To: Shawn Steele <Shawn.Steele@microsoft.com>
Subject: Re: Registering a charset alias
From: Erik van der Poel <erikv@google.com>
Date: Fri, 14 Aug 2009 16:55:36 -0700
Cc: Markus Scherer <markus.icu@gmail.com>,Ira McDonald <blueroofmusic@gmail.com>, Anne van Kesteren <annevk@opera.com>,ietf-charsetsianaorg <ietf-charsets@iana.org>

I agree with you, but I think we're talking past each other.

We should recommend UTF-8 for *output* (i.e. authors, servers, etc),
but, in order to make it easier for new implementations, we should
also document the names that are accepted for *input*. This is what
Anne was referring to (the burden of reverse-engineering).

(We don't even need to try very hard to recommend UTF-8, because the
world is already moving in that direction, at least on the Web.)
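To illustrate the split between input and output, here is a minimal sketch, not a proposal for the registry. The table entries, `ACCEPTED_LABELS`, `resolve_input_label`, and `content_type_for_output` are all hypothetical names chosen for this example; a real table would come from documenting what the major browsers actually accept.

```python
# Hypothetical table of labels accepted on *input*, mapped to a canonical name.
# The entries are illustrative only, not a survey of browser behavior.
ACCEPTED_LABELS = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "big5": "Big5",
    "x-x-big5": "Big5",  # legacy label, listed explicitly rather than derived
}

# The single charset recommended for *output*.
OUTPUT_CHARSET = "UTF-8"


def resolve_input_label(label):
    """Return the canonical charset for a documented label, else None."""
    return ACCEPTED_LABELS.get(label.strip().lower())


def content_type_for_output(media_type="text/html"):
    """Build a Content-Type value that always advertises the output charset."""
    return f"{media_type}; charset={OUTPUT_CHARSET}"


if __name__ == "__main__":
    for label in ("UTF8", "x-x-big5", "x-utf-8"):
        print(label, "->", resolve_input_label(label))
    print(content_type_for_output())
```

The lookup is exact after trimming and lowercasing, so a label that is not in the documented table is rejected rather than guessed at.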

Erik

On Fri, Aug 14, 2009 at 4:49 PM, Shawn Steele <Shawn.Steele@microsoft.com> wrote:
> I'd encourage moving toward UTF-8.  Assuming that existing stuff works OK, then adding "better" ways for non-Unicode data to be passed around doesn't accomplish much.  Anyone working to use the "better" method could just as easily (or more easily) just move to UTF-8.
>
> -Shawn
>
> -----Original Message-----
> From: Erik van der Poel [mailto:erikv@google.com]
> Sent: Friday, August 14, 2009 16:18
> To: Markus Scherer
> Cc: Shawn Steele; Ira McDonald; Anne van Kesteren; ietf-charsetsianaorg
> Subject: Re: Registering a charset alias
>
> No, I don't think we should recommend behavior that is more lenient
> than what the major browsers currently do. (I believe the major
> browsers don't strip "x-"?)
>
> So I don't think the following spec from HTML 5, section 2.7 is very
> good either:
>
> "When comparing a string specifying a character encoding with the name
> or alias of a character encoding to determine if they are equal, user
> agents must use the Charset Alias Matching rules defined in Unicode
> Technical Standard #22. [UTS22]
>
> For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent names."
>
> The general approach should be: As lenient as the major browsers, but
> not more lenient. Lenience leads to a proliferation of garbage.
>
> I chose the name x-x-big5 for an internal X-Windows-only encoding for
> Big5 that had only 2-byte characters and no ASCII. That name was
> intended to be used only internally, within Netscape, but our X
> resource file was written in plain ASCII, and Microsoft picked that
> up, assuming that x-x-big5 was ordinary big5. I shouldn't have exposed
> that name in a plain text file, nor should I have put that name in the
> same namespace as ordinary charsets.
>
> Erik
>
> On Fri, Aug 14, 2009 at 4:03 PM, Markus Scherer <markus.icu@gmail.com> wrote:
>> How about a general rule (maybe in HTML 5) that if x-abc is not recognized
>> then the implementation should strip the x- and try abc instead. Apparently,
>> this may need to be done multiple times, to deal with x-x-big5. (Whoever
>> came up with *that*?)
>>
>> markus
>>
>
>

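For readers following the quoted thread, the two lenient schemes under discussion can be sketched as below. The first function follows my reading of the UTS #22 alias matching rules (drop everything except ASCII letters and digits, lowercase, then drop each 0 not preceded by a digit); the second is the repeated "strip x-" fallback that Markus floated. The names `uts22_key` and `strip_x_prefixes` and the `known` set are hypothetical, and this is only an illustration of the behavior being debated, not a recommended implementation.

```python
import re


def uts22_key(name):
    """Reduce a charset name to a loose-matching key, UTS #22 style."""
    # Steps 1 and 2: keep only ASCII letters and digits, then lowercase.
    kept = re.sub(r"[^A-Za-z0-9]", "", name).lower()
    out = []
    for ch in kept:
        # Step 3: left to right, drop a 0 that is not preceded by a digit.
        if ch == "0" and not (out and out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


def strip_x_prefixes(label, known):
    """Strip leading "x-" repeatedly until the label is in `known`, else None."""
    candidate = label.lower()
    while candidate not in known and candidate.startswith("x-"):
        candidate = candidate[2:]
    return candidate if candidate in known else None


if __name__ == "__main__":
    # The two names from the HTML 5 text collapse to the same key.
    print(uts22_key("GB_2312-80"), uts22_key("g.b.2312(80)"))
    # Repeated stripping turns the internal-only label into plain big5.
    print(strip_x_prefixes("x-x-big5", {"big5"}))
```

Both functions accept labels that no author was ever told to write, and the second maps x-x-big5 onto big5 even though the two were originally different encodings.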