Encodings and the web

To: ietf-charsets <ietf-charsets@iana.org>
Subject: Encodings and the web
From: Anne van Kesteren <annevk@opera.com>
Date: Tue, 20 Dec 2011 11:59:49 +0100
Organization: Opera Software
User-Agent: Opera Mail/11.60 (MacIntel)

Hi,

While researching encodings as implemented by popular user agents, I
found the current standards lacking. In particular:

    * The registry contains more encodings than the web needs
    * Error handling for encodings is undefined (this can lead to XSS
      exploits and also causes interoperability problems)
    * Encodings are often implemented differently from the standard
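
To illustrate the error-handling point, here is a minimal sketch in Python.
It is not taken from any user agent; the input bytes and the naive
byte-level filter are made up for illustration. It shows that two
plausible ways of handling an invalid octet in UTF-8 yield different text,
so a filter that inspects the raw bytes and a decoder that silently drops
invalid octets can disagree about whether a script tag is present.

```python
# Hypothetical input: the octet 0xFF is never valid in UTF-8.
payload = b"<scr\xffipt>alert(1)</scr\xffipt>"


def naive_filter_allows(raw: bytes) -> bool:
    """Illustrative filter that only looks for a literal tag in the bytes."""
    return b"<script" not in raw.lower()


# Two error-handling strategies a decoder might choose.
dropped = payload.decode("utf-8", errors="ignore")    # invalid octets vanish
replaced = payload.decode("utf-8", errors="replace")  # invalid octets become U+FFFD

print(naive_filter_allows(payload))  # the filter sees no "<script" in the bytes
print(dropped)                       # the tag reappears after decoding
print(replaced)                      # the tag stays broken by U+FFFD
```

If every decoder handled the invalid octet the same way, the filter and
the consumer would at least see the same text.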

A year ago I did some research into encodings[1], and in more detail into
single-octet encodings[2]. I have now taken that further and started to
define a standard[3] for encodings as they are to be implemented by user
agents. The current scope is roughly: defining the encodings, their
labels and names, and how to match a label.
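
As a rough sketch of what "matching a label" could look like, the Python
below strips surrounding ASCII whitespace, lowercases ASCII letters only,
and looks the result up in a table. The matching rules, the function name
get_encoding, and the small table are assumptions for illustration, not
the text of the draft; the real table is exactly what needs to be defined.

```python
import string
from typing import Optional

# Illustrative subset only (assumed): several labels map to one encoding name.
LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
}

ASCII_WHITESPACE = "\t\n\x0c\r "
# Lowercase A-Z only, so non-ASCII characters never fold into ASCII letters.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def get_encoding(label: str) -> Optional[str]:
    """Return the encoding name for a label, or None if the label is unknown."""
    key = label.strip(ASCII_WHITESPACE).translate(ASCII_LOWER)
    return LABELS.get(key)


print(get_encoding(" UTF-8\n"))
print(get_encoding("Latin1"))
print(get_encoding("no-such-label"))
```

Returning None for an unknown label leaves the fallback decision to the
caller rather than guessing an encoding.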

The goal is to unify encoding handling across user agents for the web so
legacy pages can be interpreted "correctly" (i.e. as expected by users).

If you are interested in helping out with testing (and reverse
engineering) multi-octet encodings, please let me know. Any other input is
much appreciated as well.

Kind regards,


[1]<http://wiki.whatwg.org/wiki/Web_Encodings>
[2]<http://annevankesteren.nl/2010/12/encodings-labels-tested>
[3]<http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html>


-- 
Anne van Kesteren
http://annevankesteren.nl/

Follow-Ups:
- Re: Encodings and the web
  - From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
- Re: Encodings and the web
  - From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Re: Encodings and the web
  - From: Anne van Kesteren <annevk@opera.com>
