Re: Encoding Standard (mostly complete)
On Tue, 17 Apr 2012 20:20:34 +0200, Doug Ewell <doug@ewellic.org> wrote:
> Shawn Steele <Shawn dot Steele at microsoft dot com> wrote:
>> I'm a little confused about what the purpose of the document is?
>
> I assume it was intended to document the encodings deemed permissible in
> HTML5, which I guess is supposed to be synonymous with "the web
> platform."
More or less, yes: the encodings to be used by HTML, CSS, browser
implementations of XML, etc. As I explained before on this mailing list
(http://mail.apps.ietf.org/ietf/charsets/msg02027.html), the idea is to:
* Make the encodings that can be supported a finite list
* Carefully define the labels for these encodings
* Carefully define the algorithms to implement these encodings
** Including error and end-of-file handling
* Carefully define the indexes for these encodings, including any poorly
documented extensions
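To make the "labels" and "error and end-of-file handling" points concrete,
here is a minimal Python sketch. It assumes label matching works by
stripping leading and trailing ASCII whitespace and ASCII-lowercasing the
label before a table lookup. LABELS is a hypothetical, tiny subset, not the
standard's real table, and the functions get_encoding, ascii_lower and
decode_stream are illustrative names only. For convenience, Python's
built-in codecs stand in for the standard's decoders; they do not match it
in every detail (Python's cp1252, for instance, rejects a few bytes that a
browser's windows-1252 decoder accepts).

```python
import codecs

# Hypothetical, heavily abbreviated label table: keys are normalized
# labels, values are canonical encoding names. Illustrative only.
LABELS = {
    "unicode-1-1-utf-8": "utf-8",
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "us-ascii": "windows-1252",
    "windows-1252": "windows-1252",
}

# Tab, LF, FF, CR and space.
ASCII_WHITESPACE = "\t\n\f\r "


def ascii_lower(s):
    """Lowercase A-Z only; str.lower() would also fold non-ASCII letters
    (e.g. KELVIN SIGN to 'k') and could match labels by accident."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def get_encoding(label):
    """Return the canonical name for a label, or None if it is unknown."""
    return LABELS.get(ascii_lower(label.strip(ASCII_WHITESPACE)))


def decode_stream(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks, replacing malformed input
    with U+FFFD instead of raising."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    # Sequences split across chunk boundaries are buffered, not broken.
    parts = [decoder.decode(chunk) for chunk in chunks]
    # End of file: flush, so a truncated trailing sequence becomes
    # U+FFFD rather than being silently dropped.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

For example, get_encoding("  UTF8\n") gives "utf-8", a euro sign split
across two chunks still decodes as one character, and a stream ending in a
truncated sequence such as b"abc\xe2\x82" decodes to "abc" followed by
U+FFFD. Defining that last case precisely is the kind of end-of-file detail
that otherwise varies between implementations.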
The idea is to make the web platform completely predictable with respect
to encodings rather than the morass it is now. This should help existing
implementations compete more effectively as well as help new
implementations enter the market more easily without significant reverse
engineering costs.
> I was surprised by some of the choices of "permissible," such as
> including ibm864 and ibm866 but none of the other, much more widespread,
> legacy OEM code pages. I was also puzzled by the reference to utf-16 and
> utf-16be as "legacy" encodings.
I'm not quite sure whether ibm864 and ibm866 should stay; they are not
universally supported, but four out of five user agents have them if I
remember correctly. The list of encodings is based roughly on the
intersection of what browsers support. If I missed an encoding that is
actually "widely" used on pages, it would of course be good to add it. My
assumption has been that if only one browser supports an encoding, it is
probably not used, or at least not widely used.
I classified utf-16 as legacy because of its many gotchas and because most
web technology either works exclusively with utf-8 or does not work with
utf-16. For example, form submission does not use utf-16, XMLHttpRequest
only sends utf-8 encoded strings, and several new formats are utf-8 only.
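Two of the utf-16 gotchas can be shown in a few lines of Python: without a
byte order mark the byte order is a guess, and even plain ASCII text picks
up NUL bytes, so the result is not ASCII-compatible. This is a minimal
sketch that assumes a BOM, when present, overrides whatever encoding was
otherwise chosen. The function decode_with_bom and its fallback parameter
are hypothetical names used for illustration.

```python
# Byte order marks, checked in this order. UTF-8's BOM is three bytes,
# the two UTF-16 BOMs are two bytes each.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
)


def decode_with_bom(data, fallback):
    """Decode bytes, letting a leading BOM override the fallback encoding."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # Strip the BOM itself; it is not part of the text.
            return data[len(bom):].decode(encoding, errors="replace")
    # No BOM: the caller's guess decides the byte order. A wrong guess
    # does not raise; it silently yields different characters.
    return data.decode(fallback, errors="replace")
```

With no BOM, b"a\x00" decodes to "a" under utf-16le but to U+6100, a CJK
ideograph, under utf-16be. Encoding even the ASCII string "a=1" as
utf-16le produces b"a\x00=\x001\x00", so byte-oriented code that expects
ASCII-compatible input cannot treat utf-16 as it would utf-8.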
--
Anne van Kesteren
http://annevankesteren.nl/