
Registration of new charset 'unicodeFFFE'

To: ietf-charsets@iana.org
Subject: Registration of new charset 'unicodeFFFE'
From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 15 Dec 2011 12:19:49 +0100
Organization: Målform.no

Charset name: 
      unicodeFFFE

Charset aliases:
      No aliases.

Suitability for use in MIME text:
      The 'unicodeFFFE' charset labels the big-endian 'subset' of
      'UTF-16' and thus shares its issue: like 'UTF-16', it does 'not
      encode line endings in the way required for MIME "text" media'.[1]
  [1] http://tools.ietf.org/rfc/rfc2781.txt
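  As an illustration, here is a minimal Python sketch. It uses Python's built-in
  'utf-16-be' codec as a stand-in for the big-endian byte layout. A CRLF line
  break becomes four octets with interleaved zero bytes. MIME "text" media expect
  the bare CR LF octet pair instead.

```python
# CRLF as MIME "text" expects it: the two octets 0x0D 0x0A.
mime_crlf = b"\r\n"

# The same line break as big-endian UTF-16 code units (no BOM here).
utf16be_crlf = "\r\n".encode("utf-16-be")

print(utf16be_crlf.hex(" "))      # 00 0d 00 0a
print(mime_crlf in utf16be_crlf)  # False: no bare CR LF octet pair
```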

Published specification(s):
      The 'unicodeFFFE' charset label corresponds to 'codepage 1201':
  [2] http://msdn.microsoft.com/en-us/library/aa752010(v=VS.85).aspx
      Codepage 1201 is a big-endian representation of 'UTF-16',
      including the BOM: 'Unicode UTF-16, big endian byte order;
      available only to managed applications'.
  [3] http://msdn.microsoft.com/en-us/library/dd317756(v=VS.85).aspx
      The reference to 'Unicode UTF-16' is taken to mean that the
      BOM MUST be present.
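  Under this reading the BOM is mandatory, so producing codepage 1201 bytes can be
  sketched as follows: write U+FEFF big-endian (FE FF), then write the text as
  UTF-16BE. The sketch below is a minimal one. The function name
  `encode_unicodefffe` is hypothetical, since Python has no codec by that name. It
  therefore uses the standard 'utf-16-be' codec and adds the BOM by hand. Python's
  plain 'utf-16' codec would write the platform's native byte order, which is
  usually little-endian.

```python
import codecs

def encode_unicodefffe(text: str) -> bytes:
    """Hypothetical helper: big-endian UTF-16 with a mandatory BOM (FE FF)."""
    # codecs.BOM_UTF16_BE is b'\xfe\xff'; the 'utf-16-be' codec never writes a BOM itself.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")

data = encode_unicodefffe("Hi")
print(data.hex(" "))  # fe ff 00 48 00 69
```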

ISO 10646 equivalency table:
      The 'unicodeFFFE' charset (codepage 1201) is the big-endian
  equivalent of 'unicode' (codepage 1200), which in turn represents
  the 'BMP of ISO 10646'.[2] Thus 'unicodeFFFE' is equivalent to the BMP.

Additional information: 
      The 'unicodeFFFE' charset can be understood as the big-endian
  'subset' of 'UTF-16'. Thus, like 'UTF-16'-encoded resources,
  'unicodeFFFE'-encoded resources include the BOM: if a resource
  does not contain a BOM, then it is not 'unicodeFFFE'-encoded.
  Applications that generate resources labelled 'unicodeFFFE'
  (example: <META content="text/html; charset=unicodeFFFE"
  http-equiv=Content-Type>) are known to insert the BOM. When parsing
  media of MIME type 'text/html', for example, Internet Explorer is
  known NOT to pick 'unicodeFFFE' (or any other 16-bit UTF variant)
  as the encoding unless there is a BOM. (Minor exception for
  'text/html': if the charset parameter of the HTTP Content-Type
  header is 'unicodeFFFE', then IE renders the 'text/html' resource
  correctly even without a BOM, but only as long as the resource is
  not loaded from cache.)
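  The BOM-first behaviour described above can be sketched as a small sniffing
  function. This is a minimal Python sketch under stated assumptions and not IE's
  actual algorithm. The function name `sniff_bom` is hypothetical. The returned
  labels follow this registration: 'unicodeFFFE' for FE FF and 'unicode' for FF FE.
  Returning None models the rule that a resource without a BOM is not treated as
  'unicodeFFFE'-encoded.

```python
from typing import Optional

def sniff_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: pick a 16-bit charset label only if a BOM is present."""
    if data.startswith(b"\xfe\xff"):
        return "unicodeFFFE"  # big-endian BOM (codepage 1201)
    if data.startswith(b"\xff\xfe"):
        return "unicode"      # little-endian BOM (codepage 1200)
    return None               # no BOM: not treated as a 16-bit encoding

print(sniff_bom(b"\xfe\xff\x00<"))  # unicodeFFFE
print(sniff_bom(b"\x00<\x00h"))     # None
```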
     NB! Alias: At the time of this registration, the spec on which
  the registration of the 'unicodeFFFE' and the 'unicode' charsets is
  based defines 'utf-16' (lowercase) as an alias for 'unicode'.[2]
  This is incompatible with the registered semantics of (uppercase)
  'UTF-16' (RFC 2781), as it causes implementations such as Internet
  Explorer (IE) to interpret 'utf-16' (irrespective of case) to mean
  'little-endian'. Usually the BOM solves the problem, because a BOM
  takes precedence (the BOM is a MUST for 'unicode', 'unicodeFFFE'
  and 'UTF-16' alike). Without a BOM, however, big-endian MIME text
  resources labelled 'UTF-16' risk being mis-rendered (causing
  'mojibake') unless implementations adhere to the 'unicode'
  registration and thus reject 'utf-16' as an alias for 'unicode'.
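  The risk can be shown in a few lines of Python. The sketch assumes a reader that
  maps a 'UTF-16' label to little-endian when no BOM is present, as described
  above. The variable names are illustrative only. Without a BOM, the big-endian
  bytes decode to unrelated CJK ideographs. With a BOM, decoding recovers the
  text, because Python's 'utf-16' codec honours and strips a leading BOM.

```python
# "Hi" as big-endian code units with no BOM.
be_no_bom = "Hi".encode("utf-16-be")  # b'\x00H\x00i'

# A reader that treats 'utf-16' as little-endian misreads every code unit.
print(be_no_bom.decode("utf-16-le"))  # U+4800 U+6900: mojibake

# With a BOM, the byte order is explicit and decoding is unambiguous.
be_with_bom = b"\xfe\xff" + be_no_bom
print(be_with_bom.decode("utf-16"))   # Hi
```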

Intended usage:
      LIMITED USE. It is used by a large community of Microsoft product
  users, but is also supported, across different platforms, by products
  that want to be compatible. 'Compatible' here means, for example,
  tools such as editors that need to determine the encoding of a
  resource or advise on the best charset label. In that regard: any
  resource that can validly be labelled 'unicodeFFFE' could also
  validly (and probably ought to) be labelled 'UTF-16'. Another example
  is the encoding sniffing algorithm of HTML5, which in certain
  circumstances requires a charset label whose value is 'a UTF-16
  encoding' (such as 'unicodeFFFE') to be interpreted as if its value
  were 'UTF-8'.
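  The HTML5 rule mentioned above can be sketched as follows. This is a simplified
  Python illustration and not the HTML5 algorithm itself. Several parts are
  assumptions chosen for the example:
  - the function name `effective_meta_charset`;
  - the set of labels counted as 'a UTF-16 encoding';
  - the label matching, reduced here to trimming whitespace and ignoring case.

```python
# Assumption for illustration: labels treated here as "a UTF-16 encoding".
UTF16_LABELS = {"utf-16", "utf-16be", "utf-16le", "unicode", "unicodefffe"}

def effective_meta_charset(label: str) -> str:
    """Hypothetical helper: apply the 'UTF-16 label means UTF-8' rule to a <meta> charset."""
    normalized = label.strip().lower()
    if normalized in UTF16_LABELS:
        # Assumed rationale: a <meta> found by a byte-level ASCII prescan
        # means the bytes are not really UTF-16, so UTF-8 is used instead.
        return "UTF-8"
    return label.strip()

print(effective_meta_charset("unicodeFFFE"))  # UTF-8
print(effective_meta_charset("ISO-8859-1"))   # ISO-8859-1
```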

Person & email address to contact for further information:
      Leif Halvard Silli, xn--mlform-iua@xn--mlform-iua.no
