
RE: Proposed changes to UTF-8 draft



Keld Jørn Simonsen wrote:
> I think we should keep ourselves to open standards whenever possible,
> and avoid industry standards like Unicode if we can.

I dispute the characterization of ISO standards as open.  The
standardization process is closed (only National Bodies can take part),
and the standards themselves, with few exceptions (10646 is not one of
them), are available only for money.
 
> 10646 is pretty explicit about not using surrogates in UTF-8,
> as far as I know. Always was.

Please re-read Annex D.  The only mention is this Note:

  NOTE 1 - Values of x in the range 0000 D800 .. 0000 DFFF
  are reserved for the UTF-16 form and do not occur in UCS-4.
  The values 0000 FFFE and 0000 FFFF also do not occur
  (see clause 8). The mappings of these code positions in
  UTF-8 are undefined.

There's a later section D.7 "Incorrect sequences of octets: Interpretation
by receiving devices" which is totally silent on decoding surrogates and
overlong sequences.
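
To make the gap concrete, here is a rough sketch of the checks a strict
receiver would have to add on its own, since neither Note 1 nor D.7 requires
them. It is only an illustration, not text from any standard: the function
name decode_one, the table layout, and the choice to raise ValueError are all
assumptions of mine. It accepts the 1 to 6 octet forms of the original UTF-8
definition.

```python
def decode_one(octets: bytes) -> int:
    """Strictly decode ONE UTF-8 sequence; raise ValueError if it is suspect.

    Hypothetical helper for illustration only (not from 10646 or any draft).
    """
    # length -> (lead mask, lead pattern, payload mask, smallest value
    #            that genuinely needs this many octets)
    forms = {
        1: (0x80, 0x00, 0x7F, 0x0000),
        2: (0xE0, 0xC0, 0x1F, 0x0080),
        3: (0xF0, 0xE0, 0x0F, 0x0800),
        4: (0xF8, 0xF0, 0x07, 0x10000),
        5: (0xFC, 0xF8, 0x03, 0x200000),
        6: (0xFE, 0xFC, 0x01, 0x4000000),
    }
    n = len(octets)
    if n not in forms:
        raise ValueError("sequence must be 1 to 6 octets long")
    mask, pattern, payload, minimum = forms[n]
    if (octets[0] & mask) != pattern:
        raise ValueError("lead octet does not match sequence length")

    value = octets[0] & payload
    for b in octets[1:]:
        if (b & 0xC0) != 0x80:
            raise ValueError("bad continuation octet")
        value = (value << 6) | (b & 0x3F)

    # The checks below are the ones the Annex leaves open.
    if value < minimum:
        raise ValueError("overlong sequence")
    if 0xD800 <= value <= 0xDFFF:
        raise ValueError("value reserved for the UTF-16 form")
    if value in (0xFFFE, 0xFFFF):
        raise ValueError("FFFE/FFFF do not occur in UCS-4")
    return value
```

With this sketch, the octets C0 80 (an overlong form of 0000 0000) and
ED A0 80 (the value 0000 D800) are both rejected, while a lenient decoder
that skips the last three checks would hand back a value for each.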

-- 
François