
RE: Proposed changes to UTF-8 draft



Keld Jørn Simonsen wrote:
> It is because UTF-8 in the ISO 10646 definition only encodes characters
> defined in 10646. And "surrogates" are not characters. So they "do not
> occur" in UTF-8.

Yes, you're just repeating what the Note in Annex D says.  It's not wrong.
It's just insufficient: it's a Note (non-normative) and it does not forbid
(or even warn against) interpreting encoded surrogates.  Or overlong
sequences.  There is a section that describes certain error cases, but it
misses those two, thereby implying that they might not be errors.  The
Unicode 3.2 text is just much tighter (at long last!) and therefore should
be chosen.
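
To make the two missing error cases concrete, here is a minimal sketch of a
decoder that treats both encoded surrogates and overlong sequences as errors.
It is only an illustration, not wording from either specification; the
function name decode_strict and its error messages are made up for this
example.

```python
def decode_strict(data):
    """Decode UTF-8 bytes into a list of code points.

    Hypothetical helper: rejects overlong sequences and encoded
    surrogates instead of silently interpreting them.
    """
    # Smallest code point that a sequence of each length may encode.
    # Anything smaller means a shorter form existed (overlong).
    min_for_len = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 1, b
        elif 0xC0 <= b < 0xE0:
            n, cp = 2, b & 0x1F
        elif 0xE0 <= b < 0xF0:
            n, cp = 3, b & 0x0F
        elif 0xF0 <= b < 0xF8:
            n, cp = 4, b & 0x07
        else:
            # Stray continuation byte (0x80..0xBF) or lead byte >= 0xF8.
            raise ValueError("invalid lead byte 0x%02X at offset %d" % (b, i))
        if i + n > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for c in data[i + 1:i + n]:
            if c & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte at offset %d" % i)
            cp = (cp << 6) | (c & 0x3F)
        if cp < min_for_len[n]:
            raise ValueError("overlong sequence at offset %d" % i)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("encoded surrogate at offset %d" % i)
        if cp > 0x10FFFF:
            raise ValueError("code point out of range at offset %d" % i)
        out.append(cp)
        i += n
    return out
```

With this sketch, the bytes C0 80 (an overlong form of U+0000) and ED A0 80
(an encoding of the surrogate D800) both raise an error, whereas a decoder
that only follows the bit layout would accept them.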

-- 
François