Re: Proposed changes to UTF-8 draft
On Mon, Jan 13, 2003 at 10:46:04AM -0500, Francois Yergeau wrote:
> Keld Jørn Simonsen wrote:
> > It is because UTF-8 in the ISO 10646 definition only encodes characters
> > defined in 10646. And "surrogates" are not characters. So they "do not
> > occur" in UTF-8.
>
> Yes, you're just repeating what the Note in Annex D says. It's not wrong.
> It's just insufficient: it's a Note (non-normative) and it does not forbid
> (or even warn against) interpreting encoded surrogates. Or overlong
> sequences. There is a section that describes certain error cases, but it
> misses those two, thereby implying that they might not be errors. The
> Unicode 3.2 text is just much tighter (at long last!) and therefore should
> be chosen.
That is not how I read it. The note explains to the reader what is already
obvious from the architecture: that you cannot encode surrogates in UTF-8.
It is true, however, that it does not warn against overlong sequences.
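
To make the two disputed cases concrete, here is a minimal sketch in Python of a strict decoder that treats both encoded surrogates and overlong sequences as errors. It is only an illustration of the stricter reading discussed above, not text from either standard; the function name decode_utf8_strict and the MIN_FOR_LENGTH table are made up for this example.

```python
# Hypothetical strict UTF-8 decoder: illustrative only, not taken from
# ISO 10646 or Unicode 3.2. It rejects overlong forms and surrogates.

# Smallest code point that genuinely needs a sequence of each length.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_utf8_strict(data: bytes) -> list:
    """Return the list of code points in data, or raise ValueError."""
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        # Work out the sequence length and payload bits from the lead byte.
        if b0 < 0x80:
            length, cp = 1, b0
        elif 0xC0 <= b0 <= 0xDF:
            length, cp = 2, b0 & 0x1F
        elif 0xE0 <= b0 <= 0xEF:
            length, cp = 3, b0 & 0x0F
        elif 0xF0 <= b0 <= 0xF7:
            length, cp = 4, b0 & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")

        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")

        # Each continuation byte must look like 10xxxxxx.
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)

        # Overlong: the value would have fitted in a shorter sequence.
        if cp < MIN_FOR_LENGTH[length]:
            raise ValueError(f"overlong sequence at offset {i}")
        # Surrogates are not characters, so they must not occur.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"encoded surrogate at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range at offset {i}")

        out.append(cp)
        i += length
    return out
```

With this sketch, the bytes C0 AF (an overlong form of "/") and ED A0 80 (the surrogate U+D800 run through the three-byte pattern) both raise ValueError, while ordinary text decodes normally. A decoder that skips the last three checks would accept both inputs without complaint.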
Kind regards
keld