
Re: Proposed changes to UTF-8 draft

To: Francois Yergeau <[email protected]>
Subject: Re: Proposed changes to UTF-8 draft
From: Keld Jørn Simonsen <[email protected]>
Date: Mon, 13 Jan 2003 18:54:55 +0100
Cc: [email protected]
In-reply-to: <[email protected]>
Original-recipient: rfc822;[email protected]
References: <[email protected]>
User-Agent: Mutt/1.3.27i

On Mon, Jan 13, 2003 at 10:46:04AM -0500, Francois Yergeau wrote:
> Keld Jørn Simonsen wrote:
> > It is because UTF-8 in the ISO 10646 definition only encodes characters
> > defined in 10646. And "surrogates" are not characters. So they "do not
> > occur" in UTF-8.
> 
> Yes, you're just repeating what the Note in Annex D says.  It's not wrong.
> It's just insufficient: it's a Note (non-normative) and it does not forbid
> (or even warn against) interpreting encoded surrogates.  Or overlong
> sequences.  There is a section that describes certain error cases, but it
> misses those two, thereby implying that they might not be errors.  The
> Unicode 3.2 text is just much tighter (at long last!) and therefore should
> be chosen.

That is not how I read it. The note explains to the reader what is
obvious from the architecture: that you cannot encode surrogates in UTF-8.
It is true, however, that it does not warn against overlong sequences.
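
To make the two cases under discussion concrete, here is a minimal sketch
of a checker that classifies a single UTF-8 byte sequence. The function
name classify_sequence and its return labels are hypothetical, chosen only
for illustration; the sketch takes no position on what either standard's
text requires, it only shows what "overlong" and "encoded surrogate" mean
at the byte level.

```python
def classify_sequence(seq: bytes) -> str:
    """Classify one UTF-8 byte sequence (hypothetical helper, illustration only)."""
    if not seq:
        return "malformed"
    lead = seq[0]
    # Pick the expected length, the payload bits of the lead byte, and the
    # smallest code point that legitimately needs this many bytes.
    if lead < 0x80:
        length, cp, minimum = 1, lead, 0x0
    elif 0xC0 <= lead <= 0xDF:
        length, cp, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:
        length, cp, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF7:
        length, cp, minimum = 4, lead & 0x07, 0x10000
    else:
        return "malformed"
    if len(seq) != length:
        return "malformed"
    # Each continuation byte must look like 10xxxxxx and adds six bits.
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            return "malformed"
        cp = (cp << 6) | (b & 0x3F)
    if cp < minimum:
        return "overlong"      # a shorter sequence could encode this value
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"     # surrogate code point, not a character
    if cp > 0x10FFFF:
        return "out of range"
    return "ok"


print(classify_sequence(b"\xC0\x80"))      # two-byte form of U+0000
print(classify_sequence(b"\xED\xA0\x80"))  # three-byte form of U+D800
print(classify_sequence("é".encode("utf-8")))
```

A decoder that skips the "minimum" comparison or the surrogate range test
will still produce a value for such input, which is why it matters whether
the text explicitly lists these cases as errors.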

Kind regards
keld
