
Re: Encoding Standard (mostly complete)

To: Bjoern Hoehrmann <derhoermi@gmx.net>
Subject: Re: Encoding Standard (mostly complete)
From: Anne van Kesteren <annevk@opera.com>
Date: Wed, 18 Apr 2012 08:24:13 +0200
Cc: ietf-charsets <ietf-charsets@iana.org>
In-reply-to: <c8iso7p2be97oohvd9v8kcim9njo1ds86n@hive.bjoern.hoehrmann.de>
Organization: Opera Software
References: <op.wcwk2xxc64w2qv@annevk-macbookpro.local><c8iso7p2be97oohvd9v8kcim9njo1ds86n@hive.bjoern.hoehrmann.de>

On Wed, 18 Apr 2012 07:11:14 +0200, Bjoern Hoehrmann <derhoermi@gmx.net>  
wrote:
> * Anne van Kesteren wrote:
>> Apart from big5, all encoders and decoders are now defined.
>>
>> http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html
>
> What is your reasoning behind "defining" how to decode UTF-8?

The idea is to remove the need for
http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#utf-8


> It seems
> to me this is well understood and does not require yet another
> specification. Anyone wanting to implement a UTF-8 decoder would have to
> compare your proposal to the other specifications to see if there are
> any differences, and if there are any differences, find out or decide
> if that's due to errors in your specification, and whether they want to
> adopt your specification rather than any of the others. That's not a
> good use of anyone's resources.

I agree, but referencing another specification and then trying to  
carefully subset it because it does not do what you want does not seem  
ideal either. And it would be inconsistent with the rest of the standard.

I don't really mind changing this though. We could do something like what  
HTML has done instead, but it just seems rather messy to me.


> I don't feel like reverse-engineering your assembly code and clicking
> through half a dozen of definitions to confirm this, but it seems as
> though your decoder is rather buggy, there is nothing obvious for
> instance that would protect against overlong sequences.

"utf-8 lower boundary" takes care of that.
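
To illustrate the general idea, here is a minimal sketch of a strict UTF-8 decoder that uses a per-sequence lower boundary to reject overlong forms. It is not the algorithm text from the draft: the function name `decode_utf8`, the variable names, and the choice to raise `ValueError` on malformed input are all assumptions made for the example. For each lead byte the decoder records the smallest code point that legitimately needs that many bytes, and treats any decoded value below it as an error.

```python
def decode_utf8(data: bytes) -> str:
    """Strictly decode UTF-8, raising ValueError on any malformed input.

    Hypothetical illustration only; names are not taken from the draft.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # For each lead byte, record how many trail bytes follow and the
        # smallest code point that genuinely requires a sequence this long.
        if lead <= 0x7F:
            needed, lower_boundary, cp = 0, 0x00, lead
        elif 0xC0 <= lead <= 0xDF:
            needed, lower_boundary, cp = 1, 0x80, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            needed, lower_boundary, cp = 2, 0x800, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            needed, lower_boundary, cp = 3, 0x10000, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")

        trail = data[i + 1 : i + 1 + needed]
        if len(trail) < needed:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in trail:
            if not 0x80 <= b <= 0xBF:
                raise ValueError(f"invalid trail byte 0x{b:02X} after offset {i}")
            cp = (cp << 6) | (b & 0x3F)

        # The lower-boundary check: a value below the boundary could have
        # been written with fewer bytes, so the sequence is overlong.
        if cp < lower_boundary:
            raise ValueError(f"overlong sequence at offset {i}")
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"code point out of range at offset {i}")

        out.append(chr(cp))
        i += 1 + needed
    return "".join(out)
```

With this sketch, `decode_utf8(b"\xC0\x80")` (an overlong encoding of U+0000) raises `ValueError` at the lower-boundary check, while well-formed input decodes normally.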


-- 
Anne van Kesteren
http://annevankesteren.nl/

