
Re: Comments on draft-yergeau-rfc2279bis-00.txt

To: Dan Oscarsson <Dan.Oscarsson@trab.se>, ietf-charsets@iana.org, FYergeau@alis.com
Subject: Re: Comments on draft-yergeau-rfc2279bis-00.txt
From: Martin Duerst <duerst@w3.org>
Date: Thu, 18 Apr 2002 13:42:26 +0900
In-reply-to: <3CBD6007.653586A@trab.se>

While I'm definitely an advocate of NFC, this isn't and
should not be part of the definition of UTF-8.
Maybe Francois finds a good place to put in a pointer
to NFC and UAX #15, but it definitely shouldn't be part
of the normative definition.

Regards,   Martin.

At 13:44 02/04/17 +0200, Dan Oscarsson wrote:

>I would also very much like UTF-8 to require that Unicode
>normalisation form C has been applied to the UCS text being
>encoded. Otherwise the same character sequence can have
>different UTF-8 encodings.
>While it is no problem to use overlong UTF-8 sequences, they
>are forbidden in the document. This makes it impossible to
>encode the same ASCII character sequence in several ways.
>The same should be applied to all characters in UCS - only
>one form should be allowed.
>As form C does not destroy any data and is the most compact, it is
>the best choice.
>So UTF-8 should REQUIRE the characters to be normalised
>using form C. (Note: text normalised using form KC will
>also work; if it is then normalised using form C, the result
>is the same text.)
>
>Having both the BOM removed and form C required will make handling
>of UTF-8 in software much simpler, as well as less prone to errors
>and security problems.
>
>     Dan
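
The points in the quoted message can be illustrated with a minimal
sketch. It assumes Python 3 and its standard unicodedata module; the
sample strings and the helper name is_valid_utf8 are hypothetical and
chosen only for illustration. It shows that UTF-8 encodes code points
as given, so unnormalised text yields different byte sequences, that
applying form C as a separate step before encoding removes the
difference, and that a strict decoder rejects overlong sequences.

```python
import unicodedata

# "é" as a single precomposed code point (already in form C)
composed = "\u00e9"
# "e" followed by COMBINING ACUTE ACCENT (a decomposed spelling)
decomposed = "e\u0301"

# UTF-8 encodes the code points it is given; it does not normalise.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
bytes_differ = composed_bytes != decomposed_bytes

# Normalising to form C as a separate step before encoding
# makes both spellings produce the same bytes.
nfc_bytes = unicodedata.normalize("NFC", decomposed).encode("utf-8")

# Text already normalised with form KC is left unchanged by form C.
nfkc_text = unicodedata.normalize("NFKC", "\ufb01e\u0301")  # "fi" ligature + e + accent
nfkc_is_stable_under_nfc = unicodedata.normalize("NFC", nfkc_text) == nfkc_text


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xC1 0x81 would be an overlong two-byte spelling of "A" (U+0041).
overlong_a = b"\xc1\x81"
print(bytes_differ, nfc_bytes == composed_bytes, is_valid_utf8(overlong_a))
```

Keeping normalisation as its own step in the sketch matches the
position taken in the reply: the encoder stays unchanged, and form C
is applied by whichever layer wants it.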
