RE: New draft-yergeau-rfc2279bis-05.txt
Hi Francois,
Which reminds me that the recently published RFC 3454 (December 2002)
is based on Unicode/3.2 (of course). But there are (I believe) some
new characters registered in Unicode/4.0. Also, Markus Kuhn made the good
point recently on the Linux I18N list that the character class of
SOFT HYPHEN just changed in Unicode/4.0 (which affects Stringprep).
Since a lot of IETF WGs are doing Stringprep profiles, it would be
desirable for them to reference Unicode/4.0, which would require new
exclusion tables, for example.
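As an illustration of the version skew (a sketch only, using Python's standard `stringprep` and `unicodedata` modules rather than any IETF tooling), the check below looks up SOFT HYPHEN in Stringprep table B.1 and compares its general category in the Unicode 3.2 database that RFC 3454 is built on with the newer database shipped with the interpreter:

```python
import stringprep
import unicodedata

SOFT_HYPHEN = "\u00ad"

# Python's stringprep module implements the RFC 3454 tables; it is pinned
# to Unicode 3.2 through unicodedata.ucd_3_2_0.
print("In Stringprep table B.1 (mapped to nothing):",
      stringprep.in_table_b1(SOFT_HYPHEN))

# General category under the Unicode 3.2 database used by RFC 3454...
print("Unicode 3.2 category:", unicodedata.ucd_3_2_0.category(SOFT_HYPHEN))

# ...and under whatever newer Unicode version this Python build ships with.
print(f"Unicode {unicodedata.unidata_version} category:",
      unicodedata.category(SOFT_HYPHEN))
```

Under Unicode 3.2 the category is Pd; from Unicode 4.0 on it is Cf. Any profile table derived from general categories would therefore have to be regenerated against the newer data.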
Comments?
Cheers,
- Ira McDonald
High North Inc
-----Original Message-----
From: Francois Yergeau [mailto:FYergeau@alis.com]
Sent: Monday, June 09, 2003 4:55 PM
To: ietf-charsets@iana.org
Subject: RE: New draft-yergeau-rfc2279bis-05.txt
I forgot to mention that I also updated the [UNICODE] reference to Unicode
4.0.
--
François Yergeau
> -----Original Message-----
> From: Francois Yergeau [mailto:FYergeau@alis.com]
> Sent: June 9, 2003 16:10
> To: ietf-charsets@iana.org
> Subject: New draft-yergeau-rfc2279bis-05.txt
>
>
> ...just submitted to secretariat.
>
> This revision addresses two substantive issues raised by the IESG during
> post-last-call evaluation, as well as a few minor points that have shown
> up since -04.
>
> Changes from IESG review:
> =============================================================================
>
> One director requested that it be made clear that the ABNF in section 4
> is not normative, both because it is new and untested -- added between
> Draft and Standard -- and because RFC 2234 is only Proposed. Section 4
> now begins with a new para:
>
> For the convenience of implementors using ABNF, a definition of UTF-8
> in ABNF syntax is given here.
>
> and ends with a new Note:
>
> NOTE -- The authoritative definition of UTF-8 is in [UNICODE]. This
> grammar is believed to describe the same thing as what Unicode
> describes, but does not claim to be authoritative. Implementors are
> urged to rely on the authoritative source, rather than on this ABNF.
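A minimal sketch of a validator that follows the UTF-8 ABNF octet range by octet range. The ranges are transcribed from section 4 of the published RFC 3629 and are assumed to match this -05 draft; the names `UTF8_CHAR_ALTERNATIVES`, `match_utf8_char` and `is_utf8_octets` are illustrative, and, as the NOTE says, the Unicode definition remains authoritative:

```python
# Octet ranges transcribed from the UTF-8 ABNF (UTF8-1 .. UTF8-4, UTF8-tail).
# Each alternative is a tuple of inclusive (low, high) ranges, one per octet.
TAIL = (0x80, 0xBF)
UTF8_CHAR_ALTERNATIVES = [
    ((0x00, 0x7F),),                              # UTF8-1
    ((0xC2, 0xDF), TAIL),                         # UTF8-2
    ((0xE0, 0xE0), (0xA0, 0xBF), TAIL),           # UTF8-3
    ((0xE1, 0xEC), TAIL, TAIL),
    ((0xED, 0xED), (0x80, 0x9F), TAIL),
    ((0xEE, 0xEF), TAIL, TAIL),
    ((0xF0, 0xF0), (0x90, 0xBF), TAIL, TAIL),     # UTF8-4
    ((0xF1, 0xF3), TAIL, TAIL, TAIL),
    ((0xF4, 0xF4), (0x80, 0x8F), TAIL, TAIL),
]


def match_utf8_char(data: bytes, pos: int) -> int:
    """Return the length of the UTF8-char starting at pos, or 0 if none matches."""
    for alternative in UTF8_CHAR_ALTERNATIVES:
        if pos + len(alternative) > len(data):
            continue  # not enough octets left for this alternative
        if all(lo <= data[pos + i] <= hi
               for i, (lo, hi) in enumerate(alternative)):
            return len(alternative)
    return 0


def is_utf8_octets(data: bytes) -> bool:
    """True if data matches UTF8-octets = *( UTF8-char )."""
    pos = 0
    while pos < len(data):
        length = match_utf8_char(data, pos)
        if length == 0:
            return False
        pos += length
    return True
```

Because the ranges exclude C0, C1, F5..FF, the encoded surrogates ED A0..BF, and overlong three- and four-octet forms, the grammar rejects the same inputs a strict decoder should; for example, `is_utf8_octets(b"\xed\xa0\x80")` (an encoded surrogate) is False.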
>
> =============================================================================
>
> One director requested additional material in Security Considerations
> about the fact that octet-by-octet comparison is not sufficient (the
> Unicode normalization issue). The following has been added at the end of
> section 10:
>
> Security may also be impacted by a characteristic of several
> character encodings, including UTF-8: the "same thing" (as far as a
> user can tell) can be represented by several distinct character
> sequences. For instance, an e with acute accent can be represented by
> the precomposed U+00E9 E ACUTE character or by the canonically
> equivalent sequence U+0065 U+0301 (E + COMBINING ACUTE). Even though
> UTF-8 provides a single byte sequence for each character sequence,
> the existence of multiple character sequences for "the same thing"
> may have security consequences whenever string matching, indexing,
> searching, sorting, regular expression matching and selection are
> involved. An example would be string matching of an identifier
> appearing in a credential and in access control list entries. This
> issue is amenable to solutions based on Unicode Normalization Forms,
> see [UAX15].
>
> together with a new entry in Informative references for "Unicode
> Standard Annex #15: Unicode Normalization Forms".
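The point about distinct character sequences can be demonstrated in a few lines of Python. The helper `same_identifier` is hypothetical, and normalizing both sides to NFC before comparing is only one possible policy; [UAX15] defines the forms, while which one a protocol should require is a separate decision:

```python
import unicodedata

precomposed = "\u00e9"        # U+00E9 (e with acute, precomposed)
decomposed = "\u0065\u0301"   # U+0065 U+0301 (e + combining acute)

# Distinct character sequences yield distinct UTF-8 byte sequences...
print(precomposed.encode("utf-8"))   # b'\xc3\xa9'
print(decomposed.encode("utf-8"))    # b'e\xcc\x81'
print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))  # False


def same_identifier(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare identifiers after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# ...but they match once both sides are normalized to the same form.
print(same_identifier(precomposed, decomposed))  # True
```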
>
>
> Minor changes:
> =============================================================================
>
> In Introduction, add "code position" to "(the character number, a.k.a.
> code point or Unicode scalar value)".
>
> Rationale: "code position" is the 10646 term.
>
> =============================================================================
>
> In Introduction, change
>
> o The octet values C0, C1, FE and FF never appear. If the range of
> character numbers is restricted to U+0000..U+10FFFF (the UTF-16
> accessible range), then the octet values F5..FD also never appear.
>
> to
>
> o The octet values C0, C1, and F5 to FF never appear.
>
> Rationale: we do restrict to U+0000..U+10FFFF now, so the "If" is
> superfluous.
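The simplified wording can be checked by brute force (a Python sketch, not taken from the draft): encode every Unicode scalar value in U+0000..U+10FFFF and collect the octet values that ever occur.

```python
# Collect every octet value that occurs in the UTF-8 encoding of some Unicode
# scalar value. Surrogates U+D800..U+DFFF are not scalar values and are skipped
# (Python's strict UTF-8 codec refuses to encode them anyway).
seen = set()
for cp in range(0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue
    seen.update(chr(cp).encode("utf-8"))  # iterating bytes yields ints

never = sorted(set(range(256)) - seen)
print(" ".join(f"{b:02X}" for b in never))
# C0 C1 F5 F6 F7 F8 F9 FA FB FC FD FE FF
```

C0 and C1 could only start overlong two-octet forms of ASCII, and F5..FF could only start sequences above U+10FFFF, so none of them is ever produced.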
>
> =============================================================================
>
> In Introduction, add "byte-value" to "The lexicographic sorting order
> of..."
>
> Rationale: clarification, that's what it is.
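What the "byte-value lexicographic sorting order" property buys is that sorting UTF-8 strings as raw octets gives the same order as sorting them by code point. A randomized Python check (the helper `random_scalar_string` and the seed are arbitrary choices for illustration):

```python
import random


def random_scalar_string(rng: random.Random, max_len: int = 4) -> str:
    """Build a random string of Unicode scalar values (no surrogates)."""
    chars = []
    for _ in range(rng.randint(0, max_len)):
        cp = rng.randrange(0x110000)
        while 0xD800 <= cp <= 0xDFFF:
            cp = rng.randrange(0x110000)
        chars.append(chr(cp))
    return "".join(chars)


rng = random.Random(2279)
samples = [random_scalar_string(rng) for _ in range(2000)]

# Python compares str by code point and bytes by unsigned octet value.
by_code_point = sorted(samples)
by_utf8_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))
print(by_code_point == by_utf8_bytes)  # True
```

UTF-16 lacks this property: surrogate pairs for U+10000 and above compare below U+E000..U+FFFF when sorted as 16-bit units.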
>
> =============================================================================
>
> Add Chris Newman to Acknowledgments
>
> Rationale: he had just slipped through the cracks. With apologies.
>
> =============================================================================
>
> --
> François Yergeau
>