New draft-yergeau-rfc2279bis-05.txt
- To: ietf-charsets@iana.org
- Subject: New draft-yergeau-rfc2279bis-05.txt
- From: Francois Yergeau <FYergeau@alis.com>
- Date: Mon, 09 Jun 2003 16:09:37 -0400
...just submitted to secretariat.
This revision addresses two substantive issues raised by the IESG during
post-last-call evaluation, as well as a few minor points that have shown up
since -04.
Changes from IESG review:
=============================================================================
One director requested that it be made clear that the ABNF in section 4 is
not normative, both because it is new and untested -- added between Draft
and Standard -- and because RFC 2234 is only Proposed. Section 4 now begins
with a new para:
For the convenience of implementors using ABNF, a definition of UTF-8
in ABNF syntax is given here.
and ends with a new Note:
NOTE -- The authoritative definition of UTF-8 is in [UNICODE]. This
grammar is believed to describe the same thing as what Unicode
describes, but does not claim to be authoritative. Implementors are
urged to rely on the authoritative source, rather than on this ABNF.
=============================================================================
One director requested additional material in Security Considerations about
the fact that octet-by-octet comparison is not sufficient (the Unicode
normalization issue). The following has been added at the end of section
10:
Security may also be impacted by a characteristic of several
character encodings, including UTF-8: the "same thing" (as far as a
user can tell) can be represented by several distinct character
sequences. For instance, an e with acute accent can be represented by
the precomposed U+00E9 E ACUTE character or by the canonically
equivalent sequence U+0065 U+0301 (E + COMBINING ACUTE). Even though
UTF-8 provides a single byte sequence for each character sequence,
the existence of multiple character sequences for "the same thing"
may have security consequences whenever string matching, indexing,
searching, sorting, regular expression matching and selection are
involved. An example would be string matching of an identifier
appearing in a credential and in access control list entries. This
issue is amenable to solutions based on Unicode Normalization Forms,
see [UAX15].
together with a new entry in Informative references for "Unicode Standard
Annex #15: Unicode Normalization Forms".
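The normalization issue described above can be illustrated with a short
Python sketch. It uses only the standard unicodedata module; the helper name
same_after_nfc is hypothetical, and the choice of NFC is an assumption for
illustration (an application might prefer another form from [UAX15]).

```python
import unicodedata

# The two representations of "e with acute accent" from the draft text.
precomposed = "\u00e9"        # U+00E9
decomposed = "\u0065\u0301"   # U+0065 followed by U+0301

# Each character sequence has exactly one UTF-8 byte sequence,
# but the two sequences encode to different octets.
precomposed_octets = precomposed.encode("utf-8")
decomposed_octets = decomposed.encode("utf-8")
octets_equal = precomposed_octets == decomposed_octets


def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(precomposed_octets.hex(), decomposed_octets.hex())
    print("octet-by-octet equal:", octets_equal)
    print("equal after NFC:", same_after_nfc(precomposed, decomposed))
```

An identifier check against an access control list would normalize both
sides in this way before comparing, rather than comparing raw octets.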
Minor changes:
=============================================================================
In Introduction, add "code position" to "(the character number, a.k.a. code
point or Unicode scalar value)".
Rationale: "code position" is the 10646 term.
=============================================================================
In Introduction, change
o The octet values C0, C1, FE and FF never appear. If the range of
character numbers is restricted to U+0000..U+10FFFF (the UTF-16
accessible range), then the octet values F5..FD also never appear.
to
o The octet values C0, C1, and F5 to FF never appear.
Rationale: we do restrict to U+0000..U+10FFFF now, the "If" is superfluous.
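The simplified statement can be checked empirically. The following Python
sketch (the function name utf8_octets_in_use is hypothetical) encodes every
character number in U+0000..U+10FFFF, skipping the surrogate range
U+D800..U+DFFF, which is not encodable in UTF-8, and reports the octet
values that never occur.

```python
def utf8_octets_in_use() -> set:
    """Collect every octet value that occurs in the UTF-8 encoding of
    some character number in U+0000..U+10FFFF (surrogates excluded)."""
    seen = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded as UTF-8
        seen.update(chr(cp).encode("utf-8"))
    return seen


# Octet values 0x00..0xFF that never show up in any encoded character.
unused = sorted(set(range(256)) - utf8_octets_in_use())

if __name__ == "__main__":
    print(" ".join(f"{octet:02X}" for octet in unused))
```

With the range restricted to U+10FFFF, the unused set should be exactly
C0, C1 and F5 through FF, which matches the new wording.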
=============================================================================
In Introduction, add "byte-value" to "The lexicographic sorting order of..."
Rationale: clarification, that's what it is.
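What "byte-value" means here can be shown with a small Python sketch. Python
compares str objects by character number, so sorting strings directly and
sorting them by their UTF-8 octets should give the same order. The sample
strings are arbitrary, and the UTF-16 comparison is included only as a
contrast.

```python
# Arbitrary sample strings, including a character beyond U+FFFF.
samples = ["z", "a", "\u00e9", "\uffff", "\U00010000", "ab", "e\u0301"]

# Order by character number (Python's native str comparison).
by_code_point = sorted(samples)

# Order by byte value of the UTF-8 encoding.
by_utf8_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))

# For contrast: order by byte value of the big-endian UTF-16 encoding.
by_utf16_bytes = sorted(samples, key=lambda s: s.encode("utf-16-be"))

if __name__ == "__main__":
    print("UTF-8 order matches:", by_utf8_bytes == by_code_point)
    print("UTF-16 order matches:", by_utf16_bytes == by_code_point)
```

The UTF-16 order differs for this sample because U+10000 is encoded with
surrogate code units (D800 DC00), which sort below FFFF.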
=============================================================================
Add Chris Newman to Acknowledgments.
Rationale: he had just slipped through the cracks. With apologies.
=============================================================================
--
François Yergeau