draft-yergeau-utf8-rev2-00 - unused octet values
I would like to add to the Introduction, paragraph 11:
Current: "The octet values FE and FF never appear."
Suggested:
"The octet values C0, C1, FE and FF never appear.
If the repertoire is restricted to the range U+0000 to U+10FFFF (the Unicode repertoire),
then the octet values F5..FD also never appear."
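Explanation: The lead octets C0 and C1 must never be used because of the shortest-form rule. They could only begin two-octet sequences for U+0000..U+007F, which must be encoded as single octets.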
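To make the claim easy to check, here is a minimal sketch in Python. It assumes that Python's built-in UTF-8 encoder follows the shortest-form rule and stops at U+10FFFF. The helper naive_two_octet_decode is hypothetical and only illustrates the bit arithmetic; it is not taken from the draft.

```python
# Collect every octet value that occurs in the UTF-8 encoding of some
# code point in U+0000..U+10FFFF. Surrogates (U+D800..U+DFFF) are skipped
# because Python's encoder rejects them; this does not change the result.
used = set()
for cp in range(0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue
    used.update(chr(cp).encode("utf-8"))

unused = sorted(set(range(256)) - used)
print(" ".join(f"{b:02X}" for b in unused))


def naive_two_octet_decode(lead, trail):
    # Hypothetical helper: combines the payload bits of a two-octet
    # sequence without enforcing the shortest-form rule.
    return ((lead & 0x1F) << 6) | (trail & 0x3F)


# The smallest and largest values reachable with lead octets C0 and C1
# both fall inside the single-octet range U+0000..U+007F.
lowest = naive_two_octet_decode(0xC0, 0x80)
highest = naive_two_octet_decode(0xC1, 0xBF)
print(f"U+{lowest:04X}..U+{highest:04X}")
```

The first loop should list C0, C1 and F5 through FF as unused. The second part shows why C0 and C1 are excluded: every sequence they could start duplicates a one-octet encoding.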
Comment on section 2 Notational conventions, paragraph 18:
Four to six hex digits are enough only if the repertoire is restricted to the Unicode range (up to U+10FFFF).
I am not sure whether or how this needs rephrasing to cover the full (but never used) ISO 10646 range, which needs up to 8 digits.
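A small sketch of the digit counts, assuming the notation is "U+" followed by at least four uppercase hex digits and that the top of the full 10646 range is the 31-bit value 7FFFFFFF. The helper u_notation is hypothetical and not part of the draft.

```python
def u_notation(cp):
    # Hypothetical helper: "U+" followed by at least four uppercase hex digits.
    return f"U+{cp:04X}"


# Sample code points: two in the BMP, the Unicode maximum, and the
# assumed 31-bit maximum of the full ISO 10646 range.
samples = (0x41, 0xFFFF, 0x10FFFF, 0x7FFFFFFF)
labels = [u_notation(cp) for cp in samples]
for text in labels:
    print(text, len(text) - 2, "hex digits")
```

With these inputs the Unicode maximum needs six digits and the 31-bit maximum needs eight.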
[Personally, I would restrict UTF-8 outright to 10FFFF, but I said this already and was voted down :-]
markus