[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

draft-yergeau-utf8-rev2-00 - unused octet values



I would like to add to the Introduction, paragraph 11:

Current: "The octet values FE and FF never appear."

Suggested:
"The octet values C0, C1, FE and FF never appear.
If the repertoire is restricted to the range U+0000 to U+10FFFF (the Unicode repertoire),
then the octet values F5..FD also never appear."

Explanation: The C0 and C1 lead octets must never be used because of the shortest-form rule.


Comment on section 2 Notational conventions, paragraph 18:
4 to 6 hex digits are only enough if the repertoire is restricted to the Unicode range...
I am not sure if/how this needs rephrasing to consider the full (but never used) 10646 range (up to 8 digits).

[Personally, I would restrict UTF-8 outright to 10FFFF, but I said this already and was voted down :-]

markus