
Re: draft-yergeau-utf8-rev2-00 - unused octet values



At 13:01 02/04/16 -0700, Markus Scherer wrote:
>I would like to add to the Introduction, paragraph 11:
>
>Current: "The octet values FE and FF never appear."
>
>Suggested:
>"The octet values C0, C1, FE and FF never appear.
>If the repertoire is restricted to the range U+0000 to U+10FFFF
>(the Unicode repertoire), then the octet values F5..FD also never
>appear."
>
>Explanation: The C0 and C1 lead octets must never be used because of
>the shortest-form rule.

Very good point.
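
For readers who want to check the suggested wording, here is a minimal
sketch in Python. It assumes the original 31-bit UTF-8 scheme (forms of
1 to 6 octets) and the shortest-form rule; the names FORMS and
unused_octets are made up for this illustration and do not come from the
draft.

```python
# Each entry: (first code point, last code point, lead-octet prefix,
# number of payload bits carried by the continuation octets).
# Assumption: the original 31-bit UTF-8 scheme with 2- to 6-octet forms.
FORMS = [
    (0x80,      0x7FF,      0xC0, 6),
    (0x800,     0xFFFF,     0xE0, 12),
    (0x10000,   0x1FFFFF,   0xF0, 18),
    (0x200000,  0x3FFFFFF,  0xF8, 24),
    (0x4000000, 0x7FFFFFFF, 0xFC, 30),
]


def unused_octets(max_cp):
    """Return the octet values that never occur when every code point
    from 0 to max_cp is encoded in its shortest form."""
    used = set(range(0x00, 0x80))           # single-octet (ASCII) forms
    for first, last, prefix, shift in FORMS:
        if first > max_cp:
            break
        last = min(last, max_cp)
        used.update(range(0x80, 0xC0))      # continuation octets
        lo = prefix | (first >> shift)      # smallest lead octet of this form
        hi = prefix | (last >> shift)       # largest lead octet of this form
        used.update(range(lo, hi + 1))
    return sorted(set(range(256)) - used)


def as_hex(octets):
    return " ".join(f"{o:02X}" for o in octets)


print(as_hex(unused_octets(0x7FFFFFFF)))    # full 31-bit range
print(as_hex(unused_octets(0x10FFFF)))      # Unicode range only
```

The two-octet form starts at U+0080, so its smallest lead octet is C2;
that is where C0 and C1 drop out. Capping the range at U+10FFFF makes F4
the largest lead octet, which leaves F5 and above unused.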



>Comment on section 2 Notational conventions, paragraph 18:
>4 to 6 hex digits are only enough if the repertoire is restricted to
>the Unicode range...
>I am not sure if/how this needs rephrasing to consider the full (but
>never used) 10646 range (up to 8 digits).

Well, this is not about the notation in general, but only about
the notation in this document. And the document doesn't contain
any examples that would need more digits, so we are fine.

Regards,   Martin.


>[Personally, I would restrict UTF-8 outright to 10FFFF, but I said
>this already and was voted down :-]
>
>markus
>