draft-yergeau-utf8-rev2-00 - unused octet values
- To: charsets <[email protected]>
- Subject: draft-yergeau-utf8-rev2-00 - unused octet values
- From: Markus Scherer <[email protected]>
- Date: Tue, 16 Apr 2002 13:01:00 -0700
- Organization: IBM
- References: <[email protected]>
- User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4)Gecko/20011019 Netscape6/6.2
I would like to add to the Introduction, paragraph 11:
Current: "The octet values FE and FF never appear."
Suggested:
"The octet values C0, C1, FE and FF never appear.
If the repertoire is restricted to the range U+0000 to U+10FFFF (the Unicode repertoire),
then the octet values F5..FD also never appear."
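Explanation: The C0 and C1 lead octets must never be used because of the shortest-form rule: a two-octet sequence starting with C0 or C1 could only encode U+0000 to U+007F, which must be encoded as a single octet.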
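
As a rough check of the suggested wording, here is a small Python sketch (not part of the draft; the function name unused_octets is made up for illustration). It encodes every code point from U+0000 to U+10FFFF with Python's built-in UTF-8 codec, skipping the surrogates since the codec refuses to encode them, and lists the octet values that never occur.

```python
def unused_octets(max_cp=0x10FFFF):
    """Return the octet values that never occur in UTF-8 for U+0000..max_cp."""
    used = set()
    for cp in range(max_cp + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded by the strict codec
        used.update(chr(cp).encode("utf-8"))
    return sorted(set(range(256)) - used)


unused = unused_octets()
print(" ".join(f"{b:02X}" for b in unused))
# prints: C0 C1 F5 F6 F7 F8 F9 FA FB FC FD FE FF

# A strict decoder also rejects the non-shortest form C0 80 for U+0000.
try:
    b"\xC0\x80".decode("utf-8")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True
print(overlong_rejected)
```

This only covers the restricted range; with the full 31-bit 10646 range, F5..FD would be needed as lead octets for the higher values, which this codec cannot produce.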
Comment on section 2 Notational conventions, paragraph 18:
4 to 6 hex digits are only enough if the repertoire is restricted to the Unicode range...
I am not sure if/how this needs rephrasing to consider the full (but never used) 10646 range (up to 8 digits).
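
To illustrate the digit count, a minimal Python sketch, assuming the U+ notation is zero-padded to at least four hex digits and otherwise uses as many digits as the value needs (the helper name u_notation is hypothetical):

```python
def u_notation(cp):
    """Format a code point as U+ followed by at least four hex digits."""
    return f"U+{cp:04X}"


for cp in (0x41, 0xFFFF, 0x10FFFF, 0x7FFFFFFF):
    text = u_notation(cp)
    # Subtract the two characters of the "U+" prefix to count hex digits.
    print(text, len(text) - 2)
# prints:
# U+0041 4
# U+FFFF 4
# U+10FFFF 6
# U+7FFFFFFF 8
```

The top of the Unicode range fits in six digits, while the top of the 31-bit 10646 range needs eight.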
[Personally, I would restrict UTF-8 outright to 10FFFF, but I said this already and was voted down :-]
markus