Re: Comments on draft-yergeau-rfc2279bis-00.txt
- To: charsets <[email protected]>
- Subject: Re: Comments on draft-yergeau-rfc2279bis-00.txt
- From: Markus Scherer <[email protected]>
- Date: Thu, 17 Oct 2002 09:12:09 -0700
- Organization: IBM
- Original-recipient: rfc822;[email protected]
- References: <[email protected]>
- Spam-test: False ; 1.5 / 5.2
- User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2
Patrik Fältström wrote:
> What I hear on this list is that the consensus is that BOM SHOULD NOT be
> used. I would like it to be MUST NOT be used in Internet protocols,
> which leads to tagged UTF-8 text be illegal if the BOM exists in the text.
That would violate the Unicode standard. If UTF-8 is clearly indicated by a charset label, then an initial byte sequence EF BB BF must be interpreted as the character U+FEFF ZWNBSP (ZERO WIDTH NO-BREAK SPACE). Since that is not a very useful character at the beginning of a text, it can usually be ignored.
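To illustrate, here is a minimal Python sketch of a receiver that accepts labeled UTF-8 and ignores an initial U+FEFF. The function name decode_labeled_utf8 and its drop_initial_bom parameter are made up for this example and do not come from the draft or from any particular implementation.

```python
# The three bytes EF BB BF are simply the UTF-8 encoding of U+FEFF.
BOM_BYTES = b"\xef\xbb\xbf"


def decode_labeled_utf8(data: bytes, drop_initial_bom: bool = True) -> str:
    """Decode bytes that a charset label has declared to be UTF-8.

    Hypothetical helper: an initial U+FEFF is decoded like any other
    character and is then optionally dropped by the receiver.
    """
    text = data.decode("utf-8")  # a leading EF BB BF becomes U+FEFF
    if drop_initial_bom and text.startswith("\ufeff"):
        return text[1:]  # only the first character is removed
    return text


sample = BOM_BYTES + "hello".encode("utf-8")
print(repr(sample.decode("utf-8")))
print(repr(decode_labeled_utf8(sample)))
```

Python's built-in "utf-8-sig" codec behaves the same way when decoding: it skips one leading U+FEFF if present and otherwise decodes as plain UTF-8.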
Personally, I find François' text very clear. It acknowledges existing, reasonable and useful practice.
Best regards,
markus
--
Opinions expressed here may not reflect my company's positions unless otherwise noted.