
RE: Comments on draft-yergeau-rfc2279bis-00.txt




On 17/04/2002 21:51:19 Francois Yergeau wrote:
> Martin Duerst wrote:
[...]

> > 5. Byte order mark (BOM)
> >
> > This section needs more work. The 'change log' says that it's
> > mostly taken from the UTF-16 RFC. But the BOM for UTF-8 is
> > much less necessary, and much more of a problem, than for UTF-16.
> > We should clearly say that with IETF protocols, character encodings
> > are always either labeled or fixed, and therefore the BOM SHOULD
> > (and MUST at least for small segments) never be used for UTF-8.
> > And we should clearly give the main argument, namely that it
> > breaks US-ASCII compatibility (US-ASCII encoded as UTF-8
> > (without a BOM) stays exactly the same, but US-ASCII encoded
> > as UTF-8 with a BOM is different).
>
> I don't quite see your point.  A US-ASCII string, with or without a BOM, is
> always a valid UTF-8 string; I don't see where compatibility is broken.  I
> can see that protocols shouldn't *require* a BOM, because then a strict
> (BOM-less) ASCII string wouldn't meet the requirement.  But that's not what
> you're saying, right?

The point Martin may be making is that some tools insert a BOM
at the start of a resource they consider to be encoded using
UTF-8, but do not do so for a resource they consider to be
encoded using US-ASCII.

I have just carried out the following test.  I opened Notepad
under Win2K and typed the letter "a".  I saved the file, leaving
the default encoding of "ANSI", and then saved it again under a
different name, specifying "UTF-8" as the encoding.  I then
checked the file sizes using Properties.  The first file is
1 byte long; the second is 4 bytes.  The 3 extra bytes are the
UTF-8 BOM (EF BB BF) at the start of the second file.
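
For anyone who wants to reproduce the effect without Notepad, here
is a minimal sketch in Python.  It assumes that Python's standard
"utf-8-sig" codec, which prepends a BOM when encoding, is a fair
stand-in for Notepad's "UTF-8" save option; that equivalence is my
assumption and has not been checked against Notepad itself.

```python
import codecs

text = "a"

# Pure US-ASCII text is byte-for-byte identical when encoded as BOM-less UTF-8.
as_ascii = text.encode("ascii")
as_utf8 = text.encode("utf-8")

# "utf-8-sig" prepends the UTF-8 BOM (EF BB BF) on encoding.
as_utf8_bom = text.encode("utf-8-sig")

print(len(as_ascii), len(as_utf8), len(as_utf8_bom))

# A consumer expecting strict US-ASCII rejects the BOM-prefixed bytes,
# because EF BB BF are all outside the 7-bit range.
try:
    as_utf8_bom.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print("decodes as US-ASCII:", ascii_ok)
```

The BOM-less UTF-8 bytes equal the US-ASCII bytes, whereas the
BOM-prefixed bytes are 3 bytes longer and no longer decode as
US-ASCII, which is one way to read the compatibility concern.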

Misha

[...]




