UTF-16 (was: Re: Charset reviewer appointed)



At 09:02 98/07/29 +0200, Harald Tveit Alvestrand wrote:

> >At 13:42 98/07/27 +0200, Harald Tveit Alvestrand wrote:
> >
> >> The BOM is part of the charset that UTF-16 represents.
> >> Any application can say anything it wants to *further restricting*
> >> what characters can apply where; the part we couldn't tolerate
> >> was if XML insisted upon strings that were *illegal* in the registered
> >> UTF-16, yet calling the charset "UTF-16".

> What I was saying is that if XML states that all valid XML documents
> must start with the BOM, that's no more problematic than if HTML
> states that all valid HTML documents must start with <!DOCTYPE;
> this is part of the application, not part of the charset.
> 
> I'm not saying it's a good idea; I strongly suspect that it's not.
> But it does not need to have the consent of the charset registration.

What XML is currently stating is that all UTF-16 documents must start
with a BOM, and that this BOM is not part of the real XML document.
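To make that concrete, here is a rough sketch in Python of how a
processor might treat a UTF-16 entity under this reading. The function
name decode_utf16_entity is hypothetical, and rejecting BOM-less input
outright is my assumption for the illustration, not something taken
from the spec text.

```python
import codecs

def decode_utf16_entity(data: bytes) -> str:
    """Decode a UTF-16 entity that is required to start with a BOM.

    The BOM only selects the byte order; it is dropped before decoding,
    so U+FEFF never shows up in the returned document text.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        encoding = "utf-16-be"
    elif data.startswith(codecs.BOM_UTF16_LE):
        encoding = "utf-16-le"
    else:
        # Assumption for this sketch: a missing BOM is treated as an error.
        raise ValueError("UTF-16 entity does not start with a BOM")
    # Skip the two BOM bytes, then decode with the byte order it indicated.
    return data[2:].decode(encoding)
```

Called on the bytes FF FE followed by "<a/>" in little-endian UTF-16,
this returns the string "<a/>" with no leading U+FEFF.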

XML does not say anything about a BOM for UTF-8, and the whole text
(in particular http://www.w3.org/TR/REC-xml#charencoding) and
the examples it gives (http://www.w3.org/TR/REC-xml#sec-guessing)
strongly suggest that a UTF-8 BOM was never taken into
consideration (Makoto, please correct me if this is wrong).

For all the other (legacy) encodings, putting a BOM at the beginning
of the document wouldn't be impossible in theory (using "&#xFEFF;"),
but it would make even less sense. It is definitely not required, nor
would it be considered correct XML.
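
As a byte-level illustration only (again a Python sketch, with a
hypothetical helper name bom_prefix), this shows why a legacy encoding
could carry U+FEFF only as a character reference: the character itself
has no representation there. It says nothing about whether the result
is acceptable XML, which, as noted above, it would not be.

```python
def bom_prefix(encoding: str) -> bytes:
    """Return the bytes that would put U+FEFF at the start of a document."""
    try:
        # Works only if the encoding has a code for U+FEFF (e.g. UTF-8).
        return "\ufeff".encode(encoding)
    except UnicodeEncodeError:
        # Legacy encodings cannot represent U+FEFF directly, so the only
        # spelling left is the ASCII-only character reference.
        return "&#xFEFF;".encode(encoding)
```

For "iso-8859-1" this yields the eight ASCII bytes of "&#xFEFF;",
while for "utf-8" it yields the three bytes EF BB BF.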


Regards,   Martin.