UTF-16 (was: Re: Charset reviewer appointed)
At 09:02 98/07/29 +0200, Harald Tveit Alvestrand wrote:
> >At 13:42 98/07/27 +0200, Harald Tveit Alvestrand wrote:
> >
> >> The BOM is part of the charset that UTF-16 represents.
> >> Any application can say anything it wants to *further restricting*
> >> what characters can apply where; the part we couldn't tolerate
> >> was if XML insisted upon strings that were *illegal* in the registered
> >> UTF-16, yet calling the charset "UTF-16".
> What I was saying is that if XML states that all valid XML documents
> must start with the BOM, that's no more problematic than if HTML
> states that all valid HTML documents must start with <!DOCTYPE;
> this is part of the application, not part of the charset.
>
> I'm not saying it's a good idea; I strongly suspect that it's not.
> But it does not need to have the consent of the charset registration.
What XML is currently stating is that all UTF-16 documents must start
with a BOM, and that this BOM is not part of the real XML document.
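A minimal Python sketch of the rule as stated here. It assumes the input is the raw bytes of an entity already known to be labelled UTF-16. The first two bytes must be a BOM (FE FF or FF FE), they select the byte order, and they are dropped before the text reaches the XML processor. The name `decode_xml_utf16` is hypothetical and does not come from any XML library.

```python
import codecs

def decode_xml_utf16(data: bytes) -> str:
    """Decode an entity labelled UTF-16, requiring a leading BOM.

    Hypothetical helper: the BOM picks the byte order and is then
    stripped, so it never becomes part of the document's text.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    # Under the rule described above, a UTF-16 entity without a BOM is an error.
    raise ValueError("UTF-16 entity does not start with a byte order mark")
```

For example, `decode_xml_utf16(codecs.BOM_UTF16_LE + "<doc/>".encode("utf-16-le"))` gives `"<doc/>"` with no trace of the BOM.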
XML does not say anything about a BOM for UTF-8, but the whole text
(in particular http://www.w3.org/TR/REC-xml#charencoding) and
and the examples it gives (http://www.w3.org/TR/REC-xml#sec-guessing)
strongly suggest that such a thing was never even taken into any
kind of consideration (Makoto, please correct me if this is otherwise).
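To illustrate the kind of first-bytes guessing the sec-guessing link refers to, here is a simplified Python sketch. It checks only a handful of patterns, not the spec's full table, and the returned codec names are assumptions chosen for illustration. The function name `guess_xml_encoding` is hypothetical. The sketch deliberately has no rule for a UTF-8 BOM (EF BB BF), in line with the reading above.

```python
def guess_xml_encoding(head: bytes) -> str:
    """Guess an entity's encoding family from its first four bytes.

    Hypothetical, simplified sketch: only a few patterns are checked,
    and the result is just a starting point until the encoding
    declaration itself has been read.
    """
    if head[:2] == b"\xfe\xff":
        return "utf-16-be"   # UTF-16 with BOM, big-endian
    if head[:2] == b"\xff\xfe":
        return "utf-16-le"   # UTF-16 with BOM, little-endian
    if head[:4] == b"\x00<\x00?":
        return "utf-16-be"   # '<?' in UTF-16 big-endian, no BOM
    if head[:4] == b"<\x00?\x00":
        return "utf-16-le"   # '<?' in UTF-16 little-endian, no BOM
    if head[:4] == b"<?xm":
        return "utf-8"       # some ASCII-compatible encoding; the declaration decides
    return "utf-8"           # fallback assumption when nothing matches
```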
For all the other (legacy) encodings, putting a BOM at the beginning
of the document wouldn't be impossible in theory (using U+FEFF),
but it makes even less sense, is definitely not required, and would
not be considered correct XML.
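To make the legacy-encoding case concrete, this small Python check shows what the BOM character U+FEFF turns into in a few encodings. UTF-16 and UTF-8 have a byte form for it, while single-byte legacy encodings such as ISO-8859-1 (or plain ASCII) cannot represent it directly at all. The helper name `bom_bytes` is hypothetical.

```python
def bom_bytes(encoding: str):
    """Return U+FEFF encoded in `encoding`, or None if the encoding has no mapping for it."""
    try:
        return "\ufeff".encode(encoding)
    except UnicodeEncodeError:
        return None

for enc in ("utf-16-be", "utf-8", "iso-8859-1", "ascii"):
    print(enc, bom_bytes(enc))
```

The UTF-8 form is the three bytes EF BB BF; the two legacy encodings print `None`.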
Regards, Martin.