
UTF-16 (was: Re: Charset reviewer appointed)

To: Harald Tveit Alvestrand <[email protected]>
Subject: UTF-16 (was: Re: Charset reviewer appointed)
From: "Martin J. Duerst" <[email protected]>
Date: Wed, 29 Jul 1998 18:20:00 +0900
Cc: [email protected], Multiple Recipients of Unicore <[email protected]>,[email protected], [email protected]
In-reply-to: <[email protected]>
References: <[email protected]><[email protected]><[email protected]> <[email protected]>

At 09:02 98/07/29 +0200, Harald Tveit Alvestrand wrote:

> >At 13:42 98/07/27 +0200, Harald Tveit Alvestrand wrote:
> >
> >> The BOM is part of the charset that UTF-16 represents.
> >> Any application can say anything it wants to *further restricting*
> >> what characters can apply where; the part we couldn't tolerate
> >> was if XML insisted upon strings that were *illegal* in the registered
> >> UTF-16, yet calling the charset "UTF-16".

> What I was saying is that if XML states that all valid XML documents
> must start with the BOM, that's no more problematic than if HTML
> states that all valid HTML documents must start with <!DOCTYPE;
> this is part of the application, not part of the charset.
> 
> I'm not saying it's a good idea; I strongly suspect that it's not.
> But it does not need to have the consent of the charset registration.

What XML is currently stating is that all UTF-16 documents must start
with a BOM, and that this BOM is not part of the real XML document.
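
To make that rule concrete, here is a minimal Python sketch of a reader that requires the BOM on a UTF-16 document and then discards it before handing the text on. The function name decode_utf16_xml is made up for illustration; it is not taken from the XML text or from any parser.

```python
import codecs


def decode_utf16_xml(data: bytes) -> str:
    """Decode a UTF-16 XML document, requiring and then dropping the BOM.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF16_BE):      # bytes FE FF
        encoding = "utf-16-be"
    elif data.startswith(codecs.BOM_UTF16_LE):    # bytes FF FE
        encoding = "utf-16-le"
    else:
        raise ValueError("UTF-16 XML document must start with a BOM")
    # Skip the two BOM bytes: the BOM signals byte order and is not
    # treated as part of the document's own character data.
    return data[2:].decode(encoding)
```

With this reading, the BOM only selects the byte order, and the string that comes back starts directly with the document's first real character.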

XML does not say anything about a BOM for UTF-8, but the whole text
(in particular http://www.w3.org/TR/REC-xml#charencoding) and
the examples it gives (http://www.w3.org/TR/REC-xml#sec-guessing)
strongly suggest that such a thing was never even considered
(Makoto, please correct me if this is wrong).
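
As a rough illustration, the following Python sketch paraphrases the kind of first-bytes table given in the sec-guessing appendix. The function name, the labels, and the exact selection of rows are my own simplification and should be checked against the appendix itself. The point to note is that there is no row for a UTF-8 signature (EF BB BF), so such a document falls through to the last case.

```python
def guess_xml_encoding(prefix: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML document.

    Hypothetical helper; a simplified paraphrase, not the normative table.
    """
    patterns = [
        (b"\x00\x00\x00\x3c", "UCS-4, big-endian"),
        (b"\x3c\x00\x00\x00", "UCS-4, little-endian"),
        (b"\xfe\xff", "UTF-16, big-endian (BOM)"),
        (b"\xff\xfe", "UTF-16, little-endian (BOM)"),
        (b"\x00\x3c\x00\x3f", "UTF-16, big-endian, no BOM"),
        (b"\x3c\x00\x3f\x00", "UTF-16, little-endian, no BOM"),
        (b"\x3c\x3f\x78\x6d", "UTF-8 or another ASCII-compatible encoding"),
        (b"\x4c\x6f\xa7\x94", "EBCDIC"),
    ]
    for magic, label in patterns:
        if prefix.startswith(magic):
            return label
    # No pattern matched, which includes a leading EF BB BF.
    return "other (assume UTF-8 without an encoding declaration)"
```

For example, guess_xml_encoding(b"\xef\xbb\xbf<?xml") lands in the "other" case, while the same declaration without the three leading bytes matches the "<?xm" row.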

For all the other (legacy) encodings, putting in a BOM at the beginning
of the document wouldn't be impossible in theory (using "&#xFEFF;"),
but it would make even less sense; it is definitely not required, nor
would it be considered correct XML.
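
A small Python sketch of both halves of that statement, using ISO-8859-1 as a stand-in for "legacy encoding" and the standard library's expat-based parser as one example of an XML parser. The helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

# U+FEFF has no byte representation in a legacy charset like ISO-8859-1.
try:
    "\ufeff".encode("iso-8859-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# The only way to write it there is as a numeric character reference.
# Python emits the decimal form, which denotes the same character
# as "&#xFEFF;".
as_reference = "\ufeff".encode("iso-8859-1", errors="xmlcharrefreplace")


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a complete document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A character reference placed before the root element is rejected.
reference_first_ok = is_well_formed("&#xFEFF;<doc/>")
plain_ok = is_well_formed("<doc/>")
```

Here encodable comes out False and as_reference is the byte string for the reference, while the parser accepts the plain document and rejects the one that starts with the reference.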

Regards,   Martin.
