
XML Syntax WG position on UTF-16



The XML Syntax WG of the W3C has considered the question of the
registration of MIME charset tags for the UTF-16 encoding and its impact
on XML.  Our position is as follows:

1) Registration of tag(s) for UTF-16 is very important for XML.  It should
occur as soon as possible.

2) The XML 1.0 spec requires a Byte Order Mark (BOM) on all entities
encoded in UTF-16, regardless of the tag that may be used to label those
entities.

3) Consequently, if one or more tags are defined such that BOMs are
forbidden, these tags will not be applicable to XML entities.  The
XML Syntax WG does not consider that to be a major problem, as long as at
least one tag is available that denotes the UTF-16 encoding and allows the
BOM that XML needs.

4) The latest Internet Draft for UTF-16 states that the BOM must not be
touched during MIME-related operations.  That is, the BOM is part of the
MIME body.  Since XML can legally impose any constraints on XML MIME bodies
(e.g. tags must begin with '<'), we believe that there are no layer
violations even if XML mandates the BOM.
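
To illustrate points 2 and 4, here is a minimal sketch in Python of how a
receiver might treat the BOM as part of the entity body and reject a UTF-16
entity that lacks one.  The function names are hypothetical, and the sketch
assumes the entity is already known to be UTF-16 (it does not try to tell
UTF-16 apart from other encodings).

```python
import codecs


def utf16_byte_order(entity: bytes):
    """Return 'big' or 'little' according to the leading BOM, or None if absent.

    Assumption: the caller already knows the entity is labelled UTF-16.
    """
    if entity.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if entity.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    return None


def decode_utf16_entity(entity: bytes) -> str:
    """Decode a UTF-16 entity, insisting on the BOM at the start of the body."""
    order = utf16_byte_order(entity)
    if order is None:
        raise ValueError("UTF-16 entity does not begin with a Byte Order Mark")
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    # The BOM belongs to the body as received; drop it only while decoding.
    return entity[len(codecs.BOM_UTF16_BE):].decode(codec)
```

For example, decode_utf16_entity(b"\xff\xfe<\x00a\x00/\x00>\x00") yields the
string "<a/>", while the same bytes without the leading FF FE are rejected.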

Regards,


Francois Yergeau and MURATA Makoto on behalf of the W3C XML Syntax WG