Re: Widetext (was Re: Registration of new charset "UTF-16")
Although I agree that I18N of e-mail should begin with UTF-8,
I believe that UTF-16 is the future of the WWW (XML, HTML, and
HTTP).
UTF-8 XML documents are very often parsed incorrectly. If the charset
parameter of text/xml is absent or incorrect, a UTF-8 XML document
is likely to be misparsed, and XML parsers do not always detect that the
charset is wrong. Corrupted data will then be stored in databases, and WWW
agents will receive and return corrupted data or even fail
completely. By contrast, UTF-16 is immune to this kind of data corruption:
because of the BOM and the many 00 bytes, a UTF-16 XML document will either
parse correctly or not parse at all. Furthermore, error recovery is very
reliable.
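As a rough illustration of the difference, here is a minimal Python sketch.
The helper name sniff_utf16 and the sample document are hypothetical, and the
mislabelled charset is assumed to be ISO-8859-1; a real XML parser does more
than this. It only shows that mislabelled UTF-8 decodes without any error,
whereas UTF-16 announces itself with a BOM and, if mislabelled, produces NUL
characters that XML 1.0 does not allow.

```python
import codecs

def sniff_utf16(data: bytes):
    """Return the codec implied by a leading UTF-16 BOM, or None (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None

# Hypothetical sample document containing non-ASCII characters.
text = '<?xml version="1.0"?><name>村田</name>'

# Case 1: UTF-8 bytes labelled (wrongly) as ISO-8859-1.
utf8_bytes = text.encode("utf-8")
mislabelled = utf8_bytes.decode("iso-8859-1")   # never raises: every byte is valid Latin-1
silently_corrupted = mislabelled != text        # the text is wrong, but nothing signals it

# Case 2: big-endian UTF-16 bytes with an explicit BOM.
utf16_bytes = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
bom_codec = sniff_utf16(utf16_bytes)            # the BOM identifies the encoding
recovered = utf16_bytes[2:].decode(bom_codec)   # skip the 2-byte BOM, then decode

# If the same bytes are treated as ISO-8859-1 anyway, the result is full of
# NUL characters, which are not legal XML 1.0 characters, so a conforming
# parser has grounds to reject the document instead of storing garbage.
as_latin1 = utf16_bytes.decode("iso-8859-1")
has_nul = "\x00" in as_latin1
```

In this sketch the UTF-8 case yields wrong text with no warning, while the
UTF-16 case is either recognised from its BOM or is visibly malformed.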
I think that UTF-8 provides a good migration path from ASCII-only
text and that UTF-16 provides a very good start for new protocols or data
formats. In my opinion, HTTP people did a very good job in lifting
unnecessary restrictions of text/*. I hope that future protocols will
do the same thing.
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp