
Re: Widetext (was Re: Registration of new charset "UTF-16")



Although I agree that I18N of e-mail should begin with UTF-8, 
I believe that UTF-16 is the future of the WWW (XML, HTML, and 
HTTP). 

UTF-8 XML documents are often parsed incorrectly.  If the charset 
parameter of text/xml is absent or incorrect, a UTF-8 XML document 
is likely to be misinterpreted, and XML parsers do not always detect 
that the charset is wrong.  Corrupted data will then be stored in 
databases, and WWW agents will receive and return corrupted data or 
even fail completely.  In contrast, UTF-16 is exempt from such data 
corruption: because of the BOM and the many 00 bytes, a UTF-16 XML 
document will either parse correctly or not parse at all.  Furthermore, 
error recovery is very reliable.
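
To make the contrast concrete, here is a minimal Python sketch.  It is 
only an illustration, not how any particular XML parser works: the 
sample document, the wrong label (iso-8859-1), and the helper names 
sniff_utf16 and decodes_as_utf8 are all assumptions chosen for the 
example.

```python
import codecs

# Hypothetical sample document containing non-ASCII text.
text = '<?xml version="1.0"?><name>村田</name>'

utf8_bytes = text.encode("utf-8")
# UTF-16 little-endian with an explicit byte order mark (FF FE).
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Case 1: UTF-8 bytes read under a wrong charset label.
# Every byte is a valid ISO-8859-1 character, so decoding succeeds
# and the corruption goes unnoticed.
mislabelled = utf8_bytes.decode("iso-8859-1")


def sniff_utf16(data: bytes):
    """Return a UTF-16 codec name if the data starts with a BOM, else None."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


def decodes_as_utf8(data: bytes) -> bool:
    """Report whether the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Case 2: UTF-16 bytes are recognisable from the BOM alone, and a
# reader that wrongly assumes UTF-8 gets an error instead of bad data.
detected = sniff_utf16(utf16_bytes)
recovered = utf16_bytes[2:].decode(detected)
```

In this sketch the mislabelled UTF-8 document decodes without any 
error but no longer contains the original name, whereas the UTF-16 
document is identified from its first two bytes and is rejected 
outright when treated as UTF-8.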

I think that UTF-8 provides a good migration path from ASCII-only text 
and that UTF-16 provides a very good start for new protocols or data 
formats.  In my opinion, HTTP people did a very good job in lifting 
unnecessary restrictions of text/*.  I hope that future protocols will 
do the same thing.

Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp