
Re: Volunteer needed to serve as IANA charset reviewer



On Sep 6, 2006, at 2:45 PM, Keith Moore wrote:

> As for utf-8 vs. Unicode, this is a bit tricky.  I agree that merely
> specifying Unicode isn't sufficient given the potential for
> incompatible CESs.  And yet I'm sympathetic to the notion that UTF-8
> pessimizes storage and transmission of text written in certain
> languages.  IMHO it's unreasonable to exclude the potential for a
> Unicode based CES that has more-or-less equivalent information
> density across a wide variety of languages.  But I do think that use of
> multiple CESs in a new protocol should require substantial
> justification, and that UTF-8 should be presumed to be the CES of
> choice for any new protocol that requires ASCII compatibility for its
> character representation.

Agreed on all counts.  Section 5.1 of RFC 3470 (aka BCP 70) says smart
things about this, referencing RFC 2277.  Basically, if you're going to
use XML, there's probably no point trying to legislate against UTF-16
since any conformant reader is required to accept it, and in practice
all known XML software can handle ISO 8859, Shift-JIS, and EUC.  But
if you're not doing XML, compulsory UTF-8 removes a lot of failure
points without costing much.
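
To make the tradeoff concrete, here is a rough Python sketch, not taken from any of the RFCs above. The sample strings and the helper names `encoded_sizes` and `decode_strict_utf8` are made up for illustration. The first helper shows the density point (UTF-8 is smaller for ASCII-heavy text, larger for Japanese), and the second shows one failure point that a UTF-8-only rule removes: bytes in some other encoding get rejected up front instead of being guessed at.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # utf-16-le is used so the count excludes the 2-byte byte order mark
        "utf-16": len(text.encode("utf-16-le")),
    }


def decode_strict_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, raising UnicodeDecodeError on anything else."""
    return data.decode("utf-8", errors="strict")


if __name__ == "__main__":
    # Illustrative samples only (hypothetical, not from the thread)
    for label, sample in [("English", "Hello, world"),
                          ("Japanese", "こんにちは世界")]:
        print(label, encoded_sizes(sample))

    # Shift-JIS bytes for a single hiragana character are not valid UTF-8,
    # so a UTF-8-only receiver fails loudly rather than producing mojibake.
    try:
        decode_strict_utf8("こ".encode("shift_jis"))
    except UnicodeDecodeError as err:
        print("rejected:", err.reason)
```

For these two samples the ASCII text takes half the space in UTF-8, while the Japanese text takes about 1.5 times the space it would in UTF-16, which is the pessimization Keith mentions.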

   -Tim