[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Charlint (aka charlie)
Dear IETF charset experts,
I'm glad to announce the availability of 'charlint' (aka 'charlie')
at http://www.w3.org/International/charlint/.
Future announcements (in particular upgrades and bug fixes)
will also be published on www-international@w3.org.
Charlint is a perl program that checks and corrects a stream of
UTF-8-encoded characters. In particular, it implements Normalization
Form C (Canonical Composition) according to Unicode TR #15,
in accordance with the Character Model for the World Wide Web
W3C WD (http://www.w3.org/TR/1999/WD-charmod-19990225).
The problem of duplicate representations (precomposed and
decomposed) has come up in various IETF WGs, but was never
important enough to be discussed through. If the IETF could,
after due examination, follow the W3C and the Unicode Consortium,
I guess that would be helpful for everybody.
If you have any data in UTF-8 (or other Unicode-based encoding,
which can easily be converted to UTF-8), please test it out.
In particular, if you have any data where precomposed/decomposed
issues are relevant, please make sure that you test it soon,
so that we get feedback on the Character model, on the Unicode
V3.0 data, and of course on the software itself.
Also, if you have any test data with interesting test cases
that you could contribute, or any patches or ideas, please
send them in. For the rest, please see the page mentioned above.
Regards, Martin.
#-#-# Martin J. Du"rst, World Wide Web Consortium
#-#-# mailto:duerst@w3.org http://www.w3.org