[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Charlint (aka charlie)



At 09:43 14.07.99 +0900, Martin J. Duerst wrote:

> > Unicode TR 15 refers to an Unicode character database for composition.
> > Is this database changing with every new amendment to the base standard,
> > or is it a stable reference that can be compiled into programs?
> >
> > If it is unstable, how does charlint deal with it?
>
>There are various aspects:
>
>- Decompositions for existing characters are supposed not to change.
>   TR 15 significantly increases the pressure to not change.
>
>- Decompositions for newly defined precomposed characters are dealt with
>   by keeping these characters decomposed in the normalized form. If
>   charlint meets a character it doesn't know (an undefined codepoint),
>   it can warn or abort.
>
>- Support for 'compiling' is currently not part of charlint, but it
>   is an idea for future work.

I was thinking of the composition database; I thought they were different,
from the text in TR15.
There's been some pressure to add new precomposed characters (W with
circumflex? Did this get in already?) to Unicode; if these are added to the 
decomposition database, it does not hurt the correctness of normalization
of existing text that uses the decomposed forms.

If they are added to the composition database, existing normalized text
will turn "incorrect". This is bad.

(This is not really relevant to charlint, but to the underlying issues;
IETF may have to take a stand on normalization Real Soon Now....)

                          Harald

(Favourite decomposition: The Arabic "character" that decomposes into
"In the name of Allah, the compassionate and merciful" - 21
characters, including 3 spaces....)

--
Harald Tveit Alvestrand, Maxware, Norway
Harald.Alvestrand@maxware.no