RE: Thoughts about characters transmission
> Masataka Ohta writes:
>
> > For JIS, for example, Hiragana, Katakana and some frequently used
> > punctuation marks, at least, and optionally some frequently used
> > Japanese Han characters (about 1000, at most), should be encoded
> > with two octets.
>
> Is there an easy criterion to distinguish about 1000 characters
> (preferably based on their code point), or do you have to use usage
> statistics?
The Ministry of Education in Japan has compiled a list of the Han
characters to be taught in each grade of elementary school.
grade   # of characters   cumulative percentage of use
  1           80                     21
  2          160                     43
  3          200                     61
  4          200                     73
  5          185                     84
  6          181                     89
The cumulative percentage is my own measurement on newspaper
articles.
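As a rough check that these grade lists add up to "about 1000"
characters, the per-grade counts can be summed. The sketch below is
only an illustration: the figures are copied from the table above, and
the names GRADES and cumulative_rows are made up for this example.

```python
# Per-grade figures copied from the table in this message:
# (grade, characters introduced in that grade, cumulative % of use).
# The percentages are the author's own newspaper measurement.
GRADES = [
    (1, 80, 21),
    (2, 160, 43),
    (3, 200, 61),
    (4, 200, 73),
    (5, 185, 84),
    (6, 181, 89),
]


def cumulative_rows(grades):
    """Return (grade, running total of characters, cumulative %) rows."""
    total = 0
    rows = []
    for grade, count, pct in grades:
        total += count
        rows.append((grade, total, pct))
    return rows


if __name__ == "__main__":
    for grade, total, pct in cumulative_rows(GRADES):
        print(f"through grade {grade}: {total:4d} characters, {pct}% of use")
```

The running total through grade 6 is 1006 characters, so membership in
the grade lists is one possible criterion that does not need separate
usage statistics.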
I think other Han-using countries should also have such lists.
Masataka Ohta