RE: Thoughts about characters transmission



> Masataka Ohta writes:
> 
> > For JIS, for example, Hiragana, Katakana and some frequently used
> > punctuation, at least, and optionally some frequently used Japanese
> > Han characters (about 1000, at most), should be encoded with two octets.
> 
> Is there an easy criterion to distinguish about 1000 characters
> (preferably based on their code point), or do you have to use usage
> statistics?

There is a list of the Han characters to be taught in each grade of
elementary school in Japan, compiled by the Ministry of Education.

	grade   # of characters   cumulative percentage of use
	1        80               21
	2       160               43
	3       200               61
	4       200               73
	5       185               84
	6       181               89

The cumulative percentages are my own measurements on newspaper
articles.
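
As a rough sketch of how the table could be used to pick a cutoff by
grade, the figures can be tallied as follows. The names GRADES,
cumulative_counts and smallest_grade_covering are only illustrative, and
the data are just the numbers from the table; nothing here is taken from
any standard.

```python
from itertools import accumulate

# (grade, characters introduced in that grade, cumulative % of newspaper use)
# Figures copied from the table in this message.
GRADES = [
    (1, 80, 21),
    (2, 160, 43),
    (3, 200, 61),
    (4, 200, 73),
    (5, 185, 84),
    (6, 181, 89),
]


def cumulative_counts(grades):
    """Return the running total of characters through each grade."""
    return list(accumulate(count for _, count, _ in grades))


def smallest_grade_covering(grades, target_percent):
    """Return (grade, total characters) for the first grade whose
    cumulative use reaches target_percent, or None if no grade does."""
    totals = cumulative_counts(grades)
    for (grade, _, percent), total in zip(grades, totals):
        if percent >= target_percent:
            return grade, total
    return None


if __name__ == "__main__":
    for (grade, _, percent), total in zip(GRADES, cumulative_counts(GRADES)):
        print(f"through grade {grade}: {total:4d} characters, {percent}% of use")
```

With these figures the running totals come to 80, 240, 440, 640, 825
and 1006 characters, so the full six-grade list is the "about 1000" set,
and the first five grades (825 characters) already reach 84%.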

I think other Han-using countries should also have such lists.

						Masataka Ohta