For the record
Hello everybody,
In the charset policy BOF at the recent IETF meeting in Munich,
the chair, Harald Alvestrand, showed a slide with variants
of Han characters (Kanji) that are unified in Unicode/ISO 10646
but which may be problematic. He also showed this list in his
plenary talk presenting the planned IETF charset policy.
This list has been published on page 885 (explanatory page 7),
bottom, of JIS X 0221-1995, the Japanese translation of ISO
10646 (explanatory material not contained in the original),
and probably elsewhere.
In the BOF, I commented on this. I said that these were indeed
mostly character components that turned up in many characters,
and that a high percentage of them was explicitly unified by
the new version of the base Japanese Kanji standard,
JIS X 0208:1997. I mentioned a figure of something like 90
or 95%, which turns out to be too high if one counts cases,
but probably correct if one counts the characters affected
(see below).
To this, Masataka Ohta strongly protested, saying something
to the effect that he had been on the committee developing
that standard. I have now had time to look at JIS X 0208:1997
again. On page 399 (explanatory page 25), it lists the members
of the two committees involved. On the following page, it gives
additional acknowledgements. Whatever that may mean, I have
not been able to find the name Masataka Ohta on these pages.
[my name turns up at the end of the text on page 400, as one
of the contributors to the public review done by the committee,
in the form Duerst, Martin J.]
In case I have missed Masataka Ohta's name somewhere
in JIS X 0208:1997, I would like him to give us the exact page,
and if necessary line number, to verify. In case he has
indeed participated, but has for some reason been forgotten,
I ask the chair of both committees listed on page 399, Prof.
Shibano, to tell us how Masataka Ohta has been involved.
Now for the list that Harald has shown. This list has 8 lines,
each with four groups (cases) that contain two or three variants.
For each case, I give the item number in Section 6.6.3.2 of JIS
X 0208:1997 (p. 12,...), which gives examples of unification,
and comments if necessary.
Note that JIS 208 also contains and lists exceptions, but
that these are carried over to Unicode/ISO 10646 as being
separated by the source separation rule.
Line 1
case 1 (3 variants)  128  (2 variants, third is handwriting and not
                          covered by JIS 208)
case 2 (3 variants)  161  (2 variants, third is the single-character
                          shape which is not listed in JIS 208
                          section 6.6.3.2)
case 3 (3 variants)  153  (JIS 208 lists one more variant)
case 4 (3 variants)  155  (2 variants, middle is the single-character
                          shape which is not listed in JIS 208
                          section 6.6.3.2)
Line 2
case 1 (2 variants)  141
case 2 (2 variants)  147
case 3 (2 variants)  150
case 4 (2 variants)   70  (JIS generalizes to the lower part)
Line 3
case 1 (2 variants)  146
case 2 (2 variants)   98
case 3 (2 variants)   94
case 4 (2 variants)  144  (JIS limits this to the case where this
                          part appears on the right)
Line 4
case 1 (3 variants)    -  (similar cases listed in 6.6.4)
case 2 (2 variants)  167  (JIS generalizes to the upper part)
case 3 (2 variants)  136  (JIS generalizes to the lower part)
case 4 (2 variants)  125
Line 5
case 1 (2 variants)  124  (JIS generalizes to the lower part)
case 2 (2 variants)   97
case 3 (3 variants)   96
case 4 (2 variants)    -
Line 6
case 1 (2 variants)    -
case 2 (2 variants)    -
case 3 (2 variants)    -
case 4 (3 variants)   48  (two right variants only in JIS)
Line 7
case 1 (3 variants)    -  (not a general case in JIS, but several
                          cases where this is unified listed in 6.6.4)
case 2 (2 variants)    -
case 3 (2 variants)    -
case 4 (3 variants)  101
Line 8
case 1 (2 variants)  113
case 2 (2 variants)    -
case 3 (2 variants)   80
case 4 (2 variants)   82
With all the comments, it's difficult to exactly say what percentage
this would amount to. But counting each case as one item, it's around
66%. If one counts characters affected, and not cases as such, however,
the percentage is much higher, because the cases with the most characters
(line 1: case 1, 2, 4; line 8: case 4) all are included in JIS 208.
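
For anyone who wants to redo the count, here is a minimal sketch in
Python that tallies the table above. It makes one assumption that is
not in the table itself: every case with an item number counts as fully
covered, regardless of the comments. Under that assumption the raw
count is 23 of 32 cases (about 72%). The estimate of around 66% is
lower, presumably because some of the partially matching cases are
discounted. The names ITEMS and coverage are illustrative only.

```python
# Item numbers from Section 6.6.3.2 of JIS X 0208:1997, per line of the
# slide and per case; None marks a case with no item number ("-").
# Assumption: a case with an item number counts as fully covered, even
# where the comments in the table describe only a partial match.
ITEMS = {
    1: [128, 161, 153, 155],
    2: [141, 147, 150, 70],
    3: [146, 98, 94, 144],
    4: [None, 167, 136, 125],
    5: [124, 97, 96, None],
    6: [None, None, None, 48],
    7: [None, None, None, 101],
    8: [113, None, 80, 82],
}


def coverage(items):
    """Return (covered, total) over all cases in the table."""
    cases = [item for line in items.values() for item in line]
    covered = sum(1 for item in cases if item is not None)
    return covered, len(cases)


covered, total = coverage(ITEMS)
print(f"{covered} of {total} cases have an item number "
      f"({100 * covered / total:.1f}%)")
```

Counting characters affected rather than cases would need a weight for
each case, which the table does not provide.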
With kind regards, Martin.
----
Dr.sc. Martin J. Du"rst ' , . p y f g c R l / =
Institut fu"r Informatik a o e U i D h T n S -
der Universita"t Zu"rich ; q j k x b m w v z
Winterthurerstrasse 190 (the Dvorak keyboard)
CH-8057 Zu"rich-Irchel NEW TEL: +41 1 63 543 16
S w i t z e r l a n d NEW FAX: +41 1 63 568 09 Email: mduerst@ifi.unizh.ch
----