
summary of ucs-bof in the last IETF



Before proceeding to the detailed discussion, I would like
to clarify our goals and current issues.

First of all, here is my summary of the discussion at the ucs-bof at the
last IETF in Amsterdam. Comments and corrections are welcome.

1) Many existing protocols were evaluated for how they can adapt to
extended character sets.

2) The first issue was whether
	we should extend all the protocols so that they can negotiate
	or announce the character set used
or
	we should provide a single universal encoding of text.
No one said the former is better, and the discussion continued only
on how to implement the latter.

3) Assuming ISO 10646, we discussed whether we should use
	16bit byte
or
	UTF style encoding.
After a brief explanation that
	"16bit byte" is incompatible with the current ASCII files
	and ASCII based protocols
and that
	"16bit byte" is an obstacle to 32bitness, even though ISO 10646
	is now being extended beyond 16 bits,
no one advocated "16bit byte" anymore (though some might still silently
think "16bit byte" is the way to go).
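
The ASCII incompatibility mentioned in 3) can be illustrated with a
minimal sketch. It assumes that Python's present-day "utf-16-be" and
"utf-8" codecs are acceptable stand-ins for the "16bit byte" and "UTF
style" encodings discussed at the BOF; neither codec name comes from the
discussion itself.

```python
# Illustration only: modern codecs stand in for the encodings
# discussed at the BOF (an assumption, not the BOF's own definitions).
text = "IETF"

ascii_bytes = text.encode("ascii")    # what existing files and protocols carry
wide = text.encode("utf-16-be")       # fixed two bytes per character
utf = text.encode("utf-8")            # UTF style multibyte encoding

# The UTF style encoding leaves ASCII text byte-for-byte unchanged.
utf_is_ascii_compatible = (utf == ascii_bytes)

# The 16-bit form doubles the length and inserts zero bytes, which
# ASCII based protocols and C style strings do not expect.
wide_is_ascii_compatible = (wide == ascii_bytes)
wide_contains_nul = b"\x00" in wide

print(utf_is_ascii_compatible, wide_is_ascii_compatible, wide_contains_nul)
```

An existing ASCII file is therefore already valid in the UTF style
encoding, while it would have to be converted for the 16-bit form.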

4) If we are to have a single universal text encoding, the encoding should
be good enough in every respect. I presented several requirements for
the encoding.
	Plain Text Processing
		We should focus on the processing of plain text.
	Universality
		The encoding must be able to restore the original content
		of an encoded plain text without any negotiation or
		profiling. This requirement is already stated in 2).
	Causality
		Because of the law of causality, the decoding process
		cannot depend on an event that has not yet happened. Thus,
		for interactive processing, where immediate output is
		required, the shape of a character cannot depend on the
		possibly-not-yet-typed next character.
	Finitestateness
		The decoding process might be controlled by a stateful
		automaton. But, as far as plain text processing is
		concerned, the state transitions should be representable
		by a finite state automaton.
	Finite Resynchronizability
		Even if the state of the finite state automaton becomes
		unknown, resynchronization of the state should be
		possible by reading a fixed, finite number of bytes.
	Equality
		Equality of two texts should be defined unambiguously, of
		course.
	ASCII compatibility
		The encoding should be ASCII compatible so that neither
		conversion of files nor modification of protocols is
		necessary.
At the bof, there was no objection to any of the requirements.
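
The Finite Resynchronizability requirement can be sketched as follows.
The sketch assumes present-day UTF-8 as a stand-in for a UTF2 style
encoding, in which every non-initial byte of a character has the form
10xxxxxx. The function names are hypothetical and chosen only for this
illustration.

```python
def is_continuation(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx, which never start a character."""
    return (byte & 0xC0) == 0x80


def resynchronize(data: bytes, pos: int) -> int:
    """Return the first index >= pos at which a character starts.

    With present-day UTF-8 (at most 4 bytes per character) the loop
    skips at most 3 bytes, so the cost of resynchronizing is bounded.
    """
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


# "a", e-acute, a Han character, "z": 1 + 2 + 3 + 1 bytes (assumed sample).
encoded = "a\u00e9\u6f22z".encode("utf-8")

# Start reading at every possible offset, as if the decoder state were lost.
starts = sorted({resynchronize(encoded, i) for i in range(len(encoded))})

# Landing in the middle of the 3-byte character still recovers the tail.
tail = encoded[resynchronize(encoded, 4):].decode("utf-8")
print(starts, tail)
```

A decoder that joins the byte stream at an arbitrary position loses at
most the character it landed in, and needs no state beyond the current
byte to find the next boundary.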

5) It was agreed that, for the major European characters, ISO 10646
level 1 with UTF2 satisfies all the above requirements, but that ISO 10646
does not satisfy any of the requirements (save ASCII compatibility) if
several other languages are taken into consideration.
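
One way to see how the Equality requirement can fail outside a
restricted level is sketched below. It is only an illustration and not
necessarily the case the BOF had in mind. It assumes that level 1
excludes combining characters, and it uses Python's unicodedata module
and Unicode normalization, both of which postdate this discussion.

```python
import unicodedata

# Two spellings that a reader sees as the same letter.
precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# Compared code point by code point, the two texts differ.
same_code_points = (precomposed == combining)

# They compare equal only after an extra agreed-upon step (here NFC).
same_after_nfc = (unicodedata.normalize("NFC", combining) == precomposed)

print(same_code_points, same_after_nfc)
```

Without combining characters only the first spelling exists, so simple
code comparison is unambiguous; once both spellings are allowed, equality
depends on a convention outside the encoding itself.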

6) I presented a 21 bit encoding, ICODE, and its external representation,
IUTF, as an extension to ISO 10646 and UTF2 which satisfies
all the requirements in 4) and also supports bidirectionality.

7) During the discussion on ICODE, it was pointed out that ICODE
does use non-private code points of UCS4. So, ICODE was slightly
modified to also have an explicit representation as UCS4 (not necessarily
equal to ICODE) which uses the private use zone for the extension,
so that ICODE is now completely compatible with ISO 10646. So, characters in
ICODE now have three representations:
	UCS4
	ICODE
	IUTF
Even if the private area is moved by the ISO in the future (as suggested
by John), it is only necessary to change the mapping from ICODE to UCS4,
which in the end does not affect any ICODE program, because no one will
use the UCS4 representation.

8) Someone (Harald, I think) said that a 32bit universal encoding is the
ultimate goal and showed several paths to that goal.

9) It was agreed that the ultimate goal shouldn't break ISO 10646.

10) It was agreed that the number of code conversions should be minimized.

11) Borka said it is also important to have a method of converting
characters so that many European characters can be displayed using, say,
ASCII characters only. There was no objection.

						Masataka Ohta

PS

Those who did not attend the bof might think that the summary contains
too much of my own presentation and opinion. But that actually took up a
large part of the time of the entire BOF.