
Re: Metaquestion on this group



> > So, based on the fact that MIME charset is primarily for text/plain, with
> > which we have collected some experience, shouldn't we have a specific
> > objective of internationalized "encoding of plain text".
> 
> Unfortunately, we don't seem to have consensus in the community that
> multilingual messages -- in MIME terms, single body parts containing
> text in more than one language -- are "plain text".

Your opinion is totally unexpected and has surprised me completely.

Is there anyone else who suggests that multilingual text might not be
"plain"?

We Japanese have been using mixed Japanese-English text daily for
quite a long time.

All the text processing tools of UNIX have been modified, with great
effort, and can now handle mixed Japanese-English text.

We do want to do grep on multilingual text.

JIS X 0208, developed in Japan, is considered to encode plain text,
and it includes the basic Latin, Greek and Cyrillic alphabets. JIS X 0212
adds further alphabets.

EUC, developed in Japan, is considered to be a scheme for handling
limited-multilingual plain text. UNIX tools have been modified for EUC
and can handle bi-, tri- or more-lingual text.
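
As a rough illustration only (a sketch in Python, not one of the
modified UNIX tools; the function name grep_euc and the sample text are
made up for this example), a grep-like search over EUC-encoded text
that mixes Japanese, English, Greek and Cyrillic can be written as:

```python
def grep_euc(data, pattern):
    """Return the decoded lines of EUC-JP encoded data that contain pattern."""
    lines = data.decode("euc_jp").splitlines()
    return [line for line in lines if pattern in line]


# Hypothetical sample: kanji, basic Latin, Greek and Cyrillic letters,
# all of which are covered by JIS X 0208 plus ASCII.
sample = (
    "\u65e5\u672c\u8a9e and English\n"                       # Japanese + English
    "\u0391\u03bb\u03c6\u03b1 and "                           # Greek
    "\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430\n"  # Cyrillic
    "plain ASCII only\n"
)
data = sample.encode("euc_jp")

for line in grep_euc(data, "and"):
    print(line)
```

The search itself does not care how many languages a line contains;
only the decoding step knows about the encoding.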

So, if someone says it is not plain, we are quite confused.

It is understandable that native speakers of English tend to think
that English (more correctly speaking, the basic Latin, Greek and Cyrillic
alphabets) is special, that is, that English is the only language which
may be mixed with other languages while the entire bilingual text
remains "plain". But we have no reason to think so.

> If they *are* "plain text", then one needs switching mechanisms, not
> just "character sets" (in the "code table" sense).

No, a large code table is equivalent to switched small tables.
Unfortunately, a 16-bit space is not sufficiently large.

In any case, we need a switching mechanism to use JIS X 0208 and
JIS X 0212 together to encode some Japanese text in a 7-bit encoding.

So, switching is not essential here. It's merely an encoding issue.
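
To make this concrete, here is a small sketch in Python (the codec
names "iso2022_jp_1" and "euc_jp" are those of Python's standard
library, and the sample text is made up for this example). The same
text is encoded once in a 7-bit form that switches between small
tables with escape sequences, and once in 8-bit EUC; both decode to
the identical plain text.

```python
# Hypothetical sample: ASCII, three JIS X 0208 kanji, and one kanji
# (U+4E02) that is in JIS X 0212 but not in JIS X 0208.
text = "ABC \u65e5\u672c\u8a9e \u4e02 xyz"

# 7-bit encoding: escape sequences designate which table is in use.
seven_bit = text.encode("iso2022_jp_1")

# 8-bit EUC: no escape sequences; JIS X 0212 characters are marked
# with the single-shift byte 0x8F instead.
eight_bit = text.encode("euc_jp")

ESC = b"\x1b"
print(seven_bit)
print(eight_bit)
print("7-bit only:", max(seven_bit) < 0x80)
print("escapes in 7-bit form:", ESC in seven_bit)
print("escapes in EUC form:", ESC in eight_bit)

# Both byte strings carry the same plain text.
same = seven_bit.decode("iso2022_jp_1") == eight_bit.decode("euc_jp") == text
print("same text:", same)
```

Whether the tables are switched by escape sequences or laid out in one
larger code space, the decoded text is the same.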

						Masataka Ohta