RE: A spec for showing language in MIME headers
Ohta-san writes:
> As the CJK disambiguation is necessary word-by-word (don't forget that
> Harald proposes to handle multi-lingual document) and in header part,
> and the disambiguation is necessary only for the specific character
> set: ISO10646/UNICODE, language tag is not a good mechanism for the
> disambiguation. It's better to use ISO10646/UNICODE with the
> charset names "iso-10646-<language tag>" for single language only.
I fail to see why

    Content-Type: Text/Plain; charset=iso-10646-chinese

would solve the problem of word-by-word distinction between
Chinese and Japanese in a multi-lingual text any better than

    Content-Type: Text/Plain; charset=iso-10646
    Content-Language: zh (Chinese)
Neither of them does, I think, and the latter approach seems
cleaner to me, as it doesn't confuse language with the coded
character set.
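To make the comparison concrete, here is a minimal sketch using Python's standard email package. It parses both header forms and reports what each one actually carries. The charset labels are copied from the message and are not necessarily registered names, the bodies are placeholders, and strip_comments is a hypothetical helper that only handles simple, non-nested parenthesized comments.

```python
import re
from email import message_from_string
from typing import Optional, Tuple

# The two header forms compared above (bodies are placeholders).
OHTA_FORM = (
    "Content-Type: Text/Plain; charset=iso-10646-chinese\n"
    "\n"
    "...\n"
)
SEPARATE_FORM = (
    "Content-Type: Text/Plain; charset=iso-10646\n"
    "Content-Language: zh (Chinese)\n"
    "\n"
    "...\n"
)

def strip_comments(value: str) -> str:
    """Hypothetical helper: drop non-nested (parenthesized) comments."""
    return re.sub(r"\([^()]*\)", "", value).strip()

def describe(raw: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (content type, charset, language) for a raw message."""
    msg = message_from_string(raw)
    lang = msg.get("Content-Language")
    return (
        msg.get_content_type(),     # lower-cased, e.g. "text/plain"
        msg.get_content_charset(),  # the coded character set only
        strip_comments(lang) if lang else None,  # language, kept apart
    )

print(describe(OHTA_FORM))
print(describe(SEPARATE_FORM))
```

In the first form the language can only be recovered by picking apart the charset name; in the second it arrives in its own header. Either way, the label applies to the whole body, so neither says which individual words are Chinese and which are Japanese.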
We can't expect _plain text_ to support

  a) high-quality rendering of text mixing Chinese and Japanese
     ideographic characters

any more than we expect plain text to preserve the distinctions
between

  b) normal and italicized text
  c) a Black-letter or Fraktur font for German words and a Roman
     font for French words in a bilingual pre-WW2 text
  d) the correct glyph for the character A with diaeresis in a
     Swedish word and the correct glyph for the same character in
     a German word
  e) the correct choice between "ff" and the <ff> ligature in
     different German words.
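Cases a) and d) can be illustrated in a few lines of Python. A Chinese word and a Japanese word that share an ideograph use the same ISO 10646 code point for it, just as a Swedish and a German word share the code point for A with diaeresis. The plain text therefore gives a renderer nothing from which to choose a language-specific glyph. The sample words are illustrative choices and do not come from the message.

```python
import unicodedata

# Illustrative sample words, not from the message.
zh_word = "直接"       # Chinese word beginning with U+76F4
ja_word = "直す"       # Japanese word beginning with U+76F4
sv_word = "\u00c4rlig"  # Swedish "Ärlig", begins with A with diaeresis
de_word = "\u00c4rger"  # German "Ärger", begins with A with diaeresis

for a, b in [(zh_word, ja_word), (sv_word, de_word)]:
    first_a, first_b = a[0], b[0]
    # Identical code points: nothing in the plain text itself tells a
    # renderer which language-specific glyph to use.
    print(f"U+{ord(first_a):04X}", first_a == first_b, unicodedata.name(first_a))
```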
All of these needs, a) through e), can however be easily
satisfied by a suitable _rich text_ representation. This could,
e.g., provide not only <italic> and </italic>, but also tags
such as <fraktur>...</fraktur>, <lang-zh>...</lang-zh>, and
<ligate>ff</ligate>.
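A receiving program could recover word-by-word language information from such markup with very little machinery. The following is a minimal sketch, assuming tags of the form <lang-xx>...</lang-xx> as in the example above. The tag syntax, the function name language_runs, and the error handling are hypothetical and are not taken from any registered rich-text format. Other tags such as <italic> are simply left in the text.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical tag syntax modelled on <lang-zh>...</lang-zh> above.
TAG = re.compile(r"<(/?)lang-([a-z]{2,8})>")

def language_runs(text: str) -> List[Tuple[Optional[str], str]]:
    """Split text into (language, run) pairs; None marks untagged text."""
    runs: List[Tuple[Optional[str], str]] = []
    stack: List[str] = []  # open language tags, innermost last
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            runs.append((stack[-1] if stack else None, text[pos:m.start()]))
        closing, lang = m.group(1), m.group(2)
        if closing:
            if not stack or stack[-1] != lang:
                raise ValueError(f"unbalanced </lang-{lang}>")
            stack.pop()
        else:
            stack.append(lang)
        pos = m.end()
    if stack:
        raise ValueError(f"unclosed <lang-{stack[-1]}>")
    if pos < len(text):
        runs.append((None, text[pos:]))  # all tags closed by now
    return runs

print(language_runs("<lang-ja>直す</lang-ja> and <lang-zh>直接</lang-zh>"))
```

Because the language is attached to runs of text rather than to the whole message, a renderer could pick Japanese glyphs for the first word and Chinese glyphs for the second. Plain text with a single charset or Content-Language label cannot express that.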
Why should properties like these not be encoded at the basic
plain-text level? Because they are not necessary for conveying
the _meaning_ of the text to a human reader (except in extreme
cases).
/Olle