
Re: Registration of new charset [ISO-2022-JP-2004]



Hello Koichi,

At 13:44 06/08/16, Koichi Yasuoka wrote:
>Dear Sirs,
>
>One month has passed after the proposal shown below, and I've
>heard no objection against the charset.  Then, regarding RFC 2978,
>how do I contact the "charset reviewer"?

The former charset reviewer has resigned.
The IETF Applications Area Directors are currently working on
finding and appointing a new reviewer.

When I saw your proposal, I had a few comments, but unfortunately,
I didn't find the time to put them together. Please find them
below.


>Best Regards,
>Koichi Yasuoka
>
>------ Registration proposal on 17 Jul 2006
>
>Charset name:
>
>ISO-2022-JP-2004
>
>Charset aliases:
>
>ISO-2022-JP-2003
>ISO-2022-JP-3-2003
>
>Suitability for use in MIME text:
>
>Suitable for 7-bit use in MIME body-part as text/plain or text/html.
>B-encoding is recommended for use in MIME header-part, because
>ISO-2022-JP-2004 is a partial extension of ISO-2022-JP.

The fact that the new encoding can be seen as an extension of
iso-2022-jp doesn't by itself explain to readers why the
B-encoding is recommended.

RFC 1468 also just says:

   ISO-2022-JP may also be used in MIME Part 2 headers.  The "B"
   encoding should be used with ISO-2022-JP text.

and thus doesn't motivate or explain anything. Is the preference
for "B" due to tradition? Or because on average, it leads to
shorter encodings? Or because even if "Q" may be shorter in
some (many?) cases, the literally displayed US-ASCII codepoints
will just confuse somebody who looks at it? Or are there
implementations that only understand "B"?
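
To make the length and readability questions concrete, here is a small
Python sketch that builds both encoded-word forms for the same text. It
is only an illustration: the sample string is my own choice, it uses plain
ISO-2022-JP (RFC 1468) rather than the new charset, and the two helpers
are the ones shipped in Python's standard email package, not anything
prescribed by the RFCs.

```python
from email import base64mime, quoprimime


def encoded_words(text, charset="iso-2022-jp"):
    """Return (B-form, Q-form) encoded words for the same text."""
    raw = text.encode(charset)  # includes the ESC sequences
    b_word = base64mime.header_encode(raw, charset)
    q_word = quoprimime.header_encode(raw, charset)
    return b_word, q_word


# Sample text chosen for illustration only.
b_word, q_word = encoded_words("日本語")
print(len(b_word), b_word)
print(len(q_word), q_word)
```

For this sample, the "Q" form is the longer of the two, and some of the
7-bit bytes that make up the kanji show up in it as literal letters and
digits, which is the kind of confusing display mentioned above. Other
texts may behave differently.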

Also, I think that any mention of "extension of ISO-2022-JP"
without explanations is a bit problematic, because it might
give the impression that implementations accepting iso-2022-jp
will also somehow work for this new encoding. In my understanding,
because new escape sequences are used, it is extremely difficult
to predict what might happen in such a case.
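
As one data point only, the following Python sketch feeds
ISO-2022-JP-2004 bytes to a decoder that only knows iso-2022-jp. The
assumptions are mine: the sample character (U+2460, which I take to be in
JIS X 0213 plane 1 but not in JIS X 0208), the codec names
`iso2022_jp_2004` and `iso2022_jp` as provided by CPython, and the helper
`try_decode`. Other implementations may well behave differently, which is
exactly the point.

```python
def try_decode(data, codec):
    """Decode data with codec; return None if the decoder rejects it."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


# Sample character chosen for illustration (assumed to need JIS X 0213).
text = "\u2460"
encoded = text.encode("iso2022_jp_2004")

legacy = try_decode(encoded, "iso2022_jp")       # decoder for the old charset
modern = try_decode(encoded, "iso2022_jp_2004")  # decoder for the new charset

print(encoded)
print("iso2022_jp      ->", repr(legacy))
print("iso2022_jp_2004 ->", repr(modern))
```

Whatever the legacy decoder does with the unfamiliar escape sequence, it
does not give back the original character.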


>Published specification:
>
>JIS X 0213 7-bit and 8-bit double byte coded extended KANJI sets for
>information interchange, Japanese Standards Association (first edition
>2000-01-20, amendment 2004-02-20, corrigendum 2004-04-01).
>
>ISO 10646 equivalency table:
>
>No direct URI to the equivalency table, but the table is included in
>JIS X 0213, which can be found via
> http://www.jisc.go.jp/app/JPS/JPSO0020.html
>with searching the word "X0213".

Oh, this is really nice: A JIS standard is available in .pdf
on the Web (although it can't be printed out). Given that the paper
copy is 11,000 Yen + tax (at least mine was, in 2000), that's a very
nice development.

The main problem I see, not for me personally, but for others, is the
fact that everything on this site as well as the standard itself is
in Japanese. Also, for programmers, even if they read Japanese,
typing in the data from a screen (or buying the standard
and typing the data in from there) looks like a really bad (because
extremely tedious and error-prone) idea.

So I think both a more detailed description and a pointer to
machine-readable data would be highly appreciated by anybody who
wants to implement this. Even unofficial pointers, and even if
only on the mailing list and not as part of the official
registration form, would be better than nothing.
As for the description, I think it could be as easy as just listing
the various escape sequences and their meaning (roughly what's
in appendix 2, section 4 of JIS X 0213, at least in the 2000
version that I have in front of me).
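
To show how little would be needed, here is a rough Python sketch of such
a list, together with a scanner that reports which designations occur in
a byte string. The table reflects my understanding of the relevant ISO-IR
registrations and is an assumption to be checked against the standard, not
a quotation from it; the names `KNOWN_DESIGNATIONS`, `list_designations`
and the sample bytes are made up for this example.

```python
import re

# Escape sequences as I understand them; verify against JIS X 0213.
KNOWN_DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b$B": "JIS X 0208",
    b"\x1b$(O": "JIS X 0213:2000 plane 1",
    b"\x1b$(Q": "JIS X 0213:2004 plane 1",
    b"\x1b$(P": "JIS X 0213 plane 2",
}

# ISO 2022 shape: ESC, optional intermediate bytes (0x20-0x2F),
# then one final byte (0x30-0x7E).
ESCAPE_RE = re.compile(rb"\x1b[\x20-\x2f]*[\x30-\x7e]")


def list_designations(data):
    """Return (offset, meaning) for each escape sequence found in data."""
    return [
        (m.start(), KNOWN_DESIGNATIONS.get(m.group(), "unknown"))
        for m in ESCAPE_RE.finditer(data)
    ]


# Hand-built sample: two bytes in JIS X 0208, "abc" in ASCII, then two
# bytes in JIS X 0213:2004 plane 1, and a final return to ASCII.
sample = b"\x1b$BF|\x1b(Babc\x1b$(Q-!\x1b(B"
for offset, meaning in list_designations(sample):
    print(offset, meaning)
```

A table like this, plus the machine-readable mapping data, would already
let an implementer see which escape sequences a decoder has to handle.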


>Additional information:
>
>"ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" were in the first print of
>JIS X 0213:2004 dated February 20, 2004, and they were both corrected
>to "ISO-2022-JP-2004" in the corrigendum dated April 1, 2004.  To avoid
>complications "ISO-2022-JP-2003" and "ISO-2022-JP-3-2003" may be aliases
>of "ISO-2022-JP-2004", but "ISO-2022-JP-2004" is preferred.

The 2000 version of JIS X 0213 also contains ISO-2022-JP-3.
What's the reason for leaving that out of the registration?
Is the reason that there were changes in the Unicode mappings
of JIS X 0213?

[for outsiders: The 2000 version contained some
characters that were not yet in Unicode/ISO 10646, and listed
the 'desired' Unicode/ISO 10646 codepoints, but the actually
allocated codepoints in Unicode/ISO 10646 were different.]

Are there any changes in Unicode/ISO 10646 mappings between
2003 and 2004? If yes, what are they? If not, what are the changes
that motivated yet another alias? (The two original aliases are
already, in principle, one more than necessary.)


>Person & email address to contact for further information:
>
>Koichi Yasuoka
>yasuoka@kanji.zinbun.kyoto-u.ac.jp

Thanks for taking on the job of registering this encoding!


>Intended usage:
>
>COMMON

How common is this already, or is it going to be?
What I have heard is that most implementers are using, or
plan to use, UTF-8 or UTF-16 for implementing the repertoire
of JIS X 0213.


Another question: at least the 2000 version I have in front of
me also defines Shift_JISX0213 and EUC-JISX0213. Are there plans
to register these, too? (I seem to remember that there was a
request in that direction, but that failed because the requester
wasn't able to agree with the rest of the list on the meaning
of the "Suitability for use in MIME text:" field.)


Regards,     Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp