Re: Registration of new charset [ISO-2022-JP-2004]



At 02:31 06/10/02, Erik van der Poel wrote:
>Hello,

Hello Erik,

Many thanks for your questions.

>I have a few questions about this registration:
>
>> At 00:28 06/09/28, Koichi Yasuoka wrote:
>> >=?ISO-2022-JP-2004?Q?=1B=24B0B2=2C9=270l=1B=28B?=
>
>I believe that, in general, many of us recommend being conservative in
>what you send out, liberal in what you accept. Therefore, the
>recommendation is to use the charset label that matches the smallest
>subset of characters actually used in the text, as well as using the
>oldest and/or most commonly accepted name. In this case, you are
>clearly using the ESC $ B (1B 24 42) that is part of iso-2022-jp (rfc
>1468). Therefore, the more conservative option is to use the name
>iso-2022-jp when sending this particular piece of text.

Yes indeed. In this specific case, I think Koichi didn't mean
to suggest that one necessarily should label such data as
iso-2022-jp-2004, but just used his name as an example to
answer my question on why B encoding was preferred to Q encoding.
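
To make Erik's observation concrete, here is a minimal sketch in Python that unpacks the payload of the encoded-word quoted above and decodes it under both labels. It assumes a Python build that ships the standard `iso2022_jp` and `iso2022_jp_2004` codecs; the variable names are only illustrative.

```python
import quopri

# Payload of the encoded-word quoted above (the part between "?Q?" and "?=").
payload = b"=1B=24B0B2=2C9=270l=1B=28B"

# header=True applies the RFC 2047 "Q" rules ("_" stands for a space).
raw = quopri.decodestring(payload, header=True)

# The only escape sequences present are ESC $ B and ESC ( B,
# both of which are already part of iso-2022-jp (RFC 1468).
print(raw.hex(" "))

# Decoding under the older label and under the new one
# should therefore give the same text for this sample.
text = raw.decode("iso2022_jp")
text_2004 = raw.decode("iso2022_jp_2004")
print(text == text_2004)
```

Since both decoders agree on this sample, labelling it as iso-2022-jp loses nothing and is the more conservative choice.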


>I have noticed over the years that if you don't spell out the
>recommendations, implementors will do the wrong thing. In this case,
>would it be a good idea to add such recommendations to the
>registration itself? Or should a new RFC be written, in order to
>provide the recommendations in more detail?

I agree that the registration should give some information
about how this new encoding relates to iso-2022-jp.


>> >I understand that ISO-2022-JP texts with "ESC $ B" and
>> >"ESC ( B" can be accepted by ISO-2022-JP-2004 decoder.
>> >It is problematic when "ESC $ @" or "ESC ( J" is used but
>> >they are very rare now.

On reconsideration, I'm not sure I can agree with this
statement of Koichi's. What I found is that JIS X 0213 contains
a list of characters that are not supposed to be sent with
ESC $ B. This list, as far as I was able to check, includes
all the new additions to the code table, but it also contains
quite a few characters already in JIS X 0208 (the base for
iso-2022-jp). I haven't yet found anything that says that
although these characters are not supposed to be sent, they
nevertheless have to be accepted. Therefore, I think the
above statement is doubtful.

>Which escape sequences are permitted in iso-2022-jp-2004?

- ESC ( B      for ISO/IEC 646 IRV
- ESC $ ( O    for the full plane 1 of JIS X 0213
- ESC $ ( P    for plane 2 of JIS X 0213
- ESC $ B      for a subset of plane 1 of JIS X 0213
               (also a subset of the plane/table from JIS X 0208)
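
As a rough illustration of how a receiver might check incoming data against this list, the following Python sketch reports every escape sequence in a byte string together with the set it designates. The table and the function name `find_designations` are hypothetical, and the sketch assumes that the JIS X 0208 subset is designated with ESC $ B, as in RFC 1468 and in the sample at the top of this thread. Anything not in the table (for example ESC $ @ or ESC ( J) is reported as None.

```python
from typing import List, Optional, Tuple

ESC = b"\x1b"

# Designations from the list above. ESC $ B for the JIS X 0208 subset
# is an assumption taken from RFC 1468, not from the JIS X 0213 text.
DESIGNATIONS = {
    b"\x1b(B": "ISO/IEC 646 IRV",
    b"\x1b$B": "subset of JIS X 0213 plane 1 (JIS X 0208)",
    b"\x1b$(O": "JIS X 0213 plane 1",
    b"\x1b$(P": "JIS X 0213 plane 2",
}


def find_designations(data: bytes) -> List[Tuple[int, Optional[str]]]:
    """Return (offset, label) for each ESC found; label is None if unlisted."""
    found = []
    i = data.find(ESC)
    while i != -1:
        label = None
        for seq, name in DESIGNATIONS.items():
            if data.startswith(seq, i):
                label = name
                break
        found.append((i, label))
        i = data.find(ESC, i + 1)
    return found


# The sample from this thread uses only listed sequences.
for offset, label in find_designations(b"\x1b$B0B2,9'0l\x1b(B"):
    print(offset, label)
```

A sequence reported as None only means that it is not in the list above; whether a decoder should reject or tolerate it is exactly the open question in this thread.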

>There are 3
>problems with the link you sent earlier*: The first page is in
>Japanese, and when you search for X0213, the results are in Japanese
>too. Then X0213 is split into many PDFs, and it is not clear which one
>to download in order to see the escape sequences, nor am I inclined to
>download all of the pieces.

Giving URIs to specific parts of the documents, with explanations,
would certainly be appreciated.

>Finally, that site was down yesterday and
>up today. How often does it go down?
>
>* http://www.jisc.go.jp/app/JPS/JPSO0020.html
>
>> >Now I know "ISO-2022-JP-2004 vs Unicode
>> >mapping table" at http://x0213.org/codetable/iso-2022-jp-2004-std.txt
>
>I wonder whether either or both of these links would be good to have
>in the registration:
>
>http://www.itscj.ipsj.or.jp/ISO-IR/233.pdf
>http://www.itscj.ipsj.or.jp/ISO-IR/

The first one certainly would be good to have.
The second one is too general.

Regards,     Martin. 

>Erik van der Poel
>Editor and co-author of RFC 1468 (iso-2022-jp)


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp