
Re: Character Encoding Denotation



IETF-CharSets People,

I am Cc'ing you since you may be able to help us (the HTML WG).
I would like to start a discussion about Shift-JIS and EUC-JP.


Thanks in advance for any assistance and/or comments,

Erik van der Poel



Larry Masinter writes:
> In order to register text/html, we also need someone to fill out the
> 'media type registration form'. (Getting the HTML spec RFC out wasn't
> enough, I don't think.)

I think you're right.


> text/* media types are allowed to constrain the allowable charset
> values they admit to a smaller set than 'anything that is registered',
> as from RFC1521:
> 
> >    The specification for any future subtypes of "text" must specify
> >    whether or not they will also utilize a "charset" parameter, and may
> >    possibly restrict its values as well.
> 
> and later..
> 
> >  An initial list of predefined character set names can be found at the
> >  end of this section.  Additional character sets may be registered
> >  with IANA, although the standardization of their use requires the
> >  usual IESG [RFC-1340] review and approval.
> 
> I suggest that the I18N draft might actually enumerate which spellings
> of which of
> 
> 	ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
> 
> are allowed for text/html.

"Allowed" is too strong.  I think the spec should simply recommend the
use of UTF-8 and UCS-2, and say that others are not discouraged but that
it is in the interests of interoperability to avoid proliferation of
charsets.  Then you could enumerate the recommended names of some of
the popular charsets (e.g. us-ascii, iso-8859-1, iso-2022-jp), and
recommend that these names be used for sending, but that a receiver
should be prepared to accept any of the aliases for each.
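
To make the send/receive asymmetry concrete, here is a minimal sketch in
Python. It assumes a small hand-written alias table; the function names
(normalize_charset, sending_label) and the table itself are hypothetical,
and the aliases shown are only an illustrative excerpt. The authoritative
list is the IANA character-sets registry cited above.

```python
# Illustrative excerpt only: preferred name -> some registered aliases.
# The real source of truth is the IANA character-sets registry.
PREFERRED = {
    "us-ascii": ["ascii", "ansi_x3.4-1968", "iso-ir-6", "us"],
    "iso-8859-1": ["iso_8859-1", "iso_8859-1:1987", "latin1", "l1", "iso-ir-100"],
    "iso-2022-jp": ["csiso2022jp"],
}

# Reverse index: every known spelling (lowercased) -> preferred name.
ALIAS_TO_PREFERRED = {}
for preferred, aliases in PREFERRED.items():
    ALIAS_TO_PREFERRED[preferred] = preferred
    for alias in aliases:
        ALIAS_TO_PREFERRED[alias] = preferred


def normalize_charset(label):
    """Receiver side: map any known alias to its preferred name.

    Charset names are compared case-insensitively. Returns None when
    the label is not in the table.
    """
    return ALIAS_TO_PREFERRED.get(label.strip().lower())


def sending_label(label):
    """Sender side: emit the preferred name when one is known.

    Unknown labels pass through unchanged, since other charsets are
    discouraged from proliferating but not forbidden.
    """
    preferred = normalize_charset(label)
    return preferred if preferred is not None else label
```

A receiver would call normalize_charset() on whatever arrives in the
charset parameter; a sender would pass its internal name through
sending_label() before writing the header.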


> The goal isn't so much to eliminate
> anyone's favorite character sets, but to reduce the variations on
> spellings of the names. In particular, the list should include
> whatever charsets are currently being used.
> 
> I don't know if new aliases are allowed by IANA for old charset names.
> Does anyone want to give a try for shift-jis and euc-jp? I agree that
> the registered names are ugly, but the registration draft doesn't say
> anything about defining new aliases.

I tried to register these aliases with IANA a while ago, but the request
was ignored, possibly because I didn't discuss them on the

  ietf-charsets@innosoft.com

list first.  I'm Cc'ing this list.  See:

  ftp://ds.internic.net/internet-drafts/draft-ietf-822ext-mime-reg-01.txt

So, IETF-CharSets folks, what do you think?  Wouldn't it be good to
have a short alias for EUC-JP, namely "EUC-JP"?

Currently, the official name appears to be:

  Extended_UNIX_Code_Packed_Format_for_Japanese

This is rather long, so I think we should register a shorter alias.
The official name for Shift-JIS is already the short "Shift_JIS", so a
similarly short name for EUC-JP seems reasonable.


Erik