
Re: Proposal for Addition of New Alias Names to Existing IANA Registered Character Sets - REPOSTING REQUEST



Hello Uma,

I agree with your point that there is no requirement for software
to support any aliases (except maybe for the 'preferred MIME' ones),
and there is no requirement for software to support any particular
charsets (except e.g. US-ASCII for email, and UTF-8 and UTF-16
for XML).

However, some of the text of your proposal at least at first sight
seems to suggest otherwise. In particular
 >>>>
We are requesting the addition of these aliases, because the use of names
that includes the dash is prevalent, e.g., a number of software tools
support the dash spelling. As a result, we have a body of XML documents
that include encoding declarations for the dash variants, without XML
processor support for these documents.
 >>>>

seems to suggest that XML processors will accept these documents
just by registering the aliases in the list. This is of course
totally wrong. Fixing the producing software, and the already
produced documents, is probably a better way to get these
documents accepted in XML processors. I would very much like
to see a rewritten proposal that made it absolutely clear
that this is not the case.
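To make the point concrete: an installed XML processor's table of accepted encoding labels is fixed when that software is built. A registry change alone therefore does not change what an existing copy accepts. This is a minimal sketch assuming a hypothetical processor. The table `BUILT_IN_LABELS` and the function `accepts` are illustrative names, not any real product's API.

```python
# Hypothetical XML processor: its table of accepted encoding labels is
# fixed at build time and is never refreshed from the IANA registry.
BUILT_IN_LABELS = frozenset({"utf-8", "utf-16", "iso-8859-1", "ibm037"})


def accepts(encoding_label: str) -> bool:
    """Return True if this (hypothetical) installed processor knows the label.

    Labels are compared case-insensitively, as charset names are.
    """
    return encoding_label.lower() in BUILT_IN_LABELS


# Registering "IBM-037" as an alias would not change this installed copy.
# Documents labelled that way stay rejected until the software is upgraded.
print(accepts("IBM037"))   # True
print(accepts("IBM-037"))  # False
```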


Another problem with your text above is the wording "because the use of
names that includes the dash is prevalent, e.g., a number of software
tools support the dash spelling.". Prevalent seems to suggest dominance,
or at least widespread use, whereas 'a number of' seems to
suggest that it might be just a few. Which one is it actually?
Getting some information on this, as well as on similar questions
from Paul, should help to move the discussion forward.


In addition, you write:

 > Here are some examples of the use of the dash aliases:
 > Sun Solaris iconv:
 > http://www.sun.com/developers/gadc/technicalpublications/whitepapers/solunicosuppt.pdf?redirect=false

I didn't find this. I got redirected to
http://developers.sun.com/techtopics/global/. Can you provide
a better reference?

Anyway, this is


 > See page 32
 >
 > IBM AIX iconv:
 > http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/genprogc02.htm
 >
 > See Appendix A, IBM Code Sets

The TOC for this reads:

Appendix A. Character Maps
ISO Code Sets
  ISO8859-1
  ISO8859-2
  ISO8859-5
  ISO8859-6
  ISO8859-7
  ISO8859-8
  ISO8859-9
  ISO8859-15
IBM Code Sets
  IBM-850
  IBM-856
  IBM-921
  IBM-922
  IBM-1046
  IBM-1124
  IBM-1129
  TIS-620

The ISO ones should all start with "iso-8859", with an additional hyphen.
Should we register the above aliases, too? Where should we draw the line,
and how? Or should we just continue to add and add and add?
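The thread later mentions that charset labels must be matched by identity, ignoring case. Under that rule, "ISO8859-1" and "ISO-8859-1" are simply different strings, so each AIX-style spelling would need its own alias. This is a minimal sketch; `same_label` is a hypothetical helper. It uses `lower()` because registered charset names are plain ASCII.

```python
def same_label(a: str, b: str) -> bool:
    """Identity match of two charset labels, ignoring (ASCII) case only."""
    return a.lower() == b.lower()


# Case differences are ignored...
print(same_label("ISO-8859-1", "iso-8859-1"))  # True
# ...but a missing hyphen is not, so each AIX spelling is a distinct label.
print(same_label("ISO8859-1", "ISO-8859-1"))   # False
print(same_label("IBM-850", "IBM850"))         # False
```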


 > IBM DB2 XML:
 > http://www.ibm.com/software/data/db2/os390/pdf2/v7/dxxxa710.pdf
 > See Appendix C, table 64, page 255

This lists the following:

EBCDIC

ibm-037    OS/390 and z/OS    37
ibm-273    OS/390 and z/OS    273
ibm-277    OS/390 and z/OS    277
ibm-278    OS/390 and z/OS    278
ibm-280    OS/390 and z/OS    280
ibm-284    OS/390 and z/OS    284
ibm-297    OS/390 and z/OS    297
ibm-500    OS/390 and z/OS    500
ibm-1047   OS/390 and z/OS    1047
ibm-1140   OS/390 and z/OS    1140

 > OS/390 reference:
 > http://publibz.boulder.ibm.com:80/cgi-bin/bookmgr_OS390/BOOKS/CBCPG120/7.6.4
 >
 > See section 7.6.4 Code Set Converters Supplied, Table 76.



(more below)


At 13:10 04/01/22 -0500, Uma Umamaheswaran wrote:




>I am surprised at the responses posted.  I went back and rechecked all the
>postings on the previous go around on the posting in July-Sept 2002 time
>frame.  I do not accept that the disposition in the previous go around was
>conclusive.
>
>Statements that it is a bad idea, that it is unnecessary and possibly
>harmful, have been made.  There was also opposition to aliases for
>open-use charsets such as 8859-1, though not as strong against the
>limited-use charsets.

Yes, adding additional aliases for US-ASCII and ISO-8859-1
(the last two in your proposal, listed separately) is clearly
a very bad idea. Using anything but the above mime-preferred
tags (even the already registered aliases) is probably better
called a bug than anything else.

So I would suggest that you remove these two new aliases from
the proposal.

Also, can you confirm that all the others in the list are
EBCDIC-based? Or otherwise split the table, and give more
information?


>No - there are no new additions in the list of proposed aliases.

ok, good.


>I had also raised questions about what is the basis for such claims based
>on the charsets registration procedure or registry at that time.
>
>In the recent go around:
>----------------------
>Paul Hoffman responded:
>"It *did* get somewhere: it got heavily discussed. The gist of the
>discussion was that the request was both unnecessary and possibly harmful
>to the established base of software because that base would need to be
>updated with the new aliases."
>
>My response --  we will not be making the request if it was not considered
>necessary by the requesting community of developers (on behalf of the
>users).

Have these developers considered fixing the software that produces
these aliases? Have these developers and users considered that adding
these aliases will not actually in any way help acceptance of the
already existing documents, because software doesn't upgrade itself?


>If I thought the disposition got to its end -- I would not have
>stated that 'it got nowhere'.   "Possibly Harmful to established software
>.. " -  I claim that the above statement is certainly not based on the
>current set of Registration procedures and the purpose of the registry
>itself.  I have quoted the relevant sections from the procedure RFC and the
>registry below, along with how I interpret the statements therein.
>
>Ned Freed had responded:
>"The last time this request was posted there was considerably pushback
>saying that adding these additional aliases was a bad idea. I continue to
>believe this is the case, and I am therefore opposed to making this
>change."
>
>My response:  the last time also I asked for the rationale .. It is based
>on the wrong premise that everyone has to implement all the charset ids and
>their aliases from the registry.  It goes against what I have indicated in
>the following paragraphs as to how I read the purpose of the registration
>and the registry itself.
>
>I was looking in the current published Charset Registration procedure and
>in the Registry itself for 'any potential basis' for statements such as
>those Paul and Ned have made.  The following are some relevant statements
>in the current procedure RFC 2978 and in the published registry, and I
>have attached how I read these statements in the context of the current
>proposal and the comments that have been made against it.
>
>================ From the Character Set Registry .. ===========
>================= http://www.iana.org/assignments/character-sets =========
>
>The very first sentence reads:
>"These are the official names for character sets that may be used in the
>Internet and may be referred to in Internet documentation."
>
>My interpretation of the above:  There is no requirement on any internet or
>other software that they must support every character set registered in
>this registry.  Nor is there a requirement that they must support every
>alias of any charset that is registered herein.   I cannot see how having
>aliases or adding more aliases will cause harm to any existing software -
>certainly not based on what is stated in the registry itself.
>
>====== From:     RFC 2978 -   IANA Charset Registration Procedures
>=========== ftp://ftp.rfc-editor.org/in-notes/rfc2978.txt
>
>(From Abstract section):
>"Note: The charset registration procedure exists solely to associate a
>specific name or names with a given charset and to give an indication of
>whether or not a given charset can be used in MIME text objects.  In
>particular, the general applicability and appropriateness of a given
>registered charset to a particular application is a protocol issue, not a
>registration issue, and is not dealt with by this registration procedure."
>
>My interpretation:  The second sentence in the above note is strong
>evidence for me that the comments claiming the request is harmful etc. are
>not based on this set of procedures.  It is a protocol that may say what to
>use or what not to use, not the registration itself.  Such a consideration
>is NOT a registration issue either.
>
>From section 2.3.  Naming Requirements
>
>"One or more names MUST be assigned to all registered charsets. Multiple
>names for the same charset are permitted, but if multiple names are
>assigned a single primary name for the charset MUST be identified. All
>other names are considered to be aliases for the primary name and use of
>the primary name is preferred over use of any of the aliases."
>
>My interpretation:  Identifying more existing aliases is not against
>anything that is stated here.  The preferred name will always be the
>Assigned Name.  I don't read the above as saying that recording one or more
>aliases in the registry is somehow harmful.  On the other hand, I think
>recording aliases that may be encountered is more informative than not
>recording them.
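RFC 2978 section 2.3 describes each registration as one primary name plus zero or more aliases, with the primary name preferred. The following is a minimal sketch of that shape as a data structure. `CharsetEntry`, `resolve` and the abbreviated sample entry are hypothetical illustrations, not a copy of the registry.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CharsetEntry:
    """One registration: a single primary name plus optional aliases."""
    primary: str
    aliases: tuple[str, ...] = ()

    def names(self) -> tuple[str, ...]:
        return (self.primary, *self.aliases)


def resolve(label: str, registry: list[CharsetEntry]) -> str | None:
    """Map any registered name or alias to the preferred (primary) name.

    Matching ignores case; returns None for an unregistered label.
    """
    wanted = label.lower()
    for entry in registry:
        if any(name.lower() == wanted for name in entry.names()):
            return entry.primary
    return None


# Illustrative, abbreviated entry (hypothetical subset of aliases).
REGISTRY = [CharsetEntry("IBM037", ("cp037", "csIBM037"))]

print(resolve("CP037", REGISTRY))    # IBM037
print(resolve("ibm-037", REGISTRY))  # None: not an alias in this sketch
```

In this model, adding an alias only extends one entry's `aliases`. It never changes which name `resolve` returns, which is one way to read the claim that extra aliases add no functionality.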

Well, to some extent, you are right. But if we really want to be
formal and just follow the registration procedure, we could just
say that it does not provide for additions of new aliases to
existing registrations. And that would be it.

But I think there is a somewhat more fundamental issue here:
Let's assume we just add these aliases. Then somebody comes
and says "Hello, sorry guys, I have software that produces
documents encoded in various versions of ISO 8859. Just
made a little typo when the software got out, it actually
writes iso-859-1, iso-859-2,... Just so that I can claim
that my software doesn't contain a bug, can you please add
these as aliases?".

We clearly don't want to add aliases for every bug out there,
better that bugs would actually be fixed. So what should be
the criteria between reasonable requests and others?
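Instead of registering each spelling, one could try looser matching in the software itself. The sketch below uses a hypothetical rule: lowercase, then keep only ASCII letters and digits. This rule is an illustration, not one defined by RFC 2978 or by XML. It absorbs the dash variants, but a genuine typo such as "iso-859-1" still fails. So no matching rule removes the need to fix such bugs at the source.

```python
import string

_KEEP = set(string.ascii_lowercase + string.digits)


def loose_key(label: str) -> str:
    """Hypothetical loose key: lowercase, keep only ASCII letters and digits."""
    return "".join(ch for ch in label.lower() if ch in _KEEP)


def loose_match(label: str, registered: str) -> bool:
    return loose_key(label) == loose_key(registered)


print(loose_match("IBM-037", "IBM037"))        # True: dash variant absorbed
print(loose_match("ISO8859-1", "ISO-8859-1"))  # True
print(loose_match("iso-859-1", "ISO-8859-1"))  # False: a typo is still a typo
```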

Also, with respect to XML, developers claiming to produce
XML must be aware that they have to be very careful, because
XML is very carefully and exactly defined. Anything they
produce that does not meet the XML grammar is just not XML.
There is no such thing as 'be liberal in what you accept'
with XML. I'm not sure I can find a good reason why there
should be an exception for charsets.


>From section 2.5.  Usage and Implementation Requirements
>
>"Use of a large number of charsets in a given protocol may hamper
>interoperability.  However, the use of a large number of undocumented
>and/or unlabeled charsets hampers interoperability even more."
>
>My interpretation:  The second sentence is a stronger argument for opening
>up the registry than for being restrictive.  The claims of 'harm',
>'unnecessary' etc. are certainly not defensible based on the above
>paragraphs in the registration procedure document.
>
>"A charset should therefore be registered ONLY if it adds significant
>functionality that is valuable to a large community, OR if it documents
>existing practice in a large community.  Note that charsets registered for
>the second reason should be explicitly marked as being of limited or
>specialized use and should only be used in Internet messages with prior
>bilateral agreement."
>
>My interpretation:  The request was to document existing practice in
>products supporting a large community of users of IBM  systems and non-IBM
>systems interfacing with these, using Open Standard protocols /
>specifications that call for use of charsets from the IANA charsets
>registry.

Please note that all the texts above apply to the addition of
charsets, not of aliases.


>From section 2.6.  Publication Requirements
>
>"The registration of a charset does not imply endorsement, approval, or
>recommendation by the IANA, IESG, or IETF, or even certification that the
>specification is adequate. "
>
>My interpretation: The above statement seems to be saying that this
>registry is merely a record of what is out there.  There is neither an
>expectation nor a requirement that any of the charsets or their aliases be
>implemented by every component attached to the Internet.  It is a record
>of where you can get more information about the definition behind a
>charset label when you encounter one.

Yes, but there are fundamental differences between charsets and aliases:
- Although we hope to make progress towards fewer charsets, we know that
   we will have to deal with different charsets for quite some years
   to come. Indeed, this is the purpose of the charset
   registry. On the other hand, we also understand perfectly well that
   for each charset, only one label is enough. Additional aliases don't
   add any functionality at all.
- It takes a certain amount of work to create an additional charset,
   and this is usually only done with some actual purpose, to overcome
   some (at least perceived) deficiency. Therefore, once a charset is
   created, registering it seems acceptable, because not registering
   it does not seem to


I think you forgot to cite this section from RFC 2978:

 >>>>
3.2.  Charset Reviewer

    When the two week period has passed and the registration proposer is
    convinced that consensus has been achieved, the registration
    application should be submitted to IANA and the charset reviewer.
    The charset reviewer, who is appointed by the IETF Applications Area
    Director(s), either approves the request for registration or rejects
    it.  Rejection may occur because of significant objections raised on
    the list or objections raised externally.  If the charset reviewer
    considers the registration sufficiently important and controversial,
    a last call for comments may be issued to the full IETF.  The charset
    reviewer may also recommend standards track processing (before or
    after registration) when that appears appropriate and the level of
    specification of the charset is adequate.

    The charset reviewer must reach a decision and post it to the ietf-
    charsets mailing list within two weeks.  Decisions made by the
    reviewer may be appealed to the IESG.
 >>>>

Are you actually convinced that consensus has been achieved for adding
these aliases, as the above text requires?



>-------------------------------
>
>Just in case some of the rationale for the request is not clear from the
>earlier set of discussions in July/Sept 2002.
>
>IBM has a large set of character encodings registered in its corporate
>registry with numbers being assigned to them.  Most of these are IBM
>defined -- however, non-IBM sets are also given a number within this
>registration system.  When literal strings are needed as charset labels,
>often IBM- is added to the number to get IBM-xxxxx as the literal string
>label.  These are used to identify the charsets associated with data in
>database, in identifying the converters to be invoked etc.  and of  course
>using XML as well.  XML has recommended that charsets are registered with
>IANA registry.

Yes. Otherwise, there is no clue about what
    <?xml version='1.0' encoding='foo' ?>
is supposed to stand for.
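XML 1.0 only constrains the syntax of the label, in production [81]: EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*. Whether the name means anything is left to the registry and to each processor's own table. This is a minimal sketch under stated assumptions. It uses a simplified regular expression for the XML declaration that handles only the version and encoding pseudo-attributes. The `KNOWN` table is hypothetical.

```python
import re
from typing import Optional

# XML 1.0 production [81]: EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENC_NAME = r"[A-Za-z][A-Za-z0-9._\-]*"

# Simplified pattern for "<?xml version=... encoding=..." (an assumption;
# it ignores the standalone pseudo-attribute and other details).
ENCODING_DECL = re.compile(
    r"^<\?xml\s+version\s*=\s*(['\"])1\.\d+\1"
    r"\s+encoding\s*=\s*(['\"])(" + ENC_NAME + r")\2"
)

KNOWN = {"utf-8", "utf-16", "iso-8859-1"}  # hypothetical processor table


def declared_encoding(doc: str) -> Optional[str]:
    """Return the syntactically valid encoding name, or None."""
    m = ENCODING_DECL.match(doc)
    return m.group(3) if m else None


def check(doc: str) -> str:
    name = declared_encoding(doc)
    if name is None:
        return "no valid encoding declaration"
    if name.lower() not in KNOWN:
        return f"syntactically valid, but unknown encoding: {name}"
    return f"ok: {name}"


print(check("<?xml version='1.0' encoding='foo' ?>"))
# syntactically valid, but unknown encoding: foo
```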


>In the set that is in the proposal all of them are Aliases
>for existing charsets with assigned names.  Others such as IBM-1047 have
>been dealt with separately.  The proposal document has some references
>showing where these IBM-xxxxx ARE used.
>
>It is a matter for protocols such as XML and other Internet protocols to
>permit, reject, restrict, or be open about any of the labels that are
>registered in the IANA character set registry.  I cannot see how having
>something in the registry could be HARMFUL to any piece of software or
>Internet component out there.

Adding another single alias is definitely not that harmful, in particular
if it is clearly understood that this does not mean that software suddenly
changes just by adding this alias, or even that software would somehow
be required to change.

But just saying yes to every alias coming in clearly cannot be a
solution either.

Regards,   Martin.



>On the other hand, having the information in the registry is more useful,
>in case anyone chooses to enhance their current software to recognize some
>of that data.  Otherwise it will remain yet another unrecognized label.
>
>Another factor that is driving this request is also the specification that
>'identity matching of the charset labels (ignoring case)' is required.
>
>-----------------
>
>Best regards,  Uma.
>V.S. UMAmaheswaran, Ph.D.
>Globalization Centre of Competency, IBM Toronto Lab
>A2/979, 8200 Warden Avenue, Markham, ON, Canada, L6G1C7; +1 905 413 3474;
>Fax:905 413 4682; TieLine 969; email: umavs@ca.ibm.com