[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: ISO 8859 -8:1999
Hello Martin,
Questionning the applicability of something that was argued only for Unicode
certainly makes sense, but I'm afraid I don't quite follow your reasoning.
In my mind, the argument never hinged on conversion (or not) to Unicode, it
was simply a recognition that having a new label for compatible upgrades of
charsets actually hurt more that it helped, because a non-updated receiver
that gets the new label has to fail completely, whereas if the old label is
kept, the same non-updated receiver only has to fail if it actually gets
some of the new characters it doesn't know about.
This seems to apply whether the charset in question is UTF-8 of ISO-8859-8,
unless one assumes that in the case of UTF-8, an application can accept
characters that it doesn't know about. Is that what you were implying?
--
Francois
> -----Message d'origine-----
> De?: Martin Duerst [mailto:duerst@w3.org]
> Envoye?: 5 decembre, 2001 00:27
> A?: Francois Yergeau; ietf-charsets@iana.org; Keld J ?n
> Simonsen; Jonathan Rosenne
> Objet?: RE: ISO 8859 -8:1999
>
>
> Hello Francois,
>
> I'm not exactly sure your argument with UTF-8 works.
>
> The problem becomes obvious if we assume that the UCS
> is the reference. In this case, additions to the UCS
> can always be made to work as good as the current
> implementation (because we don't expect conversion
> to legacy encodings to work, except for something
> like NCRs in HTML/XML). However, additions to legacy
> encodings will break interoperability if new data
> is sent to old implementations with the old label,
> in particular if the implementation converts to
> UCS internally (which more and more implementations
> do nowadays).
>
> Regards, Martin.
>
> At 14:15 01/12/02 -0500, Francois Yergeau wrote:
> >Keld J?n Simonsen wrote:
> > > Jonathan Rosenne wrote:
> > >> Justification: ISO_8859-8:1999 is a superset of
> ISO_8859-8:1988. Valid
> > >> ISO_8859-8:1988 data will still be valid under
> ISO_8859-8:1999. The new
> > >> characters were reserved in ISO_8859-8:1988. Registering
> ISO_8859-8:1999
> > >> as a separate character set would cause too much
> unnecessary confusion.
> > >
> > >As the two charsets are not exactly equivalent, they should not be
> > >aliases.
> >
> >Not so fast, please. This question was debated at length,
> on this very
> >list, when RFC 2279 was under scrutiny. The result was that
> the label
> >"UTF-8" was determined to be appropriate for all version of
> Unicode after
> >1.1, provided no incompatible change occurs, which is the
> same as saying
> >that subsequent versions have a superset relationship to
> previous ones. The
> >arguments are in the RFC itself, section 5:
> >
> > It is noteworthy that the label "UTF-8" does not contain
> a version
> > identification, referring generically to ISO/IEC 10646. This is
> > intentional, the rationale being as follows:
> >
> > A MIME charset label is designed to give just the
> information needed
> > to interpret a sequence of bytes received on the wire
> into a sequence
> > of characters, nothing more (see RFC 2045, section 2.2,
> in [MIME]).
> > As long as a character set standard does not change incompatibly,
> > version numbers serve no purpose, because one gains nothing by
> > learning from the tag that newly assigned characters may
> be received
> > that one doesn't know about. The tag itself doesn't
> teach anything
> > about the new characters, which are going to be received anyway.
> >
> > Hence, as long as the standards evolve compatibly, the apparent
> > advantage of having labels that identify the versions is
> only that,
> > apparent. But there is a disadvantage to such version-dependent
> > labels: when an older application receives data accompanied by a
> > newer, unknown label, it may fail to recognize the label and be
> > completely unable to deal with the data, whereas a generic, known
> > label would have triggered mostly correct processing of the data,
> > which may well not contain any new characters.
> >
> > ["Korean mess" paragraph elided]
> >
> > In practice, then, a version-independent label is
> warranted, provided
> > the label is understood to refer to all versions after
> Amendment 5,
> > and provided no incompatible change actually occurs. Should
> > incompatible changes occur in a later version of ISO/IEC
> 10646, the
> > MIME charset label defined here will stay aligned with
> the previous
> > version until and unless the IETF specifically decides otherwise.
> >
> >
> >I think the same argument can apply to any other charset that evolves
> >compatibly, i.e. later versions are strict supersets of
> earlier ones and
> >nothing else creates incompatibility. Jonathan tells us
> that this is the
> >case with ISO_8859-8:1988 and :1999.
> >
> >This does not prevent the IANA registry from containing superset
> >relationship information.
> >
> >Regards,
> >
> >--
> >Fran輟is Yergeau
>