
Re: IANA Character Set Registration Submittal



There are two items.
1. It is not clear from the text of the RFC that a registration is limited
to round-trip mappings only.
2. MS does round-trip cases like 0x81 <-> U+0081, even though this is
undocumented. Registering a charset that doesn't actually match what is done
in practice will simply lead to problems.
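
To make the round-trip vs. fallback distinction concrete, here is a minimal
sketch. The two tables are tiny hand-picked samples, not taken from any
published specification: 0x80 <-> U+20AC is the documented windows-1252 euro
mapping, 0x81 <-> U+0081 is the undocumented round-trip case from item 2, and
the U+FF21 -> 0x41 entry is only an assumed example of a one-way mapping. The
names TO_UNICODE, FROM_UNICODE and classify are hypothetical.

```python
# Hypothetical sample tables for a single-byte charset (illustration only).
# TO_UNICODE: byte -> code point; FROM_UNICODE: code point -> byte.
TO_UNICODE = {
    0x41: 0x0041,  # 'A'
    0x80: 0x20AC,  # euro sign, as documented for windows-1252
    0x81: 0x0081,  # undocumented round-trip case mentioned above
}
FROM_UNICODE = {
    0x0041: 0x41,
    0x20AC: 0x80,
    0x0081: 0x81,
    0xFF21: 0x41,  # assumed one-way (fallback) entry: fullwidth 'A' -> 'A'
}


def classify(code_point):
    """Classify a code point as round-trip, fallback, or unassigned."""
    byte = FROM_UNICODE.get(code_point)
    if byte is None:
        return "unassigned"
    # Round-trip: converting the byte back yields the same code point.
    if TO_UNICODE.get(byte) == code_point:
        return "round-trip"
    # Otherwise the conversion only works in one direction.
    return "fallback"


for cp in (0x0081, 0xFF21, 0x4E00):
    print(f"U+{cp:04X}: {classify(cp)}")
```

A registration that lists only the first table cannot tell a reader whether
entries like the last one in FROM_UNICODE exist at all.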

Mark

----- Original Message ----- 
From: "Mike Ksar" <mikeksar@microsoft.com>
To: "Markus Scherer" <markus.icu@gmail.com>; <ietf-charsets@iana.org>
Cc: "Mike Ksar" <mikeksar@microsoft.com>
Sent: Friday, March 25, 2005 17:01
Subject: RE: IANA Character Set Registration Submittal


> Markus,
>
> What you indicate is missing is not required by IANA in RFC 2978.  The
references that I included refer to how MS defines these character sets and
they are implemented in our products based on those specifications.  In
particular:
>
> 1.  Fallback mappings are not part of the registration procedures in RFC
2978.
> 2.  Substitution characters are not part of the registration procedures
either.
> 3.  The same applies to MBCS charsets.
>
> Your concerns are implementation-specific rather than registration
requirements.  Nine of the submittals update references for character sets
already registered at IANA.
>
> Mike Ksar
>
> ________________________________
>
> From: Markus Scherer [mailto:markus.icu@gmail.com]
> Sent: Fri 3/25/2005 2:57 PM
> To: ietf-charsets@iana.org
> Subject: Re: IANA Character Set Registration Submittal
>
>
>
> I would like to point out that the published specifications that are
> referenced here are incomplete. In particular:
> 1. The specifications only show round-trip mappings, which means they
>    are each omitting hundreds of fallback (one-way) mappings that
>    Windows actually performs.
> 2. The substitution characters are not specified.
> 3. For MBCS charsets, the specification is incomplete as to which byte
>    sequences are valid vs. illegal vs. unassigned.
>    While lead bytes are specified, trail byte ranges are not,
>    and illegal vs. unassigned (e.g., windows-932 0x80) are not specified.
>
> See also
> http://www.unicode.org/reports/tr22/
> http://icu.sourceforge.net/charts/charset/
> The latter link points to data files showing actual Windows conversion
> API behavior.
> (Due to a recent web site move, some URLs may not work correctly; in
> this case, try
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/charset/data/
> for the data files.)
>
> Best regards,
> markus
>
> On Tue, 15 Mar 2005 13:43:19 -0800, Mike Ksar <mikeksar@microsoft.com>
wrote:
> >
> > Attached are 5 new charset registration applications and 9 previously
> > registered charsets which needed updating.
>
> --
> Opinions expressed here may not reflect my company's positions unless
> otherwise noted.
>
>
>