
RE: Best fit




Frank Ellermann wrote:
> > ICU may have chosen 0x1A, but that was their own decision. There is
> > no interoperability problem here
> 
> An u2w.icu( x ) != u2w.bestfit( x ) effect could be ugly.  For some

As I said, the fallbacks do not belong in the registration. It should be
perfectly ok to use other fallbacks. E.g. generating higher level markup,
be it character escapes or more [like <sup>...</sup> for instance, or
<span class="red">...</span>], or some "this-is-even-better-fit".

The fallbacks ("bestfit") of the "bestfit" file should *NOT* be part of
the IANA charset registration!

> code pages like <http://purl.net/net/cp/858> ICU tries hard to list
> an "official" substitution character, in that case 0x7F, not 0x1A.

As I mentioned, the ICU API allows the programmer quite a lot of control
over how to handle conversion errors. One can set it up to automatically
generate XML-ish or Java-ish escapes (which I prefer, even if not targeting
XML or Java), or to use another "error" character (I would *never* choose
'?' for that). One can set up one's own callback function for conversion
errors.
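
To make the point concrete, here is a rough sketch of the same idea. It
uses Python's codec error handlers, NOT the ICU API, simply because they
are short to show; the handler name "markup-fallback" and the helper
convert() are made up for this illustration. The converter and the
Windows-1252 mapping stay the same throughout; only the error handling
differs.

```python
import codecs
import unicodedata


def markup_fallback(err):
    """Hypothetical error callback: <sup> markup where possible, XML escapes otherwise."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    pieces = []
    for ch in err.object[err.start:err.end]:
        decomp = unicodedata.decomposition(ch)
        base = ""
        if decomp.startswith("<super> "):
            # Compatibility decomposition tagged <super>, e.g. U+2074 -> "4".
            base = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
        if base and base.isascii():
            pieces.append("<sup>%s</sup>" % base)
        else:
            # Anything else becomes an XML-ish hexadecimal character reference.
            pieces.append("&#x%X;" % ord(ch))
    # Return the replacement text and the position at which to resume encoding.
    return "".join(pieces), err.end


# Register the callback under an illustrative name.
codecs.register_error("markup-fallback", markup_fallback)


def convert(text, handler):
    """Encode to Windows-1252, delegating unmappable characters to `handler`."""
    return text.encode("cp1252", errors=handler)


sample = "x\u2074 \u2260 y"  # SUPERSCRIPT FOUR and NOT EQUAL TO are not in cp1252

print(convert(sample, "xmlcharrefreplace"))  # built-in XML-ish escapes
print(convert(sample, "backslashreplace"))   # built-in Java-ish escapes
print(convert(sample, "markup-fallback"))    # the custom callback above
```

All three calls go through the same charset; none of them needs a
registration of its own.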

> > Should we strip the best fit mappings from the table and post it
> > somewhere?

There's one already.

> They're fine, but could be improved by adding a hint how they were 
> determined, and who could fix them if needed.

The "bestfit" one should NOT be used for the registration. It could be
seen as making any "better" converters (e.g. generating XML escapes)
"non-conforming" (each requiring a different charset registration;
'Windows-1252-XMLescapes', 'Windows-1252-XMLescapes-boldnredCSS',
'Windows-1252-XMLescapes-boldnredCSS-butSUPforsuperscripts', 
'Windows-1252-johndoesbetterfit', ...). I hope you don't want that.

		/kent k