
Re: Big5 / CP950



On 2011/09/22 3:05, Shawn Steele wrote:
> I saw the “one of…”, but they aren’t defined in the RFC?  Your spirit of Limited Use sounds about right for big5 though.

I agree.

For some more comments, please see below.

> Thanks,
> Shawn
>
> From: Ira McDonald [mailto:blueroofmusic@gmail.com]
> Sent: Wednesday, September 21, 2011 9:41 AM
> To: Shawn Steele; Ira McDonald
> Cc: "Martin J. Dürst"; ietf-charsets@mail.apps.ietf.org; Makoto Murata (eb2m-mrt@asahi-net.or.jp)
> Subject: Re: Big5 / CP950
>
> Hi Shawn,
>
> RFC 2978 section 5 'Charset Registration Template'
>
>       "Intended usage:
>
>       (One of COMMON, LIMITED USE or OBSOLETE)"
>
> The spirit of LIMITED USE has been to discourage the use
> of legacy charsets that are particularly problematic - Big5.
>
> Not sure if OBSOLETE has ever been used.

I haven't checked, but I guess these values were not part of the charset
registry when it was created, and were introduced with a later update.

I assume the distinction between COMMON and LIMITED USE was originally
intended as some kind of advice to implementers: if it's COMMON, make
sure it's supported; if it's LIMITED USE, you may not need it. But
I don't think that has ever really worked.


> Martin - searching for this made me realize that the
> plaintext IANA Charset Registry at
>
>    ftp://ftp.iana.org/assignments/character-sets
>
> contains 257 entries - they don't include the Intended
> Usage field.
>
> I suggest we work w/ IANA to change the plaintext
> registry.

Assuming somebody has lots of spare time, that would indeed be a good 
idea. Assuming that everybody's time is rather limited, it may have to 
wait. There are quite a few other things in the registry that might
benefit from cleaning up, but critical mass may not have been reached yet.

> In most cases this data is long lost (if ever submitted)
> because the directory
>
>    ftp://ftp.iana.org/assignments/charset-reg
>
> contains only 55 entries.

Lots of stuff was taken from http://tools.ietf.org/html/rfc1345 (and 
some other places). There's no need to keep that kind of information in 
separate templates.

Regards,    Martin.

> Cheers,
> - Ira
>
> Ira McDonald (Musician / Software Architect)
> Chair - Linux Foundation Open Printing WG
> Co-Chair - IEEE-ISTO PWG IPP WG
> Chair - TCG Embedded Systems Hardcopy SWG
> IETF Designated Expert - IPP & Printer MIB
> Blue Roof Music/High North Inc
> http://sites.google.com/site/blueroofmusic
> http://sites.google.com/site/highnorthinc
> mailto:blueroofmusic@gmail.com
> Christmas through April:
>    579 Park Place  Saline, MI  48176
>    734-944-0094
> May to Christmas:
>    PO Box 221  Grand Marais, MI 49839
>    906-494-2434
>
>
> On Wed, Sep 21, 2011 at 11:13 AM, Shawn Steele <Shawn.Steele@microsoft.com> wrote:
> Moved the note, Removed big5+, if anyone knows other examples, I'd include those.
>
>> You have "COMMON" here while your Shift_JIS registration has "LIMITED".
>> Is that by accident, or is there some rationale behind it?
> Um, by accident.  I copied the original shift-jis registration, and used the windows-1252 one as a template for this.  I have no clue what the distinction is :)  Changed to LIMITED USE (reasoning that the variations cause instability between implementations, so I'd much rather have people picking something like UTF-8).  Is there a definition of these terms?  All of them should be OBSOLETE in favor of UTF-* ;-)  I'd use that if I could get away with it.
>
> -Shawn
>
>  
> http://blogs.msdn.com/shawnste
> ________________________________________
> From: "Martin J. Dürst" [duerst@it.aoyama.ac.jp]
> Sent: Wednesday, September 21, 2011 1:00 AM
> To: Shawn Steele
> Cc: 'ietf-charsets@mail.apps.ietf.org'; Makoto Murata (eb2m-mrt@asahi-net.or.jp)
> Subject: Re: Big5 / CP950
>
> Hello Shawn,
>
> On 2011/09/21 2:44, Shawn Steele wrote:
>> Here's some proposed text for a more complete registration.
>
> Many thanks for doing this work. Some comments below, mostly nits.
>
>> Comments welcome.  AFAICT this code page is quite a bit less stable than others, and there are a plethora of mappings.  I've included two ISO10646 equivalency tables for that reason.
>>
>> Thanks,
>> Shawn
>>
>>
> -----------------------------------
>
> Charset name: big5
>
> Charset aliases: (None)
>
> MIBenum: 2026
>
> Suitability for use in MIME text:
>
> Yes, big5 is suitable for use with subtypes of the "text"
> Content-Type. Note that big5 is an 8-bit charset. Care should
> be taken to choose an appropriate Content-Transfer-Encoding.
>
> Two example ISO 10646 equivalency tables:
> http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
> http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
>
> Note that Big5 has many variants, so these exemplars provide two
> common mappings.
>
> Additional information:
>
> Several vendor-specific charsets that derive from Big5 often use
> the Big5 name instead of a more specific vendor charset name.
> Big5-HKSCS is one example; Microsoft Code Page 950 and
> several font-specific variations are others.
>
> Although not authoritative, the following references may also be of
> interest:
>
> Printed mapping table:
> Dr. International "Developing International Software, Second Edition",
> Microsoft Press, ISBN 0-7356-1583-7, 2003, p. 778 and appendixes on CD.
>
> Microsoft Windows extended "best fit" behavior:
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt
>
> Additional information about the many variants of Big5:
> http://en.wikipedia.org/wiki/Big-5
>
> The wide variety of existing variations of Big5 may make it
> unsuitable for many modern applications.  Developers should
> consider whether UTF-8 or UTF-16 would be more appropriate for
> new applications.
>
> This is an update of an existing registration of this charset. This
> charset name is in use.
>
> This charset is also known as Windows Code Page 950 or cp950 for
> short; these are NOT aliases.
>
> Person & email address to contact for further information:
>
> Shawn Steele
> Email: Shawn.Steele&microsoft.com
>
> Microsoft Corporation
> One Microsoft Way
> Redmond, WA 98052
> U.S.A.
>
> Intended usage: LIMITED USE
>
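
To make the point about the many variant mappings a bit more concrete, here is a minimal sketch of how one might list the two-byte sequences on which two Big5-family decoders disagree. It assumes Python's bundled "big5" and "cp950" codecs, which may or may not match the BIG5.TXT and CP950.TXT tables referenced in the registration exactly; the helper names decode_or_none and mapping_differences are made up for this illustration.

```python
# Sketch only: compares Python's built-in "big5" and "cp950" codecs.
# Assumption: these codecs approximate, but are not guaranteed to equal,
# the unicode.org BIG5.TXT and CP950.TXT tables cited in the registration.


def decode_or_none(data: bytes, codec: str):
    """Return the decoded string, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


def mapping_differences(codec_a: str = "big5", codec_b: str = "cp950"):
    """Yield (bytes, result_a, result_b) wherever the two codecs disagree."""
    # Trail bytes conventionally used by Big5: 0x40-0x7E and 0xA1-0xFE.
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    for lead in range(0x81, 0xFF):
        for trail in trail_bytes:
            pair = bytes([lead, trail])
            a = decode_or_none(pair, codec_a)
            b = decode_or_none(pair, codec_b)
            if a != b:
                yield pair, a, b


if __name__ == "__main__":
    diffs = list(mapping_differences())
    print(f"{len(diffs)} two-byte sequences decode differently")
    # Show a small sample: hex bytes, then each codec's result (None = rejected).
    for pair, a, b in diffs[:10]:
        print(pair.hex(), repr(a), repr(b))
```

Any sequence that shows up here is one where data labeled "big5" could be read differently depending on which table the receiver applies, which is the interoperability concern behind LIMITED USE. The same loop could be pointed at "big5hkscs" to look at that variant.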