
Re: Big5 / CP950



Hi Shawn,

RFC 2978 section 5 'Charset Registration Template'

     "Intended usage:

     (One of COMMON, LIMITED USE or OBSOLETE)"

The spirit of LIMITED USE has been to discourage the use
of legacy charsets that are particularly problematic, such as Big5.

Not sure if OBSOLETE has ever been used.

Martin - searching for this made me realize that the
plaintext IANA Charset Registry at

  ftp://ftp.iana.org/assignments/character-sets

contains 257 entries - they don't include the Intended
Usage field.

I suggest we work w/ IANA to add this field to the plaintext
registry.

In most cases this data is long lost (if ever submitted)
because the directory

  ftp://ftp.iana.org/assignments/charset-reg

contains only 55 entries.

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Chair - Linux Foundation Open Printing WG
Co-Chair - IEEE-ISTO PWG IPP WG
Chair - TCG Embedded Systems Hardcopy SWG
IETF Designated Expert - IPP & Printer MIB
Blue Roof Music/High North Inc
http://sites.google.com/site/blueroofmusic
http://sites.google.com/site/highnorthinc
mailto:blueroofmusic@gmail.com
Christmas through April:
  579 Park Place  Saline, MI  48176
  734-944-0094
May to Christmas:
  PO Box 221  Grand Marais, MI 49839
  906-494-2434



On Wed, Sep 21, 2011 at 11:13 AM, Shawn Steele <Shawn.Steele@microsoft.com> wrote:
Moved the note and removed big5+.  If anyone knows other examples, I'd include those.

> You have "COMMON" here while your Shift_JIS registration has "LIMITED".
> Is that by accident, or is there some rationale behind it?

Um, by accident.  I copied the original shift-jis registration, and used the windows-1252 as a template for this.  I have no clue what the distinction is :)  Changed to LIMITED USE (reasoning that the variations cause instability between implementations, so I'd much rather have people picking something like UTF-8).  Is there a definition of these terms?  All of them should be OBSOLETE in favor of UTF-* ;-)  I'd use that if I could get away with it.

-Shawn

 
http://blogs.msdn.com/shawnste

________________________________________
From: "Martin J. Dürst" [duerst@it.aoyama.ac.jp]
Sent: Wednesday, September 21, 2011 1:00 AM
To: Shawn Steele
Cc: 'ietf-charsets@mail.apps.ietf.org'; Makoto Murata (eb2m-mrt@asahi-net.or.jp)
Subject: Re: Big5 / CP950

Hello Shawn,

On 2011/09/21 2:44, Shawn Steele wrote:
> Here's some proposed text for a more complete registration.

Many thanks for doing this work. Some comments below, mostly nits.

> Comments welcome.  AFAICT this code page is quite a bit less stable than others, and there are a plethora of mappings.  I've included two ISO10646 equivalency tables for that reason.
>
> Thanks,
> Shawn
>
>
-----------------------------------

Charset name: big5

Charset aliases: (None)

MIBenum: 2026

Suitability for use in MIME text:

Yes, big5 is suitable for use with subtypes of the "text"
Content-Type. Note that big5 is an 8-bit charset. Care should
be taken to choose an appropriate Content-Transfer-Encoding.
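
As an illustration of the Content-Transfer-Encoding point, here is a
minimal sketch using Python's standard email package. It is not part of
the registration text. It assumes Python 3 and its built-in "big5" codec;
the helper name build_big5_part is hypothetical. In the versions I am
aware of, the email package picks base64 for big5 bodies, which keeps
the 8-bit data safe on 7-bit transports.

```python
from email.mime.text import MIMEText


def build_big5_part(text: str) -> MIMEText:
    """Build a text/plain MIME part whose body is encoded as big5.

    Hypothetical helper for illustration only. The email package
    selects the Content-Transfer-Encoding for the charset itself.
    """
    return MIMEText(text, _subtype="plain", _charset="big5")


# "\u4e2d\u6587" is the two-character word for "Chinese (language)".
part = build_big5_part("\u4e2d\u6587")

# Inspect the headers that the library chose for an 8-bit charset.
print(part["Content-Type"])
print(part["Content-Transfer-Encoding"])

# Round trip: undo the transfer encoding, then decode the big5 bytes.
raw = part.get_payload(decode=True)
print(raw.decode("big5"))
```

A sender that knows the whole path is 8-bit clean could use 8bit
instead; the sketch only shows the conservative default.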

Two example ISO 10646 equivalency tables:
http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT

Note that Big5 has many variants, so these exemplars provide two
common mappings.

Additional information:

Several vendor-specific charsets that derive from Big5 often use
the Big5 name instead of a more specific vendor charset name.
Big5-HKSCS is one example; Microsoft Code Page 950 and
several font-specific variations are other examples.

Although not authoritative, the following references may also be of
interest:

Printed mapping table:
Dr. International "Developing International Software, Second Edition",
Microsoft Press, ISBN 0-7356-1583-7, 2003, p. 778 and appendixes on CD.

Microsoft Windows extended "best fit" behavior:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt

Additional information about the many variants of Big5:
The wide variety of existing variations of Big5 may make it
unsuitable for many modern applications.  Developers should
consider whether UTF-8 or UTF-16 would be more appropriate for
new applications.
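
To make the variant problem concrete, the following sketch decodes the
same bytes under several Big5-family codecs and lists the two-byte codes
on which two codecs disagree. It is an illustration only, not part of the
registration. It assumes Python 3, whose codec names "big5", "cp950" and
"big5hkscs" are taken here as stand-ins for the variants mentioned above;
Python's tables are not guaranteed to match the two equivalency tables
referenced in this registration. The function names are hypothetical.

```python
# Python codec names assumed to stand in for the variants discussed.
VARIANTS = ("big5", "cp950", "big5hkscs")


def try_decode(raw: bytes, codec: str):
    """Decode raw with codec; return None if the bytes are not mapped."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def decode_variants(raw: bytes) -> dict:
    """Decode the same bytes under every codec in VARIANTS."""
    return {name: try_decode(raw, name) for name in VARIANTS}


def differing_codes(codec_a: str, codec_b: str):
    """Yield (bytes, result_a, result_b) for each two-byte code that the
    two codecs decode differently, including codes only one of them maps.
    """
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    for lead in range(0x81, 0xFF):
        for trail in trail_bytes:
            raw = bytes((lead, trail))
            a = try_decode(raw, codec_a)
            b = try_decode(raw, codec_b)
            if a != b:
                yield raw, a, b


if __name__ == "__main__":
    diffs = list(differing_codes("big5", "cp950"))
    print(f"{len(diffs)} two-byte codes differ between big5 and cp950")
    for raw, a, b in diffs[:10]:
        print(raw.hex(), repr(a), repr(b))
```

Running it against a given Python build shows how far that build's
"big5" and "cp950" tables diverge; the count is a property of that
build's tables, not of the registration.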

This is an update of an existing registration of this charset. This
charset name is in use.

This charset is also known as Windows Code Page 950 or cp950 for
short; these are NOT aliases.

Person & email address to contact for further information:

Shawn Steele
Email: Shawn.Steele&microsoft.com

Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
U.S.A.

Intended usage: LIMITED USE