
RE: Big5 / CP950



Here's some proposed text for a more complete registration.  Comments welcome.  AFAICT this code page is quite a bit less stable than others, and there is a plethora of mappings.  For that reason I've included two ISO 10646 equivalency tables.

Thanks,
Shawn


-----------------------------------

Charset name: big5

Charset aliases: (None)

MIBenum: 2026

Suitability for use in MIME text:

Yes, big5 is suitable for use with subtypes of the "text" 
Content-Type. Note that big5 is an 8-bit charset. Care should 
be taken to choose an appropriate Content-Transfer-Encoding.
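
As a rough illustration of the transfer-encoding point, here is a minimal
sketch using Python's standard email package. The helper name
build_big5_message is hypothetical, and base64 is only one reasonable
choice of Content-Transfer-Encoding for 8-bit big5 data; it is picked
here as an assumption, not mandated by this registration.

```python
from email.message import EmailMessage


def build_big5_message(text: str) -> EmailMessage:
    """Build a text/plain message labeled charset=big5 (hypothetical helper)."""
    msg = EmailMessage()
    # big5 is an 8-bit charset, so request an explicit 7-bit-safe
    # transfer encoding instead of sending the raw bytes.
    msg.set_content(text, subtype="plain", charset="big5", cte="base64")
    return msg


if __name__ == "__main__":
    message = build_big5_message("中文")
    print(message["Content-Type"])
    print(message["Content-Transfer-Encoding"])
```

On a transport known to be 8-bit clean, a different transfer encoding
could be requested through the same cte argument.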

Example ISO 10646 equivalency tables: Big5 has many variants, so
the following two tables are given as examples of common mappings:
http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT 
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
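
To get a feel for how far two mappings diverge, one can compare the
codecs that ship with a given runtime. The sketch below assumes Python's
built-in "big5" and "cp950" codecs, which are that implementation's own
tables and may not match the two files above exactly. The helper names
decode_with and differing_pairs are hypothetical.

```python
def decode_with(data: bytes, codec: str):
    """Decode data with codec; return None if the bytes are unmapped."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


def differing_pairs(codec_a: str = "big5", codec_b: str = "cp950"):
    """List double-byte sequences that the two codecs decode differently."""
    # Trail bytes in Big5 fall in 0x40-0x7E and 0xA1-0xFE.
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    diffs = []
    for lead in range(0xA1, 0xFA):  # lead bytes 0xA1-0xF9
        for trail in trail_bytes:
            pair = bytes([lead, trail])
            a = decode_with(pair, codec_a)
            b = decode_with(pair, codec_b)
            if a != b:
                diffs.append((pair.hex(), a, b))
    return diffs


if __name__ == "__main__":
    for hex_pair, a, b in differing_pairs():
        print(hex_pair, repr(a), repr(b))
```

The lead-byte range scanned here is an assumption chosen to cover the
range in the first table; variants that assign user-defined or extension
areas outside it would need a wider scan.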

Additional information:

Several vendor-specific charsets that derive from Big5 are often
labeled with the Big5 name instead of a more specific vendor charset
name. Examples include Big5-HKSCS, Microsoft Code Page 950, Big5+,
and several font-specific variations.

Although not authoritative, the following references may also be of 
interest:

Printed mapping table:
Dr. International "Developing International Software, Second Edition", 
Microsoft Press, ISBN 0-7356-1583-7, 2003, p. 778 and appendixes on CD.

Microsoft Windows extended "best fit" behavior:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt 

Again not authoritative, but the Wikipedia article currently touches
on the many variations of Big5 and may be of interest to implementers:
http://en.wikipedia.org/wiki/Big-5 

The large number of Big5 variants in existence may make it
unsuitable for many modern applications.  Developers should
consider whether UTF-8 or UTF-16 would be more appropriate for
new applications.
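
For applications that do move to UTF-8, legacy data has to be transcoded
once with whichever variant actually produced it. The following is a
minimal sketch, assuming Python and its built-in codecs. The function
name big5_to_utf8 and its default of "big5" are hypothetical; the caller
is expected to know (or guess and verify) the real source variant, for
example "cp950" or "big5hkscs".

```python
def big5_to_utf8(data: bytes, source_codec: str = "big5") -> bytes:
    """Transcode Big5-family bytes to UTF-8 (hypothetical helper).

    Decoding is strict on purpose: an unmapped sequence raises
    UnicodeDecodeError, which is a hint that source_codec names the
    wrong Big5 variant for this data.
    """
    return data.decode(source_codec, errors="strict").encode("utf-8")


if __name__ == "__main__":
    legacy = "中文".encode("big5")
    print(big5_to_utf8(legacy))
```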

This is an update of an existing registration of this charset. This 
charset name is in use.

This charset is also known as Windows Code Page 950 or cp950 for 
short; these are NOT aliases.

Person & email address to contact for further information:

Shawn Steele
Email: Shawn.Steele&microsoft.com

Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
U.S.A.

Intended usage: COMMON