C3 System for character set conversion available for alpha testing
- To: i18n@dkuug.dk, ietf-charsets@INNOSOFT.COM, insoft-l@trans2.b30.ingr.com,iso8859@jhuvm.hcf.jhu.edu, iso10646@jhuvm.hcf.jhu.edu,rustex-l@ubvm.cc.buffalo.edu, Datorpostteknik i Norden <nordpost@nada.kth.se>,Projektet SUNET-MIME <sunet-mime@sunet.se>, tc304p09@dkuug.dk,tc304wg4@dkuug.dk, teckenbok@sics.se, wg-char@rare.nl, wg-msg@rare.nl,tref@vhs.se
- Subject: C3 System for character set conversion available for alpha testing
- From: Olle Jarnefors <ojarnef@admin.kth.se>, Peter Svanberg <psv@nada.kth.se>
- Date: Thu, 15 Dec 1994 23:59:48 +0100
- Cc: Olle Jarnefors <ojarnef@nada.kth.se>
- Reply-to: c3-questions@nada.kth.se
- Sender: Olle Jarnefors <ojarnef@nada.kth.se>
Trans-European Research and Education Networking Association (TERENA)
Coded Character Set Conversion Task-Force (C3-TF)
ANNOUNCEMENT    Prototype Ap45    1994-12-15
ALPHA TEST RELEASE OF THE C3 SYSTEM FOR CODED CHARACTER SET CONVERSION
TERENA (formerly RARE) has supported the development of better
tools for conversion between the continuously growing number of
coded character sets in use in academic computer networks in
Europe. The intention is to produce a general and flexible
system for Coded CHaracter set Conversion, called
>>> The C3 System <<<
This is the announcement of the alpha test release of software
(for Unix) and tables for the C3 System for *limited*
distribution amongst interested implementors, system
administrators and users.
+-------------------------------------------------------+
! Notice that this is a pre-release of software under !
! development, which has not yet been thoroughly tested !
! and is not intended for production use. !
+-------------------------------------------------------+
The package consists of:
> ANSI C code for a software library implementing parts of
the C3 API (see below).
> ANSI C code for a program "ccconv", which can be used either
as a character stream conversion filter or as a file conversion
program.
> Binary files for this software, compiled for SunOS 4.3.x
> Approximation table (see below)
> Definition tables for the following coded character sets:
ASCII                                    ANSI X3.4
Swedish general 7-bit character set      SS 63 61 27
Swedish 7-bit character set for names    SS 63 61 27
Norwegian 7-bit character set            NS 4551
UK 7-bit character set                   BS 4730
Croatian/Slovene 7-bit character set     JUS I.B1.002
Latin-1 8-bit character set              ISO 8859-1
Latin-2 8-bit character set              ISO 8859-2
Latin-Cyrillic 8-bit character set       ISO 8859-5
Original IBM PC character set            IBM CP437
International IBM PC character set       IBM CP850
Macintosh Extended Roman character set
UCS in 2-octet form at level 1           ISO 10646
> Documentation files:
Introduction to the C3 System (8 pages)
Directions for the installation of the C3 System (2 pages)
How to use the "ccconv" file conversion utility (4 pages)
How to use the C3 library of C functions (18 pages)
Explanation of identifiers and names used in C3 (2 pages)
Specification of the C3 API for conversion functions (37 pages)
The software is developed with the GNU gcc compiler, but any C
compiler allowing "const" and ANSI C function prototypes should work.
The latest C3 distribution and other C3 information are
available on the World Wide Web at
<URL:http://www.nada.kth.se/i18n/c3/>
or by anonymous FTP to ftp.nada.kth.se, directory
"pub/i18n/c3", i.e.
<URL:ftp://ftp.nada.kth.se/pub/i18n/c3/>
Email addresses:
<c3-questions@nada.kth.se> Questions, comments, bug reports, etc.
<c3-info-request@nada.kth.se> Subscription to info-about-C3 list
<c3-request@nada.kth.se> Subscription to discussion-about-C3 list
<c3@nada.kth.se> Contribution to discussion-about-C3 list
Features list:
+ Full _generality_: conversion can be done in any direction
between any pair of the coded character sets included in the
system.
+ _Approximate conversion_ when exact conversion is impossible:
There is no arbitrary identification of different characters
in the source and the target character sets. If the target
character set lacks a source character, the best possible
replacement character or string is used.
+ Can handle not only simple 7-bit and 8-bit coded character
sets, but also _advanced character sets_ such as the 16-bit
ISO 10646 character set (on implementation level 1) and
stateful character sets like ISO 6937/T.61. Incomplete
character sets, character sets lacking control characters,
indeterministic character sets, and ambiguous character sets
are also supported.
+ _Easy to use_ for the unsophisticated user (by means of
carefully chosen defaults).
+ _Flexible_ and fully configurable for the sophisticated
user/system administrator/application developer.
+ _Conversion parameters_ control the exact conversions
performed: different needs or restrictions in different
situations are easily handled by means of
- the three conversion types (one-to-one, legible,
reversible)
- separate specification of the conversion of line breaks
- the factor system (for varying cultural expectations
affecting preferable approximate conversions).
+ _Easy to customize_: The conversion tables use a format
optimized for human readability which only uses the subset of
82 graphic characters available in all coded character sets.
ISO 10646 hexadecimal values are used to refer to characters.
Different full sets of conversion tables can be used in
parallel.
+ _Simple to extend_: To add a new coded character set, only
provide a definition table for it and approximate conversions
for any character in it that isn't included in any already
defined coded character set.
+ _Scalable_: To fully define the N(N-1) possible conversion
paths between N different coded character sets, only N+1
conversion tables are needed. How conversion is to be done
is defined by means of ISO 10646 as a common interface, but
the actual conversion is a direct transformation from
source character set to target character set, not involving
a 10646 representation as an intermediate step. Temporary
files are not needed.
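The scalability point above can be illustrated with a minimal sketch.
It is written in Python for brevity and is not the C3 library or its
API: the table contents are heavily abridged, and the names LATIN1,
CP437, direct_table, tables_needed and conversion_paths are invented
for this illustration. It shows how two definition tables, each
mapping code values to ISO 10646 code points, could be composed once
into a direct source-to-target table, so that no 10646 representation
is produced during the actual conversion.

```python
# Hypothetical, heavily abridged definition tables: code value -> UCS code point.
# A real definition table would cover the whole coded character set.
LATIN1 = {0x41: 0x0041, 0xC4: 0x00C4, 0xE5: 0x00E5, 0xE9: 0x00E9}
CP437 = {0x41: 0x0041, 0x8E: 0x00C4, 0x86: 0x00E5, 0x82: 0x00E9}


def direct_table(source_def, target_def):
    """Compose two definition tables into one source-code -> target-code map.

    UCS is used only while building the table; converting data afterwards
    is a single lookup per code value.
    """
    ucs_to_target = {ucs: code for code, ucs in target_def.items()}
    return {
        code: ucs_to_target[ucs]
        for code, ucs in source_def.items()
        if ucs in ucs_to_target
    }


def conversion_paths(n):
    """Number of ordered (source, target) pairs among n character sets."""
    return n * (n - 1)


def tables_needed(n):
    """n definition tables plus one shared approximation table."""
    return n + 1


latin1_to_cp437 = direct_table(LATIN1, CP437)
converted = bytes(latin1_to_cp437[b] for b in b"\xc4\xe5A")
```

With the 13 coded character sets listed in this release, that is 156
conversion paths described by 14 tables. Characters missing from the
target set are simply left out of the composed table here; handling
them is the job of the approximation table.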
What's unique in the C3 System?
The approximation table is the most innovative element in the
C3 approach to character set conversion. It specifies for each
character in any of the character sets for which definition
tables are given, how it is to be represented approximately
(by fall-back) in the target character set, if the character is
_not_ included in that character set. Several alternative
representations are specified for some characters, to take
advantage of the different character repertoires of different
target character sets.
The conversion tables use only the invariant part of ASCII. To
indicate other characters, the hexadecimal form of the coded
representations in UCS is used. No information specific to a
certain coded character set is included in the approximation
table.
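As a rough illustration of the fall-back lookup described above, the
following Python sketch picks the first alternative that the target
repertoire can represent completely. It does not reproduce the C3
table format or library: the entries in APPROXIMATIONS, the two
repertoires and the function name approximate are all assumptions
made up for this example.

```python
# Hypothetical approximation entries: UCS code point -> ordered alternatives.
# Each alternative is a sequence of UCS code points; nothing in the table
# refers to any particular coded character set.
APPROXIMATIONS = {
    0x00F8: [[0x00F6], [0x006F]],        # o with stroke -> o with diaeresis, else o
    0x00BD: [[0x0031, 0x002F, 0x0032]],  # one half -> "1/2"
}

# Simplified, made-up repertoires (sets of UCS code points) for two targets.
ASCII_REPERTOIRE = set(range(0x20, 0x7F))
NORDIC_REPERTOIRE = ASCII_REPERTOIRE | {0x00C4, 0x00D6, 0x00E4, 0x00F6}


def approximate(ucs, target_repertoire):
    """Return the code points representing `ucs` in the target, or None."""
    if ucs in target_repertoire:
        return [ucs]  # exact conversion is possible, no fall-back needed
    for alternative in APPROXIMATIONS.get(ucs, []):
        # Take the first alternative the target can represent in full.
        if all(cp in target_repertoire for cp in alternative):
            return alternative
    return None  # no usable approximation is defined
```

The same entry thus yields different results for different targets:
the richer repertoire gets the closer replacement, the plain ASCII
repertoire gets the coarser one.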
The approximation table defines three types of conversion which
the user can choose from: Type 1 converts one source character
to one target character (best for tables and fields with length
restrictions). Type 2 converts characters to a more
understandable approximate representation, which may consist
of one or a few target characters (best for prose). Type 3 is
a reversible one-character-to-many-characters conversion, which
is based on the mnemonics defined by RFC 1345.
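The difference between the three conversion types can be sketched as
follows, for an ASCII target. This is an illustration in Python, not
C3 code: the FALLBACKS entries, the ConversionType names and the
convert function are hypothetical, and the use of "&" as an escape
character (doubled when it occurs in the source) is an assumption
modelled on the RFC 1345 mnemonic notation, not a statement about how
C3 encodes its reversible output.

```python
from enum import Enum


class ConversionType(Enum):
    ONE_TO_ONE = 1   # Type 1: exactly one target character per source character
    LEGIBLE = 2      # Type 2: one or a few target characters, readable prose
    REVERSIBLE = 3   # Type 3: mnemonic form that can be converted back


# Hypothetical fall-backs for three characters missing from ASCII.
# The Type 3 strings use the RFC 1345 mnemonics ss, AE and e'.
FALLBACKS = {
    "\u00df": {ConversionType.ONE_TO_ONE: "s",
               ConversionType.LEGIBLE: "ss",
               ConversionType.REVERSIBLE: "&ss"},
    "\u00c6": {ConversionType.ONE_TO_ONE: "A",
               ConversionType.LEGIBLE: "AE",
               ConversionType.REVERSIBLE: "&AE"},
    "\u00e9": {ConversionType.ONE_TO_ONE: "e",
               ConversionType.LEGIBLE: "e",
               ConversionType.REVERSIBLE: "&e'"},
}


def convert(text, ctype):
    """Convert `text` to ASCII using the chosen conversion type.

    Raises KeyError for characters without an entry in FALLBACKS.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # Assumed escaping rule: a literal "&" is doubled in Type 3 output.
            doubled = ctype is ConversionType.REVERSIBLE and ch == "&"
            out.append("&&" if doubled else ch)
        else:
            out.append(FALLBACKS[ch][ctype])
    return "".join(out)
```

Type 1 preserves the length of the text, which is why it suits tables
and fixed-length fields; Type 2 trades length for legibility; Type 3
keeps enough information to recover the original character.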
The C3 Task Force within TERENA consists of:
Borka Jerman-Blazic <jerman-blazic@ijs.si>
Olle Jarnefors <ojarnef@admin.kth.se>
Peter Svanberg <psv@nada.kth.se>
Keld Simonsen <keld@dkuug.dk>