Re: Fwd: I-D ACTION:draft-goldsmith-utf7-01.txt
At 11:29 05-02-97 -0800, David Goldsmith wrote:
>FYI
>
>If there are no objections I will try to advance this to RFC
>(Experimental) status in two weeks, then register "UTF-7".
Good. Some comments:
>Abstract
>
>...
> This document describes a transformation format of Unicode that
> contains only 7-bit ASCII characters and ...
This is misleading. UTF-7 encodes UCS *characters* using only 7-bit
ASCII-valued *octets*.
>Overview
>
> UTF-7 encodes Unicode characters as US-ASCII, together with...
Same remark.
> UTF-7 should normally be used only in the context of 7 bit
> transports, such as mail and news. In other contexts, straight
> Unicode or UTF-8 is preferred.
Great! Please remove "and news", however. News is in effect 8-bit clean:
many newsgroups routinely use 8-bit charsets, and all widespread
implementations are 8-bit clean. Even the IAB charset workshop report
(draft-weider-iab...) recognizes that.
>UTF-7 Definition
>
> A UTF-7 stream represents 16-bit Unicode characters in 7-bit US-ASCII
> as follows:
Suggestion: "represents ... using 7-bit ASCII-valued octets as follows"
> Unicode is encoded using Modified Base64 by first converting
> Unicode 16-bit quantities to an octet stream (with the most
> significant octet first). Surrogate pairs (UTF-16) are converted
> by treating each half of the pair as a separate 16 bit quantity
> (i.e., no special treatment). Text with an odd number of octets is
> ill-formed.
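The conversion quoted above can be sketched in Python as follows. This is a minimal illustration, not code from the draft: the function names encode_shifted_run and decode_shifted_run are hypothetical, "Modified Base64" is taken here to mean standard base64 with the trailing "=" padding omitted, and the sketch assumes the caller has already decided which characters must be base64-encoded (the rules for directly encoded characters are not handled).

```python
import base64


def encode_shifted_run(text: str) -> str:
    """Encode a run of characters as a UTF-7 shifted sequence: '+', Modified Base64, '-'.

    Sketch only: assumes every character in `text` must be base64-encoded.
    """
    # 16-bit quantities, most significant octet first. Python's UTF-16BE codec
    # splits characters beyond U+FFFF into surrogate pairs; each half is then
    # just another 16-bit quantity (no special treatment).
    octets = text.encode("utf-16-be")
    # Modified Base64 (as assumed here): the standard alphabet, no '=' padding.
    b64 = base64.b64encode(octets).decode("ascii").rstrip("=")
    return "+" + b64 + "-"


def decode_shifted_run(b64: str) -> str:
    """Decode the Modified Base64 body of a shifted sequence (without '+' and '-')."""
    # Restore the padding Modified Base64 omits so the stdlib decoder accepts it.
    octets = base64.b64decode(b64 + "=" * (-len(b64) % 4))
    # An odd number of octets cannot form whole 16-bit quantities.
    if len(octets) % 2:
        raise ValueError("ill-formed: odd number of octets")
    return octets.decode("utf-16-be")
```

For example, encode_shifted_run("日本語") yields "+ZeVnLIqe-", and a character outside the BMP such as U+1F600 comes out as its surrogate pair, "+2D3eAA-".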
Since the draft refers to 10646 as well as Unicode, it might be worth
saying that UCS-4 characters outside of the range accessible through UTF-16
cannot be transformed by UTF-7.
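To make the UCS-4 limitation concrete, here is a hypothetical helper (ucs4_to_utf16_units, not part of the draft) that maps one UCS-4 code point to the 16-bit quantities a UTF-7 encoder would feed into Modified Base64. It assumes the 31-bit UCS-4 code space of ISO 10646 and rejects anything above U+10FFFF, the last code point reachable through a surrogate pair.

```python
def ucs4_to_utf16_units(cp: int) -> list[int]:
    """Map one UCS-4 code point to its UTF-16 16-bit quantities.

    Raises ValueError for code points beyond U+10FFFF, which UTF-16
    (and therefore UTF-7) cannot represent.
    """
    # Assumption: UCS-4 is treated as a 31-bit code space.
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("not a UCS-4 code point")
    if cp <= 0xFFFF:
        return [cp]  # BMP: a single 16-bit quantity
    if cp > 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the UTF-16 range; UTF-7 cannot encode it")
    v = cp - 0x10000  # 20 bits, split 10/10 across a surrogate pair
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
```

Everything from U+110000 up to the top of the UCS-4 space falls into the error branch, which is exactly the set of characters the remark above says UTF-7 cannot transform.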
> 2. Most non-European alphabet-based languages (e.g., Greek)...
The Greeks will surely be surprised to learn that they are not Europeans :-)
Regards,
--
François Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montréal
Tel: +1 (514) 747-2547
Fax : +1 (514) 747-2561