RE: Comments on draft-yergeau-rfc2279bis-00.txt
Hi Francois,
Please look at RFC 2640 "Internationalization of FTP" (July 1999,
currently a Proposed Standard), which says:
2.1 International Character Set
The character set defined for international support of FTP SHALL be
the Universal Character Set as defined in ISO 10646:1993 as amended.
This standard incorporates the character sets of many existing
international, national, and corporate standards. ISO/IEC 10646
defines two alternate forms of encoding, UCS-4 and UCS-2. UCS-4 is a
four byte (31 bit) encoding containing 2**31 code positions divided
into 128 groups of 256 planes. Each plane consists of 256 rows of 256
cells. UCS-2 is a 2 byte (16 bit) character set consisting of plane
zero or the Basic Multilingual Plane (BMP). Currently, no codesets
have been defined outside of the 2 byte BMP.
The Unicode standard version 2.0 [UNICODE] is consistent with the
UCS-2 subset of ISO/IEC 10646. The Unicode standard version 2.0
includes the repertoire of IS 10646 characters, amendments 1-7 of IS
10646, and editorial and technical corrigenda.
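(Editor's illustration, not part of RFC 2640.) The group/plane/row/cell
layout described above can be checked with a short Python sketch. The
function name `ucs4_coordinates` is hypothetical; it simply splits a
31-bit code position into its four coordinates (7 bits of group, then
one byte each for plane, row and cell).

```python
def ucs4_coordinates(code_position: int) -> tuple[int, int, int, int]:
    """Return (group, plane, row, cell) for a 31-bit UCS-4 value.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_position < 2**31:
        raise ValueError("UCS-4 code positions are 31-bit values")
    group = (code_position >> 24) & 0x7F  # 128 groups
    plane = (code_position >> 16) & 0xFF  # 256 planes per group
    row = (code_position >> 8) & 0xFF     # 256 rows per plane
    cell = code_position & 0xFF           # 256 cells per row
    return group, plane, row, cell


# 128 groups x 256 planes x 256 rows x 256 cells = 2**31 code positions.
assert 128 * 256 * 256 * 256 == 2**31

# Characters in the BMP (the UCS-2 subset) sit in group 0, plane 0.
print(ucs4_coordinates(ord("é")))  # → (0, 0, 0, 233)
```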
2.2 Transfer Encoding
UCS Transformation Format 8 (UTF-8), in the past referred to as UTF-2
or UTF-FSS, SHALL be used as a transfer encoding to transmit the
international character set. UTF-8 is a file safe encoding which
avoids the use of byte values that have special significance during
the parsing of pathname character strings. UTF-8 is an 8 bit encoding
of the characters in the UCS. Some of UTF-8's benefits are that it is
compatible with 7 bit ASCII, so it doesn't affect programs that give
special meanings to various ASCII characters; it is immune to
synchronization errors; its encoding rules allow for easy
identification; and it has enough space to support a large number of
character sets.
<...snip...more description of the details and virtues of UTF-8...>
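(Editor's illustration, not part of RFC 2640.) Three of the UTF-8
properties claimed above (ASCII compatibility, file safety for pathname
delimiters, and resynchronization after a bad offset) can be checked with
Python's built-in codec. The helpers `is_continuation` and
`next_boundary` are hypothetical names chosen for this sketch.

```python
def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def next_boundary(data: bytes, offset: int) -> int:
    """Skip forward past continuation bytes to the next character start."""
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


path = "dossiers/résumé/日本.txt"
encoded = path.encode("utf-8")

# 1. ASCII compatibility: pure ASCII text encodes to identical bytes.
assert "readme.txt".encode("utf-8") == "readme.txt".encode("ascii")

# 2. File safety: every byte of a multi-byte sequence is >= 0x80, so an
#    ASCII delimiter such as '/' (0x2F) never appears inside a character.
for ch in path:
    seq = ch.encode("utf-8")
    if len(seq) > 1:
        assert all(b >= 0x80 for b in seq)

# 3. Self-synchronization: start one byte into '日', skip continuation
#    bytes, and decoding resumes cleanly at the next character.
start = encoded.index("日".encode("utf-8")) + 1
resync = next_boundary(encoded, start)
print(encoded[resync:].decode("utf-8"))  # → 本.txt
```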
3.2 Servers compliance
- Servers MUST support the UTF-8 feature in response to the FEAT
command [RFC2389]. The UTF-8 feature is a line containing the exact
string "UTF8". This string is not case sensitive, but SHOULD be
transmitted in upper case. The response to a FEAT command SHOULD
be:
C> feat
S> 211- <any descriptive text>
S>  ...
S>  UTF8
S>  ...
S> 211 end
The ellipses indicate placeholders where other features may be
included, but are NOT REQUIRED. The one space indentation of the
feature lines is mandatory [RFC2389].
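A client can detect the feature from the FEAT reply alone. The following
is a minimal sketch, assuming the reply has already been split into
lines with CRLF removed; the function name `advertises_utf8` is
hypothetical, and the parser is deliberately lenient about extra
whitespace after the mandatory leading space.

```python
def advertises_utf8(feat_reply: list[str]) -> bool:
    """Return True if a FEAT reply lists the RFC 2640 "UTF8" feature."""
    if not feat_reply or not feat_reply[0].startswith("211"):
        return False  # no 211 feature list (e.g. FEAT unsupported)
    # Feature lines sit between the "211-" opener and the "211 " closer.
    for line in feat_reply[1:-1]:
        if not line.startswith(" "):
            continue  # RFC 2389 feature lines begin with a space
        tokens = line.split()
        # Feature names are case-insensitive; ignore any parameters.
        if tokens and tokens[0].upper() == "UTF8":
            return True
    return False


reply = [
    "211- Extensions supported:",
    " MDTM",
    " UTF8",
    "211 end",
]
print(advertises_utf8(reply))  # → True
```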
Such an FTP server explicitly confirms to the FTP client that they
BOTH support UTF-8 as the transfer encoding. It thus becomes the
responsibility of the CLIENT to convert legacy encodings to UTF-8
beforehand. The target system will receive and (hopefully) store the
transferred file in UTF-8.
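The client-side conversion amounts to decoding with the legacy charset
and re-encoding as UTF-8. A minimal sketch, assuming the client already
knows the legacy charset (ISO-8859-1 here); the helper name
`to_utf8_for_upload` is hypothetical.

```python
def to_utf8_for_upload(raw: bytes, legacy_encoding: str) -> bytes:
    """Transcode text held in a legacy encoding to UTF-8 before sending.

    Hypothetical helper. Strict decoding: byte sequences that are invalid
    in legacy_encoding raise UnicodeDecodeError rather than being guessed.
    """
    text = raw.decode(legacy_encoding)
    return text.encode("utf-8")


latin1_bytes = "café".encode("iso-8859-1")  # b'caf\xe9'
print(to_utf8_for_upload(latin1_bytes, "iso-8859-1"))  # → b'caf\xc3\xa9'
```

The sketch only works if the legacy charset is known in advance; nothing
in the bytes themselves reliably identifies it.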
Cheers,
- Ira McDonald
High North Inc
-----Original Message-----
From: Francois Yergeau [mailto:FYergeau@alis.com]
Sent: Friday, October 04, 2002 3:53 PM
To: ietf-charsets@iana.org
Subject: RE: Comments on draft-yergeau-rfc2279bis-00.txt
Martin Duerst wrote:
> As far as I understand most contributions on the list in the past
> day or so, the standard should discourage the BOM, but it currently
> doesn't.
That much is clear. It seems there will have to be a draft-03 with some
additional language in that direction.
> > > UTF-8 never needs a 'byte-order' signature.
> >
> >This is unfortunately not true, except in the limited realm
> >of properly internationalized protocols
>
> As for example IETF protocols.
Errr, some IETF protocols. I have no way to tell an FTP server the
charset of a file I'm uploading, nor does the server have any way of telling
me the charset of a file I'm downloading. And even if it had a way (like in
HTTP), the server most probably wouldn't know and would either not tell or
lie.
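Without a charset channel, a receiver can only sniff the bytes, and a
leading UTF-8 signature (EF BB BF, the encoded U+FEFF) is one of the few
clues available. A minimal sketch using Python's `codecs.BOM_UTF8`
constant; the helper name `sniff_utf8_signature` is hypothetical. The
signature is a hint rather than proof: a legacy-encoded file could in
principle start with the same three bytes, and a UTF-8 file without it
goes undetected.

```python
import codecs


def sniff_utf8_signature(data: bytes) -> tuple[bool, bytes]:
    """Return (had_signature, payload) with a leading UTF-8 BOM removed."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


downloaded = codecs.BOM_UTF8 + "naïve".encode("utf-8")
has_sig, body = sniff_utf8_signature(downloaded)
print(has_sig, body.decode("utf-8"))  # → True naïve

# The built-in "utf-8-sig" codec strips the same signature when decoding.
assert downloaded.decode("utf-8-sig") == "naïve"
```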
--
François