
Re: UTF-8 revision

To: Ned Freed <[email protected]>
Subject: Re: UTF-8 revision
From: Francois Yergeau <[email protected]>
Date: Tue, 02 Sep 1997 12:41:16 -0400
Cc: [email protected]
In-reply-to: <[email protected]>
References: <"Your message dated Sun, 31 Aug 1997 15:03:02 -0400"<[email protected]>

At 12:42 31/08/97 -0700, Ned Freed wrote:
>(1) The discussion of the Hangul mess and versioning is far too
>    wishy-washy. What needs to be said is that the charset label "UTF-8" is
>    aligned with the character assignments in Unicode 2.0 or later and that
>    it is NOT aligned with the assignments in Unicode 1.0 or 1.1, in
>    particular the old Hangul range.

Agreed, it needs to be much more explicit.  What about the following
changes in section 5:

1st paragraph:

 This memo is meant to serve as the basis for registration of a MIME
 character set parameter (charset) [MIME].  The proposed charset
 parameter value is "UTF-8".  This string would label media types
 containing text consisting of characters from the repertoire of ISO/IEC
 10646 including all amendments at least up to amendment 5 (Korean
 block), encoded to a sequence of octets using the encoding scheme 
 outlined above.  UTF-8 is suitable for use in MIME content types
 under the "text" top-level type.

BTW, shouldn't the reference to [MIME] above be changed to refer to
draft-freed-charset-reg-02.txt?
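
To make the labelling concrete, here is a minimal sketch of what the
proposed charset value looks like on a "text" media type.  It uses
Python's standard email package; the helper name label_as_utf8 and the
sample string are hypothetical and not part of the draft.

```python
from email.message import EmailMessage


def label_as_utf8(text: str) -> EmailMessage:
    """Build a text/plain entity whose charset parameter is UTF-8.

    Hypothetical helper for illustration only.
    """
    msg = EmailMessage()
    # set_content() encodes the text and sets the Content-Type header,
    # including the charset parameter.
    msg.set_content(text, subtype="plain", charset="utf-8")
    return msg


# U+AC00 is the first code point of the Hangul Syllables block
# introduced by Amendment 5 (Unicode 2.0).
msg = label_as_utf8("Hangul syllable: \uAC00")
content_type = msg.get_content_type()
charset = msg.get_content_charset()  # returned in lower case
print(content_type, charset)
```

MIME charset names are case-insensitive, so the lower-case value
reported by the library is the same label as "UTF-8".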

Last paragraph, now split in two:

 In practice, then, a version-independent label is warranted, provided
 the label is understood to refer to all versions after Amendment 5,
 and provided no incompatible changes actually occur.  Should
 incompatible changes occur in a later version of ISO 10646, the MIME
 charset label defined here will stay aligned with the previous version
 until and unless the IETF specifically decides otherwise.

 Should the need ever arise to distinguish data containing Hangul
 encoded according to Unicode 1.1, then a version-dependent label, for
 that version only, should be registered (a suggestion would be
 "UNICODE-1-1-UTF-8"), in order to retain the advantages of a
 version-independent label for 2.0 and later versions.  Such a
 version-dependent label could even be registered before actual need
 arises, pre-emptively, but it is important to strongly recommend
 against creating any new Hangul-containing data without taking
 Amendment 5 of ISO 10646 into account.

Note that this last sentence is actually a suggestion that should perhaps
be decided at once.  Do we want to pre-emptively register
"UNICODE-1-1-UTF-8" or some such?  If so, let's have affirmative language;
if not, let's remove that last sentence.
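
For illustration, here is a rough sketch of how software might tell
the two Hangul encodings apart and pick a label.  The code point
ranges are my assumption and are not stated in the draft text above:
U+3400..U+4DFF for the Unicode 1.1 Hangul blocks and U+AC00..U+D7A3
for the Hangul Syllables block of Unicode 2.0.  The function names are
hypothetical.  A code point in the old range is only a hint about the
data's origin, so this is a heuristic and not a reliable detector.

```python
# Assumed ranges, not taken from the draft:
OLD_HANGUL = range(0x3400, 0x4E00)  # Unicode 1.1 Hangul, U+3400..U+4DFF
NEW_HANGUL = range(0xAC00, 0xD7A4)  # Unicode 2.0 Hangul, U+AC00..U+D7A3


def hangul_era(data: bytes) -> str:
    """Classify UTF-8 data by which Hangul range its code points use."""
    text = data.decode("utf-8")
    has_old = any(ord(ch) in OLD_HANGUL for ch in text)
    has_new = any(ord(ch) in NEW_HANGUL for ch in text)
    if has_old and has_new:
        return "mixed"
    if has_old:
        return "unicode-1.1"
    if has_new:
        return "unicode-2.0"
    return "none"


def suggested_label(data: bytes) -> str:
    """Pick a charset label; the version-dependent name is only the
    suggestion floated in this message, not a registered charset."""
    if hangul_era(data) == "unicode-1.1":
        return "UNICODE-1-1-UTF-8"
    return "UTF-8"


print(suggested_label("\u3400".encode("utf-8")))
print(suggested_label("\uAC00".encode("utf-8")))
```

Data that mixes both ranges falls back to "UTF-8" here, which is one
possible policy among several.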

>    I therefore think that
>    this specification needs to say that it aligns automatically with
>    all future versions of Unicode that don't make incompatible changes, but
>    the minute one is made it stays aligned with the old version until and
>    unless the IETF specifically decides otherwise.

I think the new language above addresses that.  How is that?

Regards,


-- 
François Yergeau <[email protected]>
Alis Technologies inc., Montréal
Tel : +1 (514) 747-2547
Fax : +1 (514) 747-2561

