Re: Comments on draft-yergeau-rfc2279bis-00.txt

To: Patrik Fältström <paf@cisco.com>, Francois Yergeau <FYergeau@alis.com>
Subject: Re: Comments on draft-yergeau-rfc2279bis-00.txt
From: Martin Duerst <duerst@w3.org>
Date: Fri, 18 Oct 2002 15:23:41 +0900
Cc: ietf-charsets@iana.org
In-reply-to: <9394AB2C-E1D3-11D6-8EA6-0003934B2128@cisco.com>
Original-recipient: rfc822;ned+ietf-charsets@mrochek.com
References: <F7D4BDA0E5A1D14B99D32C022AEB736680CA43@alis-2k.alis.domain>

At 15:23 02/10/17 +0200, Patrik Fältström wrote:
>On torsdag, okt 3, 2002, at 20:11 Europe/Stockholm, Francois Yergeau wrote:
>
>>- I think it would be better for *this* RFC to refrain from telling senders
>>and receivers what to do with the BOM, but to offer advice to protocol
>>designers.  It is specific protocols that should know better where the BOM
>>should be banned or allowed.
>
>Not correct, _this_ RFC have to be stringent enough so it is crystal clear 
>whether BOM should be there, what is to happen if it exists, and what is 
>to happen if it doesn't.

I thought about this, too, and while I initially agreed with Francois,
I have come to the same conclusion as Patrik over the last few days.
Here is why:

- Having every IETF WG discuss and decide where they want to use a BOM
   and where not, and so on, is a waste of time, spent by the wrong
   people. In most cases, the WGs won't spend that time, and then we have
   the problem that it's not clear what is supposed to happen.

- Protocols have to interoperate. If different protocols adopt different
   solutions, we'll have a mess.

- For things that are labeled, there is one charset 'utf-8'. If it means
   different things in different contexts, that's bad. Protocols shouldn't
   change the meaning of 'utf-8'. It would be a bad idea if charset=utf-8
   meant something for http and something else for ftp.

- For things that are not labeled, where the spec just says 'this is in
   utf-8', there is obviously no need to use a BOM.

>This in turn have to be verified in an Interoperability test, for example 
>using protocols which allow tagged and untagged UTF-8 and digital 
>signatures, which ensures we have multiple implementations of the standard.
>
>The documentation of this interoperability (which doesn't have to be a 
>formal test, but documented) is part of the last call which I am to issue 
>as soon as we have a document and the documentation.
>
>From IETF point of view, we do _not_ like alternate byte orders. We had 
>this discussion in IESG when UTF-16* charsets were to be registered. Many 
>voices in the IESG only wanted to register "the correct one". The author 
>(Paul) and myself argued for always having tags for every weird charset, 
>but say strongly only one format SHOULD be used.
>
>What I hear on this list is that the consensus is that BOM SHOULD NOT be 
>used. I would like it to be MUST NOT be used in Internet protocols,

My opinion is that it should be MUST NOT for (small) protocol elements,
and SHOULD NOT for larger things.

>which leads to tagged UTF-8 text be illegal if the BOM exists in the text.

This statement is a bit confusing. Do you mean 'utf-8 text be illegal if
the BOM exists at the start of the text'?
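
To make 'at the start of the text' concrete: the UTF-8 BOM is U+FEFF
encoded in UTF-8, i.e. the three bytes EF BB BF. Below is a minimal
Python sketch of the check a receiver might make. The function names are
hypothetical, and whether a receiver should strip, reject, or pass on a
leading BOM is exactly what is under discussion, so this shows only the
detection, not a recommended policy.

```python
import codecs

# U+FEFF encoded in UTF-8: the bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_leading_bom(data: bytes) -> bool:
    """Return True only if the BOM bytes are the very first bytes."""
    return data.startswith(UTF8_BOM)


def strip_leading_bom(data: bytes) -> bytes:
    """Drop one leading BOM if present; leave everything else untouched."""
    if has_leading_bom(data):
        return data[len(UTF8_BOM):]
    return data
```

Note that U+FEFF appearing later in the data is left alone here; only the
initial position is treated as a signature.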

>Anyway, what needs to happen now is two things:
>
>  - The text in the document has to be change to say BOM is not to be used

I think we are already close, but we have to clean it up a bit.

>  - Someone has to write down interoperability information between
>    applications
>
>Regarding the interoperability, as Francois is working hard with the 
>document, can I get someone else to write this? I myself are mostly 
>irritated I can not copy and paste Unicode text between TextEdit and 
>Microsoft software in MacOSX, so I might not be the best person to write 
>down things that work.... :-(

Given that my mailer doesn't do utf-8, I'm also not the best candidate.
But I would like to know how this is supposed to be done. Sending UTF-8
over HTTP, for example, can be seen on various levels. Just shipping the
bytes obviously will work. Getting things displayed the right way also
works in many browsers (each browser has some limitations as to what
exactly it's able to display, also dependent on the fonts). We can easily
list Netscape Navigator (since 4), Microsoft Internet Explorer (since 4
at least), Opera (since 6), Mozilla, Amaya (recently). Is that enough
on the recipient side?

On the sending side, I'm not so clear. To serve a file as UTF-8, the
main thing to do is to put the file there, and to tell the server
that it's utf-8. Is that enough, or is there a need for more (e.g.
that the file is produced by a script,...)?
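
As an illustration of 'telling the server that it's utf-8', here is a
small sketch using Python's standard http.server module. The payload,
the handler name and the make_server helper are made up for the example;
a real deployment would more likely set the charset in the server
configuration. The only point is the charset parameter on Content-Type,
with the body sent as plain UTF-8 bytes and no BOM.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload: UTF-8 bytes for "Fältström", with no BOM in front.
BODY = "F\u00e4ltstr\u00f6m\n".encode("utf-8")


class Utf8Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # The label tells the recipient how to decode the bytes.
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        # Keep the example quiet; the default logs each request to stderr.
        pass


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), Utf8Handler)
```

Calling make_server().serve_forever() would then answer every GET with
the labeled UTF-8 body.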

Then there is also the encryption thing. I have no experience with
setting up an https server. But I may be able to contact the
XML Dsig folks; they do signatures after conversion to utf-8,
and should have something testable available.
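
Why signatures make a good test case can be shown with a short Python
sketch. It assumes a plain SHA-256 digest as a stand-in for a real
signature and uses a made-up sample string; it says nothing about how
XML Dsig itself treats a BOM. Two byte strings that decode to the same
text give different digests if only one of them starts with a BOM.

```python
import codecs
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 over the raw bytes, standing in for a signature input."""
    return hashlib.sha256(data).hexdigest()


text = "caf\u00e9"                     # sample text, chosen arbitrarily
plain = text.encode("utf-8")           # UTF-8 without BOM
with_bom = codecs.BOM_UTF8 + plain     # same text, BOM prepended

# Both decode to the same characters when the decoder skips a leading BOM...
same_text = plain.decode("utf-8-sig") == with_bom.decode("utf-8-sig")

# ...but the bytes differ, so anything computed over the bytes differs too.
same_digest = digest(plain) == digest(with_bom)
```

Here same_text is True while same_digest is False, so sender and
receiver have to agree on whether the BOM is part of the signed bytes.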

Regards,    Martin.
