
RE: Registration of new charset "UTF-16"



On Sat, 16 May 1998, Larry Masinter wrote:
> > We might eventually define a MIME "widetext" top-level media type for
> > plaintext data using UTF-16 or UCS-4, but I don't think it's time to do
> > that yet.  UTF-8 is standards track and may be freely used in text/* media
> > types.
> 
> I think it's time to do this. It's been a recurring issue for as long
> as I've been trying to work the 'charset' issue in HTTP. What's needed
> is something that just doesn't have the same end-of-line handling that 'text'
> has, so I believe 'etext' would do.

Shall we go ISO-10646 only on this media type?  How about only allowing
UTF-16 and UCS-4?

What about end of line canonicalization?  Do we stick with CRLF, or should
we use the ISO 10646 Line Separator and Paragraph Separator characters?
Or do we give up on a canonical form and just state that widetext/etext
probably isn't suitable for use with digital signatures?
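
Just to make the trade-off concrete, here is a minimal sketch (my own
illustration, not a proposal) of what canonicalizing onto the ISO 10646
LINE SEPARATOR would look like before the text is serialized as UTF-16;
the function name and the choice of U+2028 are assumptions for the example:

    # Hypothetical sketch: fold CRLF / CR / LF line breaks onto the
    # ISO 10646 LINE SEPARATOR (U+2028) before encoding as UTF-16.
    def normalize_eol(text: str) -> bytes:
        LS = "\u2028"
        # Handle CRLF first so bare CR and LF don't get converted twice.
        for eol in ("\r\n", "\r", "\n"):
            text = text.replace(eol, LS)
        return text.encode("utf-16-be")

Whichever convention wins, a signer and a verifier have to apply exactly
the same folding, which is the whole digital-signature worry above.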

Do we want to keep the ability to display unknown widetext/etext media
subtypes to the user, or do we want to declare that a failure and treat
all unknown types as application/octet-stream?

If UCS-4 is going to be useful, shouldn't we introduce a compressing
content-transfer-encoding at the same time?
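
As a rough illustration of why (not a proposed encoding), three of every
four octets of UCS-4 are zero for Latin-repertoire text, so even a generic
deflate pass recovers most of the overhead; the snippet below just uses
zlib as a stand-in for whatever compressing CTE might be defined:

    import zlib

    sample = "The quick brown fox jumps over the lazy dog.\r\n" * 100
    ucs4 = sample.encode("utf-32-be")   # 4 octets per character
    print(len(sample), len(ucs4), len(zlib.compress(ucs4)))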

What endian rules will there be?  Always big-endian, as is traditional and
works well with multipart/signed?  Or follow the lead of the TIFF and XML
specs and allow either byte order (and the resulting interoperability
problems many of us have seen)?
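
If either byte order is allowed, every receiver ends up sniffing the byte
order mark, roughly as XML processors do.  A minimal sketch, assuming a
BOM-or-big-endian rule (the fallback choice is my assumption, picked to
match network byte order):

    # Hypothetical BOM sniffer for either-endian UTF-16.
    def decode_utf16(data: bytes) -> str:
        if data.startswith(b"\xff\xfe"):
            return data[2:].decode("utf-16-le")   # little-endian BOM
        if data.startswith(b"\xfe\xff"):
            return data[2:].decode("utf-16-be")   # big-endian BOM
        return data.decode("utf-16-be")           # no BOM: assume big-endian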

In email, how do you find out whether a recipient supports widetext/etext,
given that it will be unreadable by most recipients?

If people are convinced it's time to attack the widetext/etext problem,
then there are a lot of hard decisions to make.

		- Chris