
unknown-xyz (was: Volunteer needed to serve as IANA charset reviewer)

To: ietf-charsets@mail.apps.ietf.org
Subject: unknown-xyz (was: Volunteer needed to serve as IANA charset reviewer)
From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Mon, 02 Oct 2006 02:59:31 +0200

Claus Färber wrote:

>>> "UNKNOWN-UTF16"
>> What's the difference from UTF-16 ?

> UTF-16 "SHOULD be interpreted as being big-endian" if there's
> no BOM, RFC 2781, 4.3. UNKNOWN-UTF16 would not have such a
> fall back.

Okay, but with a good excuse violating a SHOULD is possible...
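
For illustration, the fallback rule quoted above could be sketched as follows. This is only a minimal sketch in Python; the function name decode_utf16_rfc2781 is made up for this example, and it simply applies the big-endian default of RFC 2781, section 4.3, when no BOM is present.

```python
import codecs


def decode_utf16_rfc2781(data: bytes) -> str:
    """Decode octets labelled "UTF-16" (hypothetical helper, not a library API)."""
    # A leading BOM settles the byte order and is not part of the text.
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: RFC 2781, 4.3 says the text SHOULD be read as big-endian.
    return data.decode("utf-16-be")


if __name__ == "__main__":
    print(decode_utf16_rfc2781(b"\x00A\x00B"))      # no BOM, read as big-endian
    print(decode_utf16_rfc2781(b"\xff\xfeA\x00B\x00"))  # little-endian BOM
```

An "UNKNOWN-UTF16" label would presumably drop the last line of the function and leave the byte order open instead.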

>>> with alias "UNICODE".
>> Ugh, thanks, but no thanks.

> The idea is to deprecate the label "UNICODE" by tying it to
> an incompletely specified charset.

...sneaky <g>

In reality that boils down to "any even number of octets not
including 0xfeff or 0xfffe", or am I missing something ?  Who
could be interested in that difference from "unknown-8bit" ?
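
A rough sketch of that reading, in Python. Two assumptions are mine: the name could_be_unknown_utf16 is hypothetical, and "not including 0xfeff or 0xfffe" is taken to mean "no BOM at the start" rather than "nowhere in the data".

```python
def could_be_unknown_utf16(data: bytes) -> bool:
    """Return True if data fits the loose "unknown-utf16" reading (hypothetical check)."""
    # UTF-16 code units are two octets each, so an odd length rules it out.
    if len(data) % 2 != 0:
        return False
    # With a leading BOM the byte order would be known, so the data would
    # be plain UTF-16 and not "unknown".
    return data[:2] not in (b"\xfe\xff", b"\xff\xfe")
```

Almost any even-length blob passes this check, which is the point of the question: the label says very little more than "unknown-8bit" does.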

---
>>> "UNKNOWN-ISO-8859" with alias "ANSI".
>>> "UNKNOWN-IBMPC" with alias "OEM".

>> One of those could do, "unknown-ascii-8bit", alias "oem".

> We already have UNKNOWN-8BIT.
> When you convert legacy data, you often DO know that 
> something is in a DOSish (IBMPC-based) or Windowsish
> (ANSI-based) charset. Having charset labels to carry
> this information (instead of the unspecified UNKNOWN-8BIT)
> is a good idea.

Yes, but why the difference, who's supposed to guess what's
what, and who's interested in the dubious outcome of such
guesses ?

If I screw up, what you get is a bogus "Latin-1", and you can
correctly guess that it must be bogus as soon as you find any
C1 octets.  But without human intervention you don't know how
I screwed up: it could be windows-1252, pc-multilingual-850+euro,
or worse (cp437, wild mixtures, who knows).
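
To make the C1 argument concrete, here is a small Python sketch. The helper names are invented for this example; the codec names cp1252, cp850 and cp437 are Python's, and the list of candidates is only a sample, not a complete set of possible sources.

```python
def has_c1_octets(data: bytes) -> bool:
    """True if any octet falls in 0x80-0x9F, the C1 control range of ISO-8859-x."""
    return any(0x80 <= b <= 0x9F for b in data)


def candidate_readings(data: bytes) -> dict:
    """Decode the same octets under a few plausible legacy charsets (sample list)."""
    candidates = ("cp1252", "cp850", "cp437")
    # errors="replace" keeps the sketch from failing on octets that a
    # given code page leaves undefined.
    return {name: data.decode(name, errors="replace") for name in candidates}


if __name__ == "__main__":
    sample = b"\x80 5"
    if has_c1_octets(sample):
        # The "Latin-1" label must be bogus, but the octets alone
        # do not say which of these readings was intended.
        for name, text in candidate_readings(sample).items():
            print(name, repr(text))
```

The octet 0x80 is a euro sign in windows-1252 and a C with cedilla in the two DOS code pages, so only a human (or other context) can pick the right one.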

An "unknown-ascii-8bit" label would mean: neither ISO-8859-x nor
UTF-8, but at least MIME compatible (one hopes).
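
One way a receiver might arrive at such a label, sketched in Python under loud assumptions: guess_label is a hypothetical function, "unknown-ascii-8bit" is only the label proposed in this thread and not a registered charset, and "iso-8859-x" is a placeholder for "some ISO-8859 part, not determined here".

```python
def guess_label(data: bytes) -> str:
    """Very rough charset guess (hypothetical heuristic, not a standard algorithm)."""
    if all(b < 0x80 for b in data):
        return "us-ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # No C1 octets: the data is at least consistent with ISO-8859-x,
    # although that does not prove it.
    if not any(0x80 <= b <= 0x9F for b in data):
        return "iso-8859-x"
    # 8-bit data that is neither valid UTF-8 nor clean ISO-8859-x.
    return "unknown-ascii-8bit"
```

The last branch is where windows-1252, the DOS code pages and the wild mixtures all end up together.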

The W3C validator could make use of that "unknown-ascii-8bit",
one error for that (if it's only a guess), but then continue
to report unrelated interesting errors.

Frank
-- 
Honk for 4234 to STD

