
Re: prefer-language tag



Mark Crispin wrote:

> In RFC 1766, we are labelling data.  That is, we are saying "this data is in
> the following language(s)".
> 
> What we want with a "preferred language" is to say "a human's language
> preference is the following language(s)".
> 
> These are two very different concepts.

Hmm, we disagree somewhat about whether RFC 1766 applies to this issue of
specifying language preferences, but I think we can agree that it does =not=
sufficiently cover the issue - we certainly need to specify additional or new
semantics for this problem.

> Chinese dialects are not mutually intelligible; Mandarin and Cantonese are
> more different than, say, Swedish and Norwegian or Spanish and Italian.

As you suggest, this is true for the =spoken= forms of these languages, but
not the written languages - Mandarin and Cantonese do indeed share the same
orthographic representation (modulo character set and font preference
issues).  This is one reason why I don't believe that a simple subtag-supertag
fallback will work - whether falling back is acceptable depends on the medium
of the data (text vs. audio, in this case).
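To make the medium dependence concrete, here is a minimal sketch of a subtag-to-supertag fallback that only truncates tags when the medium allows it. The function name `fallback_chain`, the medium strings "text" and "audio", the table `SHARED_WRITTEN_FORM`, and the tag "zh-yue" are all hypothetical illustrations, not part of RFC 1766 or of any proposal in this thread.

```python
from typing import List

# HYPOTHETICAL table: languages whose dialects share a written form but whose
# spoken forms are not mutually intelligible (Chinese, per the example above).
SHARED_WRITTEN_FORM = {"zh"}


def fallback_chain(tag: str, medium: str) -> List[str]:
    """Return the tags to try for one requested tag, most specific first.

    Assumption: "text" and "audio" are the only media considered.
    """
    subtags = tag.lower().split("-")  # RFC 1766 tags are case-insensitive
    # "zh-yue" -> ["zh-yue", "zh"]: drop one trailing subtag at a time.
    chain = ["-".join(subtags[:i]) for i in range(len(subtags), 0, -1)]
    if medium == "audio" and subtags[0] in SHARED_WRITTEN_FORM:
        # Spoken dialects differ too much: no supertag fallback for audio.
        return chain[:1]
    return chain


print(fallback_chain("ZH-YUE", "text"))   # ['zh-yue', 'zh']
print(fallback_chain("ZH-YUE", "audio"))  # ['zh-yue']
```

Even this sketch needs per-language knowledge in a table, which is the point: the generic prefix rule alone cannot decide when truncation is safe.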

> My concern has to do with interoperability.  How should a French Canadian user
> configure his client so that it works with an arbitrary server that speaks
> French?

If you mean "a FR-CA user" and "a server that speaks FR", then:

  Prefer-Language:  FR-CA, FR

should work, shouldn't it?
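A minimal sketch of how a server might honour such a header, assuming the value is a comma-separated list in order of preference and that the server returns the first alternative it actually offers. Only the header name comes from this thread; the function names and the first-exact-match rule are assumptions. Matching ignores case because RFC 1766 says tags are to be treated as case-insensitive.

```python
from typing import List, Optional


def parse_prefer_language(value: str) -> List[str]:
    """Split a header value such as 'FR-CA, FR' into normalized tags, in order."""
    return [t.strip().lower() for t in value.split(",") if t.strip()]


def choose(preferences: List[str], offered: List[str]) -> Optional[str]:
    """Return the first offered alternative that the user listed, else None.

    Assumption: exact (case-insensitive) matches only; no implied prefix rule,
    so the client must list "FR" itself if it will accept plain French.
    """
    offered_by_key = {t.lower(): t for t in offered}
    for tag in preferences:
        if tag.lower() in offered_by_key:
            return offered_by_key[tag.lower()]
    return None


prefs = parse_prefer_language("FR-CA, FR")
print(prefs)                           # ['fr-ca', 'fr']
print(choose(prefs, ["EN", "FR"]))     # FR
print(choose(prefs, ["FR", "FR-CA"]))  # FR-CA
print(choose(prefs, ["EN"]))           # None
```

With the client listing both tags explicitly, a server offering only one form of French and a server offering both each pick a sensible alternative without any fallback rule.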

> How should a server implementor offer French?  What does he do if he only
> offers one form of French?  What does he do if he offers both French Canadian
> and French French?  [I don't know if this is a major issue in French, but it
> definitely is for Spanish.]

Exactly - this differs from language to language.  That's why I don't think a
generic solution based on language tag prefixes will work.  The problem is
that a "simple" language tag like "FR" does indeed indicate a dialect of some
sort, despite the lack of a subtag.  For some languages, this "main" dialect
is at least comprehensible to native speakers of all other dialects.  I don't
think we should assume this is true for all languages and dialects, however.

My suggestion of reserving the "default" subtag, and explicitly labeling
alternatives with, say, "FR-default", is analogous to the ultimate fallback
being discussed in another thread, "i-default".  Namely, messages in
"FR-default" should be designed to be at least decodable by all speakers of
"FR-dd", for any dialect dd of French.  This allows server implementors to
write the "FR" alternative under the assumption that it will be read by native
speakers of whatever "main" dialect of the language the "FR" tag corresponds
to.  Then, if nothing else is available, clients preferring "FR-foo" get the
(decodable) "FR-default" version.
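The "FR-default" scheme could be sketched roughly as follows, assuming the server first tries every tag the user listed and only then tries "<language>-default" for each listed language, with "i-default" as a last resort. The function name and the exact ordering are assumptions for illustration; only the "-default" subtag and "i-default" come from the discussion.

```python
from typing import List, Optional


def choose_with_default(preferences: List[str], offered: List[str]) -> Optional[str]:
    """HYPOTHETICAL selection rule for the reserved "-default" subtag."""
    offered_by_key = {t.lower(): t for t in offered}

    # 1. Anything the user asked for explicitly, in preference order.
    for tag in preferences:
        if tag.lower() in offered_by_key:
            return offered_by_key[tag.lower()]

    # 2. The form designed to be decodable by speakers of every dialect
    #    of a requested language, e.g. "fr-be" -> "fr-default".
    for tag in preferences:
        candidate = tag.lower().split("-")[0] + "-default"
        if candidate in offered_by_key:
            return offered_by_key[candidate]

    # 3. Ultimate fallback (assumed here; proposed in another thread).
    return offered_by_key.get("i-default")


print(choose_with_default(["fr-be"], ["FR", "FR-default"]))        # FR-default
print(choose_with_default(["fr-be", "fr"], ["FR", "FR-default"]))  # FR
print(choose_with_default(["de"], ["FR", "i-default"]))            # i-default
```

The design choice to note is step 2: a user who asks only for "FR-BE" never silently receives plain "FR"; he gets "FR" only by listing it himself.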

Your approach seems to be to let "FR-foo" fall back to "FR".  My concern about
this is that information providers will not necessarily make the "FR"
alternative comprehensible in all dialects - they will assume that "FR" means
"French as she is spoken by inhabitants of France" (i.e., "FR-FR"), not
"French comprehensible to any dialect speaker".  "FR-default" would allow an
explicit distinction between the "main" dialect and a simplified form of the
language.

- John Burger
  MITRE