Re: Encoding Standard (mostly complete)
Hi Anne!
First, another proposal: In the name/label table, state that "unicode"
is another label for "utf-16" and that "unicodeFFFE" is another label
for "utf-16be".
Paradoxically, at least for HTML, these labels are perhaps more relevant
as labels for UTF-8 than for UTF-16: if a parser can read a <meta>
declaration that names UTF-16, the document is in fact in an
ASCII-compatible encoding. However, the same is true of the other labels
for the UTF-16 encodings.
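A rough Python sketch of the proposal, for illustration only. The table is a hypothetical excerpt, not the draft's actual table. It assumes labels are matched after stripping ASCII whitespace and ASCII-lowercasing, and that "unicode" and "unicodeFFFE" are added as proposed. The encoding_from_meta() helper is also hypothetical. It mimics the HTML parser's rule that a UTF-16 encoding found in a <meta> declaration is replaced by UTF-8, which is why these labels end up meaning UTF-8 in practice.

```python
# Hypothetical excerpt of a label -> encoding-name table (not the full table).
# "unicode" and "unicodefffe" are the labels proposed in this message.
LABELS = {
    "utf-8": "utf-8",
    "utf-16": "utf-16",
    "utf-16be": "utf-16be",
    "unicode": "utf-16",        # proposed
    "unicodefffe": "utf-16be",  # proposed
}

ASCII_WHITESPACE = "\t\n\f\r "


def get_encoding(label):
    """Return the encoding name for label, or None if the label is unknown.

    Assumption: labels match after stripping ASCII whitespace and
    lowercasing ASCII letters only.
    """
    key = label.strip(ASCII_WHITESPACE)
    # ASCII-only lowercasing; non-ASCII characters are left untouched.
    key = "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in key)
    return LABELS.get(key)


def encoding_from_meta(label):
    """Hypothetical helper: the encoding to use for a label found in <meta>.

    If the <meta> could be read at all, the bytes are ASCII-compatible,
    so any UTF-16 encoding is treated as UTF-8.
    """
    enc = get_encoding(label)
    if enc in ("utf-16", "utf-16be"):
        return "utf-8"
    return enc
```

For example, get_encoding("unicodeFFFE") gives "utf-16be", while encoding_from_meta("unicode") gives "utf-8".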
Anne van Kesteren, Thu, 26 Apr 2012 13:44:05 +0200:
> On Mon, 23 Apr 2012 13:30:04 +0200, Leif Halvard Silli:
>> (1) Just an idea - take it or leave it: Section 7 is called "The
>> encoding", and it only describes a single encoding, UTF-8. In
>> order to emphasize "UTF-8" as _the_ encoding, how about collapsing
>> sections 8 to 13 into a single section named "Legacy encodings", with 6
>> sub-sections?
>
> How would you title the subsections? I could not think of something good.
This is roughly what I had in mind, starting with section 7:
7 The <ins>standard</ins> encoding
7.1 utf-8
8 The legacy encodings
8.1 The single-byte legacy encodings
8.2 The multi-byte legacy encodings
8.2.1 The Chinese (simplified) legacy encodings
8.2.1.1 gbk
8.2.1.2 gb18030
8.2.1.3 hz-gb-2312
8.2.2 The Chinese (traditional) legacy encodings
8.2.2.1 big5
8.2.3 The Japanese legacy encodings
8.2.3.1 euc-jp
8.2.3.2 iso-2022-jp
8.2.3.3 shift_jis
8.2.4 The Korean legacy encodings
8.2.4.1 euc-kr
8.2.4.2 iso-2022-kr
8.2.5 The utf-16 legacy encodings
8.2.5.1 utf-16
8.2.5.2 utf-16be
>> (2) I would suggest that, the first time you mention "byte order
>> mark", you also introduce the abbreviation "BOM". Currently, "BOM" occurs
>> in section 13, while "byte order mark" occurs in section 6.
>
> Done.
Would it be a good idea to also write "(BOM)", in parentheses, in the
note under '13 Legacy utf-16 encodings', in case someone searches
for "BOM"?
>> (3) Regarding the note "the byte order mark is considered more
>> authoritative than anything else", I would suggest specifying what
>> "anything else" means. I suppose that it includes, or at least ought
>> to include:
>>
>> HTTP,
>> <meta charset>,
>> <meta http-equiv=Content-Type>,
>> <?xml version="1.0" encoding="<anyvalue>" ?>
>> Manual encoding override by the user
>> The above is valid for both XML and HTML.
>>
>> Unless this is listed or described, I think readers will start to guess
>> what "anything else" means.
>
> I think this should become clearer once this is integrated into other
> specifications. I would rather not mention too many format-specific
> things here.
OK. But maybe it would be possible to say something general that
points the reader in the right direction? For instance:
"... than anything else, such as encoding declarations in higher-level
protocols, format-specific encoding declarations, and manual overriding
by the user."
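A minimal Python sketch of "the BOM is more authoritative than anything else". It assumes the draft's naming, where "utf-16" is little-endian and "utf-16be" is big-endian. choose_encoding(), other_sources and the utf-8 fallback are hypothetical. other_sources stands for already-resolved encodings from HTTP, <meta>, the XML declaration or a user override, in whatever order the host format defines. The sketch only shows that a BOM wins over all of them.

```python
# BOM byte sequences. Names assume the draft's naming:
# "utf-16" = little-endian, "utf-16be" = big-endian.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16"),
]


def sniff_bom(data):
    """Return the encoding indicated by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def choose_encoding(data, other_sources, fallback="utf-8"):
    """Hypothetical helper: pick the encoding for the bytes in data.

    other_sources: already-resolved encoding names (or None) from HTTP,
    <meta>, the XML declaration, a user override, etc., in the priority
    order the format defines. fallback is a hypothetical default.
    """
    # The BOM is checked first and overrides every other source.
    bom_encoding = sniff_bom(data)
    if bom_encoding is not None:
        return bom_encoding
    # Otherwise the first source that declared something wins.
    for candidate in other_sources:
        if candidate is not None:
            return candidate
    return fallback
```

Keeping the BOM check outside the loop makes the precedence explicit, regardless of how each format orders its other sources.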
--
Leif H Silli