
Re: Encoding Standard (mostly complete)



Hi Anne!

First, another proposal: In the name/label table, state that "unicode" 
is another label for "utf-16" and that "unicodeFFFE" is another label 
for "utf-16be". 

Paradoxically, at least for HTML, this is perhaps more relevant as 
a label for UTF-8 encodings than as a label for UTF-16 encodings; 
however, this is also the case for the other names of the UTF-16 
encodings.
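
To make the proposal concrete, here is a minimal sketch of a label 
lookup. Only the two new labels come from the proposal above; the 
table name, the function name, the other entries, and the 
case-insensitive, whitespace-trimming match are assumptions made for 
illustration, not text from the draft.

```python
# Hypothetical label table; the draft's real table is much longer.
LABELS = {
    "utf-8": "utf-8",
    "utf-16": "utf-16",
    "utf-16be": "utf-16be",
    "unicode": "utf-16",        # proposed additional label
    "unicodefffe": "utf-16be",  # proposed additional label
}


def encoding_for_label(label):
    """Return the encoding name for a label, or None if it is unknown.

    Assumption: labels are compared after trimming surrounding
    whitespace and lowercasing.
    """
    return LABELS.get(label.strip().lower())
```

With this table, encoding_for_label("unicodeFFFE") resolves to 
"utf-16be" and an unknown label resolves to None.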

Anne van Kesteren, Thu, 26 Apr 2012 13:44:05 +0200:
> On Mon, 23 Apr 2012 13:30:04 +0200, Leif Halvard Silli:

>> (1) Just an idea - take it or leave it: Section 7 is called "The
>> encoding" and that it only describes a single encoding - UTF-8. In
>> order to emphasize "UTF-8" as _the_ encoding, how about collapsing
>> section 8 to 13 into a single section named "Legacy encodings", with 6
>> sub-sections?
> 
> How would you title the subsections? I could not think of something good.

Roughly this was what I had in mind, starting with section 7:

  7 The <ins>standard</ins> encoding
  7.1 utf-8
  8 The legacy encodings
  8.1 The Single-byte legacy encodings
  8.2 The Multi-byte legacy encodings
  8.2.1 The Chinese (simplified) legacy encodings
  8.2.1.1 gbk
  8.2.1.2 gb18030
  8.2.1.3 hz-gb-2312
  8.2.2 The Chinese (traditional) legacy encodings
  8.2.2.1 big5
  8.2.3 The Japanese legacy encodings
  8.2.3.1 euc-jp
  8.2.3.2 iso-2022-jp
  8.2.3.3 shift_jis
  8.2.4 The Korean legacy encodings
  8.2.4.1 euc-kr
  8.2.4.2 iso-2022-kr
  8.2.5 The utf-16 legacy encodings
  8.2.5.1 utf-16
  8.2.5.2 utf-16be   
  
>> (2) I would suggest that you, the first time you talk about "byte order
>> mark", also introduce the abbreviation - "BOM". Currently, BOM occurs
>> in section 13 while "byte order mark" occurs in section 6.
> 
> Done.

Would it not be an idea to say "(BOM)" (in parentheses) also in the 
note under '13 Legacy utf-16 encodings', just in case someone searches 
for "BOM"?

>> (3) Regarding the note "the byte order mark is considered more
>> authoritative than anything else", then I would suggest specifying what
>> "anything else" means. I suppose that it includes - or at least ought
>> to include
>> 
>> 	HTTP,
>> 	<meta charset>,
>> 	<meta http-equiv=Content-Type>,
>> 	<?xml version="1.0" encoding="<anyvalue>" ?>
>> 	Manual encoding overriding by the user
>>     The above is valid for both XML and HTML.
>> 
>>    Unless this is listed/described, then I think one starts to guess
>> what "anything else" means.
> 
> I think this should become clearer once this is integrated into other 
> specifications. I rather not mention too much format specific things 
> here.

OK. But maybe it would be possible to say something quite general 
that points the reader in the right direction? For instance:

 "... than anything else, such as encoding declarations in higher 
protocols, format specific encoding declarations and manual user 
overriding."
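
As a rough illustration of what that wording implies, the sketch 
below lets a byte order mark win over every other source. The 
function names, the parameter names, the relative order of the 
non-BOM sources, and the "utf-8" fallback are all assumptions made 
for the example; the draft only says that the BOM is more 
authoritative than anything else.

```python
# BOM byte sequences and the encoding each one signals.
# Assumption: FF FE (little-endian) maps to the name "utf-16",
# following the utf-16 / utf-16be pair in the outline above.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16"),
)


def sniff_bom(data):
    """Return the encoding signalled by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def pick_encoding(data, user_override=None, protocol=None,
                  in_document=None, default="utf-8"):
    """Pick an encoding, letting the BOM outrank every other source.

    The order of the remaining sources is hypothetical.
    """
    return (sniff_bom(data) or user_override or protocol
            or in_document or default)
```

For example, a resource that starts with EF BB BF is treated as 
utf-8 here even if the protocol layer declares windows-1252.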
-- 
Leif H Silli