What character encoding should I use for a HTTP header?

Http Headers

Http Headers Problem Overview


I'm using a "fun" HTML special-character (✰)(see http://html5boilerplate.com/ for more info) for a Server HTTP-header and am wondering if it is "allowed" per spec.

  • Using the Network Tab in the dev tools in Chrome on Windows Xp Pro SP 3 I see the ✰ just fine.

  • In IE8 the ✰ is not rendered correctly.

  • The w3.org HTML validator does not render it correctly (displays "â°" instead).

Now, I'm not too keen on character encodings ... and frankly I don't really care too much about them; I just blindly use UTF-8 cus I'm told to. :-)


Is the disparity caused by bugs in the different parsers/browses/engines/(whatever-they-are-called)?

Is there a spec for this or maybe a list of allowed characters for an HTTP-header "value"?

Http Headers Solutions


Solution 1 - Http Headers

In short: Only ASCII is guaranteed to work. Some non-ASCII bytes are allowed for backwards compatibility, but are not supposed to be displayable.

HTTPbis gave up and specified that in the headers there is no useful encoding besides ASCII:

> Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.


Previously, RFC 2616 from 1999 defined this:

> Words of *TEXT MAY contain characters from character sets other than ISO- 8859-1 [22] only when encoded according to the rules of RFC 2047 [14].

and RFC 2047 is the MIME encoding, so it'd be:

=?UTF-8?Q?=E2=9C=B0?=

but I don't think that many (if any) clients support it.

Solution 2 - Http Headers

Please read comments first, this answer likely draws wrong conclusions from the right sources, needs edit.


You can use any printable ASCII chars, and no special chars like ✰ (Which is not ASCII)

Tip: you can encode anything in JSON.

Edit: may not be obvious at first, the character encoding defined in the header only applies for the response body, not for the header itself. (As it would cause a chicken-&-egg problem.)


I'd like to sum up all the relevant definitions as per the spec linked by Penchant.

message-header = field-name ":" [ field-value ]
field-name     = token
field-value    = *( field-content | LWS )

So, we are after field-value.

LWS            = [CRLF] 1*( SP | HT )
CRLF           = CR LF
CR             = <US-ASCII CR, carriage return (13)>
LF             = <US-ASCII LF, linefeed (10)>
SP             = <US-ASCII SP, space (32)>
HT             = <US-ASCII HT, horizontal-tab (9)>

LWS stands for Linear White Space. Essentially, LWS is Space or Tab, but you can break your field-value into multiple lines by starting a new line before a Space or Tab.

Let's simplify it to this:

field-value    = <any field-content or Space or Tab>

Now we are after field-content.

field-content  = <the OCTETs making up the field-value
                 and consisting of either *TEXT or combinations
                 of token, separators, and quoted-string>
OCTET          = <any 8-bit sequence of data>
TEXT           = <any OCTET except CTLs,
                 but including LWS>
CTL            = <any US-ASCII control character
                 (octets 0 - 31) and DEL (127)>
token          = 1*<any CHAR except CTLs or separators>
separators     = "(" | ")" | "<" | ">" | "@"
                 | "," | ";" | ":" | "\" | <">
                 | "/" | "[" | "]" | "?" | "="
                 | "{" | "}" | SP | HT

TEXT is the most general and includes all the rest -so forget about the rest-. Here is the US-ASCII charset (= ASCII)

As you can see, all printable ASCII chars are allowed.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDavid MurdochView Question on Stackoverflow
Solution 1 - Http HeadersKornelView Answer on Stackoverflow
Solution 2 - Http HeaderszupaView Answer on Stackoverflow