Do I encode ampersands in <a href...>?

Html Problem Overview


I'm writing code that automatically generates HTML, and I want it to encode things properly.

Say I'm generating a link to the following URL:

http://www.google.com/search?rls=en&q=stack+overflow

I'm assuming that all attribute values should be HTML-encoded. (Please correct me if I'm wrong.) So that means if I'm putting the above URL into an anchor tag, I should encode the ampersand as &amp;, like this:

<a href="http://www.google.com/search?rls=en&amp;q=stack+overflow">

Is that correct?

Html Solutions


Solution 1 - Html

Yes, it is. HTML entities are parsed inside HTML attributes, and a stray & would create an ambiguity. That's why you should always write &amp; instead of just & inside all HTML attributes.

That said, only & and the quote character that delimits the attribute value need to be encoded. If you have special characters like é in your attribute, you don't need to encode those to satisfy the HTML parser.
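As a minimal sketch of the above, assuming the generator is written in Python, the standard library's html.escape can do the attribute encoding. The variable names (url, href, anchor, accented) and the link text are made up for illustration.

```python
import html

# The raw URL from the question, with a bare ampersand.
url = "http://www.google.com/search?rls=en&q=stack+overflow"

# quote=True also escapes " and ', so the result is safe inside a quoted attribute.
href = html.escape(url, quote=True)
anchor = f'<a href="{href}">search</a>'
print(anchor)

# Non-ASCII characters such as é are left alone by html.escape.
accented = html.escape("café")
print(accented)

# Decoding the attribute value gives back the original URL.
print(html.unescape(href) == url)
```

The last line is the check that matters: whatever the HTML parser decodes from the attribute should be exactly the URL you meant to link to.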

It used to be the case that non-ASCII characters such as é needed special treatment in URLs: under RFC 1738 you had to percent-escape them, which for é gives %C3%A9. However, RFC 1738 has been superseded by RFC 3986 (URIs, Uniform Resource Identifiers) and RFC 3987 (IRIs, Internationalized Resource Identifiers), on which the WHATWG based its definition, as of HTML5, of how browsers should behave when they see a URL with non-ASCII characters in it. It's therefore now safe to include non-ASCII characters in URLs, percent-encoded or not.
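For reference, here is a small sketch of where %C3%A9 comes from, assuming Python's urllib.parse: the character is encoded as UTF-8 and each byte is written as a percent-escape. The sample path is invented for illustration.

```python
from urllib.parse import quote, unquote

# é is U+00E9; its UTF-8 encoding is the two bytes C3 A9.
encoded = quote("é")
print(encoded)

# unquote reverses the percent-escapes (UTF-8 by default).
decoded = unquote(encoded)
print(decoded)

# "/" is treated as safe by default, so path separators survive.
path = quote("/café/menü")
print(path)
```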

Solution 2 - Html

By current official HTML recommendations, the ampersand must be escaped, e.g. as &amp;, in contexts like this. However, browsers do not require it, and the HTML5 CR proposes to make this browser behavior the rule by applying special parsing rules in attribute values. Current HTML5 validators are outdated in this respect (see bug report with comments).

It will remain possible to escape ampersands in attribute values, but apart from validation with current tools, there is no practical need to escape them in href values (and there is a small risk of making mistakes if you start escaping them).

Solution 3 - Html

Two standards apply to URLs in links (<a href>).

The first standard is RFC 1866 (HTML 2.0), where section "3.2.1. Data Characters" lists the characters that need to be escaped when used in the value of an HTML attribute. (Attribute names themselves do not allow special characters at all, e.g. <a hr&ef="http://... is not allowed, nor is <a hr&amp;ef="http://....)

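This later went into the HTML 4 standard. The characters you need to escape are:

<   to   &lt;
>   to   &gt;
&   to   &amp;
"   to   &quot;
'   to   &apos;

(&apos; is defined by XML/XHTML and HTML5 but not by HTML 4 itself; the numeric reference &#39; works everywhere.)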
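If you generate HTML from Python, a quick way to see this mapping is to run each character through html.escape. This is only a sketch; note that Python emits the numeric reference &#x27; for the apostrophe rather than a named entity. The name escaped is illustrative.

```python
import html

# Map each special character to what html.escape produces for it.
escaped = {ch: html.escape(ch, quote=True) for ch in "<>&\"'"}

for ch, entity in escaped.items():
    print(ch, "to", entity)
```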

The other standard is RFC 3986, "Uniform Resource Identifier (URI): Generic Syntax", which governs how URLs themselves are handled (this comes into play when the browser is about to follow a link because the user clicked on the HTML element). It defines these reserved characters:

reserved    = gen-delims / sub-delims

gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

It is important to escape those characters so the client knows whether they represent data or a delimiter.

Example unescaped:

https://example.com/?user=test&password&te&st&goto=https://google.com

Example of a fully legitimate URL:

https://example.com/?user=test&password&te%26st&goto=https%3A%2F%2Fgoogle.com

Example of a fully legitimate URL in the value of an HTML attribute:

https://example.com/?user=test&amp;password&amp;te%26st&amp;goto=https%3A%2F%2Fgoogle.com
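The three examples above can be reproduced as two separate encoding layers. This is a sketch in Python, assuming the standard urllib.parse.quote and html.escape functions; the variable names are made up for illustration. Percent-encoding is applied to each piece of data first, and HTML-encoding is applied once to the finished URL.

```python
import html
from urllib.parse import quote

# Layer 1 (RFC 3986): percent-encode each piece of data so its reserved
# characters are not mistaken for delimiters. safe="" encodes "/" and ":" too.
key = quote("te&st", safe="")                 # a parameter name containing "&"
goto = quote("https://google.com", safe="")   # a URL used as a parameter value

raw_url = f"https://example.com/?user=test&password&{key}&goto={goto}"
print(raw_url)

# Layer 2 (HTML): escape the finished URL for use inside an attribute value.
attr_value = html.escape(raw_url, quote=True)
print(f'<a href="{attr_value}">link</a>')
```

The order matters: escaping for HTML before percent-encoding would turn the & in &amp; into %26 and corrupt the URL.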

Other important scenarios:

  • JavaScript code as a value:

    <a href="..." onclick="window.location.href = &quot;https://example.com/?user=test&amp;password&amp;te%26st&amp;goto=https%3A%2F%2Fgoogle.com&quot;;">...</a> (Yes, the ;; is correct: the first ; ends the &quot; entity, the second ends the JavaScript statement.)

  • JSON as a value:

    <a href="..." data-analytics="{&quot;event&quot;: &quot;click&quot;}">...</a>

  • Escaped things inside escaped things, double encoding, a URL inside a URL inside a parameter, etc.:

    http://x.com/?passwordUrl=http%3A%2F%2Fy.com%2F%3Fuser%3Dtest&amp;password=&quot;&quot;123
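The JSON and nested-URL scenarios follow the same rule: encode for the innermost context first, then for each enclosing one. Below is a hedged sketch in Python that reproduces the attribute values shown in the list, using json.dumps, urllib.parse.quote and html.escape. The variable names are illustrative, and the unencoded double quotes in the password value are copied from the example as-is.

```python
import html
import json
from urllib.parse import quote

# JSON as an attribute value: serialize first, then HTML-escape the result.
json_attr = html.escape(json.dumps({"event": "click"}), quote=True)
print(f'<a href="..." data-analytics="{json_attr}">...</a>')

# URL inside a URL parameter: percent-encode the inner URL completely,
# build the outer URL, then HTML-escape the whole thing once.
inner = quote("http://y.com/?user=test", safe="")
outer = f'http://x.com/?passwordUrl={inner}&password=""123'
nested_attr = html.escape(outer, quote=True)
print(nested_attr)
```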

I am posting a new answer because I find that zneak's answer does not have enough examples, does not show HTML and URI handling as separate aspects governed by separate standards, and is missing a few minor things.

Solution 4 - Html

Yes, you should convert & to &amp;.

This HTML validator tool by W3C is helpful for questions like this. It will tell you the errors and warnings for a particular page.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author     Original Content on Stackoverflow
Question            JW.                 View Question on Stackoverflow
Solution 1 - Html   zneak               View Answer on Stackoverflow
Solution 2 - Html   Jukka K. Korpela    View Answer on Stackoverflow
Solution 3 - Html   Daniel W.           View Answer on Stackoverflow
Solution 4 - Html   Randy Greencorn     View Answer on Stackoverflow