In a URL, should spaces be encoded using %20 or +?


Url Problem Overview


In a URL, should I encode spaces using %20 or +? For example, which of the following is correct?

www.mydomain.com?type=xbox%20360
www.mydomain.com?type=xbox+360

Our company is leaning toward the former, but calling the Java method URLEncoder.encode(String, String) with "xbox 360" (and "UTF-8") returns the latter.

So, what's the difference?

Url Solutions


Solution 1 - Url

Form data (for GET or POST) is usually encoded as application/x-www-form-urlencoded: this specifies + for spaces.

URLs are encoded according to RFC 1738, which specifies %20.

In theory I think you should have %20 before the ? and + after:

example.com/foo%20bar?foo+bar

Solution 2 - Url

According to the W3C (and they are the official source on these things), a space character in the query string (and in the query string only) may be encoded as either "%20" or "+". From the section "Query strings" under "Recommendations":

> Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.
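The consequence of that rule can be sketched with Java's standard URLEncoder and URLDecoder, which follow the application/x-www-form-urlencoded convention. The class and helper names below (PlusInQueryDemo, encodeValue, decodeValue) are made up for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: shows why a real plus sign has to be escaped in a query value.
final class PlusInQueryDemo {

    // Form-encodes a single query value (space becomes "+", "+" becomes "%2B").
    static String encodeValue(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Form-decodes a single query value ("+" and "%20" both become a space).
    static String decodeValue(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeValue("1+1"));   // the "+" is read back as a space
        System.out.println(decodeValue("1%2B1")); // the escaped plus survives
        System.out.println(encodeValue("1+1"));   // the encoder escapes the plus for you
    }
}
```

An unescaped "+" in a query value is therefore lost on decoding, which is why the quoted recommendation insists that real plus signs be encoded.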

According to section 3.4 of RFC 2396 (since obsoleted by RFC 3986), which is the official specification on URIs in general, the "query" component is URL-dependent:

> 3.4. Query Component
>
> The query component is a string of information to be interpreted by the resource.
>
> query = *uric
>
> Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.

It is therefore a bug in the other software if it does not accept URLs with spaces in the query string encoded as "+" characters.

As for the third part of your question, one way (though slightly ugly) to fix the output from URLEncoder.encode() is to then call replaceAll("\\+","%20") on the return value.
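A minimal sketch of that workaround follows. The class and method names (PercentTwentyDemo, encodeWithPercent20) are invented for this example. It uses String.replace rather than replaceAll, since no regular expression is needed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: form-encodes a value, then swaps "+" for "%20".
final class PercentTwentyDemo {

    static String encodeWithPercent20(String raw) {
        try {
            // Every "+" in URLEncoder output stands for a space, because a
            // literal "+" in the input is emitted as "%2B". The replacement
            // therefore cannot touch a real plus sign.
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeWithPercent20("xbox 360")); // xbox%20360
        System.out.println(encodeWithPercent20("a+b c"));    // a%2Bb%20c
    }
}
```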

Solution 3 - Url

This confusion exists because URLs are still 'broken' to this day.

> Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs actually have a very well-defined structure since the first specification in 1994.
>
> We can extract detailed information about the "http://www.google.com" URL:

+---------------+-------------------+   
|      Part     |      Data         |   
+---------------+-------------------+   
|  Scheme       | http              |   
|  Host address | www.google.com    |   
+---------------+-------------------+  

> If we look at a more complex URL such as "https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third" we can extract the following information:

+-------------------+---------------------+
|        Part       |       Data          |
+-------------------+---------------------+
|  Scheme           | https               |
|  User             | bob                 |
|  Password         | bobby               |
|  Host address     | www.lunatech.com    |
|  Port             | 8080                |
|  Path             | /file               |
|  Path parameters  | p=1                 |
|  Query parameters | q=2                 |
|  Fragment         | third               |
+-------------------+---------------------+

> The reserved characters are different for each part
>
> For HTTP URLs, a space in a path fragment part has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in the path fragment part can be left unencoded.
>
> Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".
>
> This means that the "blue+light blue" string has to be encoded differently in the path and query parts: "http://example.com/blue+light%20blue?blue%2Blight+blue". From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.

What this boils down to is: you should have %20 before the ? and + after.

Source
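The "blue+light blue" example from the quote can be reproduced in Java by encoding each part with a tool that knows which part it is handling. This is only a sketch under stated assumptions: PathVsQueryDemo, encodePath and encodeQueryValue are hypothetical names, the path is assumed to start with "/", and the multi-argument java.net.URI constructor is used here as one convenient way to quote a path, not as the only one.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class: the same string encoded for the path and for the query.
final class PathVsQueryDemo {

    // Quotes characters that are illegal in a path; "+" is legal there and is left alone.
    static String encodePath(String rawPath) {
        try {
            // Arguments are scheme, host, path, fragment; only the path is supplied.
            return new URI(null, null, rawPath, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Form-encodes one query value: space becomes "+", "+" becomes "%2B".
    static String encodeQueryValue(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String text = "blue+light blue";
        String url = "http://example.com"
                + encodePath("/" + text)
                + "?" + encodeQueryValue(text);
        System.out.println(url); // http://example.com/blue+light%20blue?blue%2Blight+blue
    }
}
```

Each component is encoded before the URL is assembled, which matches the quote's point that an already assembled URL cannot be encoded correctly without knowing its structure.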

Solution 4 - Url

It shouldn't matter, any more than if you encoded the letter A as %41.

However, if you're dealing with a system that doesn't recognize one form, it seems like you're just going to have to give it what it expects regardless of what the "spec" says.
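For a receiver that follows the form-encoding convention, the equivalence is easy to check. The sketch below uses Java's standard URLDecoder as a stand-in for such a receiver. EquivalentEncodingsDemo and decodeValue are hypothetical names, and a system that does not treat "+" as a space would behave differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: different encodings of the same query value decode identically.
final class EquivalentEncodingsDemo {

    static String decodeValue(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeValue("xbox+360"));   // xbox 360
        System.out.println(decodeValue("xbox%20360")); // xbox 360
        System.out.println(decodeValue("%41"));        // A
    }
}
```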

Solution 5 - Url

You can use either - which means most people opt for "+" as it's more human readable.

Solution 6 - Url

When encoding query values, either form, plus or percent-20, is valid; however, since the bandwidth of the internet isn't infinite, you should use plus, since it's two fewer bytes.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

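+------------------+--------------------+-----------------------------------+
| Content Type     | Original Author    | Original Content on Stackoverflow |
+------------------+--------------------+-----------------------------------+
| Question         | MegaByter          | View Question on Stackoverflow    |
| Solution 1 - Url | Greg               | View Answer on Stackoverflow      |
| Solution 2 - Url | Adam Batkin        | View Answer on Stackoverflow      |
| Solution 3 - Url | Matas Vaitkevicius | View Answer on Stackoverflow      |
| Solution 4 - Url | Gary McGill        | View Answer on Stackoverflow      |
| Solution 5 - Url | Fenton             | View Answer on Stackoverflow      |
| Solution 6 - Url | BenGoldberg        | View Answer on Stackoverflow      |
+------------------+--------------------+-----------------------------------+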