URLEncoder not able to translate space character

Java, Url, Urlencode

Java Problem Overview


I am expecting

System.out.println(java.net.URLEncoder.encode("Hello World", "UTF-8"));

to output:

Hello%20World

(20 is the ASCII hex code for a space)

However, what I get is:

Hello+World

Am I using the wrong method? What is the correct method I should be using?

Java Solutions


Solution 1 - Java

This behaves as expected. The URLEncoder implements the HTML Specifications for how to encode URLs in HTML forms.

From the javadocs:

> This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format.

and from the HTML Specification:

> application/x-www-form-urlencoded
>
> Forms submitted with this content type must be encoded as follows:
>
> 1. Control names and values are escaped. Space characters are replaced by `+'

You will have to replace it, e.g.:

System.out.println(java.net.URLEncoder.encode("Hello World", "UTF-8").replace("+", "%20"));
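The replacement is safe for input that already contains a plus sign, because URLEncoder turns a literal + into %2B before the replace runs. A minimal sketch, assuming Java 10 or later for the Charset overload of encode; the class and method names are made up for illustration:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PercentEncodeDemo {
    // Hypothetical helper: form-encode first, then rewrite the '+' that stands for a space.
    static String encodeWithPercent20(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeWithPercent20("Hello World")); // Hello%20World
        // The literal '+' was already encoded as %2B, so only the spaces are rewritten.
        System.out.println(encodeWithPercent20("1+1 = 2"));     // 1%2B1%20%3D%202
    }
}
```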

Solution 2 - Java

A space is encoded as %20 in URLs, and as + in submitted form data (content type application/x-www-form-urlencoded). You need the former.

Using Guava:

dependencies {
     // 'compile' was removed in Gradle 7; 'implementation' is its replacement
     implementation 'com.google.guava:guava:23.0'
     // or, for Android:
     implementation 'com.google.guava:guava:23.0-android'
}

You can use UrlEscapers:

String encodedString = UrlEscapers.urlFragmentEscaper().escape(inputString);

Don't use String.replace; it would only encode the space. Use a library instead.

Solution 3 - Java

This class performs application/x-www-form-urlencoded-type encoding rather than percent encoding; therefore, replacing a space with + is correct behaviour.

From javadoc:

> When encoding a String, the following rules apply:

  • The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
  • The special characters ".", "-", "*", and "_" remain the same.
  • The space character " " is converted into a plus sign "+".
  • All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
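The rules quoted above can be checked one by one. This is only a small sketch, assuming Java 10 or later for the Charset overload; the class name and the formEncode helper are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FormEncodingRules {
    // Hypothetical wrapper so each rule can be tried with a single call.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formEncode("abcXYZ019")); // alphanumerics stay: abcXYZ019
        System.out.println(formEncode(".-*_"));      // these four stay: .-*_
        System.out.println(formEncode("a b"));       // space becomes plus: a+b
        System.out.println(formEncode("a/b"));       // other ASCII is %xy: a%2Fb
        System.out.println(formEncode("\u00e9"));    // e-acute, two UTF-8 bytes: %C3%A9
    }
}
```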

Solution 4 - Java

Encode Query params

import org.apache.commons.httpclient.util.URIUtil;

String encoded = URIUtil.encodeQuery(input);

OR if you want to escape chars within URI

public static String escapeURIPathParam(String input) {
  StringBuilder resultStr = new StringBuilder();
  // Work on the UTF-8 bytes so that non-ASCII characters become valid %XX sequences
  for (byte b : input.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
   int ch = b & 0xFF;
   if (isUnsafe(ch)) {
    resultStr.append('%');
    resultStr.append(toHex(ch / 16));
    resultStr.append(toHex(ch % 16));
   } else {
    resultStr.append((char) ch);
   }
  }
  return resultStr.toString();
 }

 // Maps a value in 0..15 to its uppercase hex digit
 private static char toHex(int ch) {
  return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
 }

 private static boolean isUnsafe(int ch) {
  // Non-ASCII bytes and control characters are always escaped
  if (ch > 127 || ch < 32)
   return true;
  return " %$&+,/:;=?@<>#".indexOf(ch) >= 0;
 }

Solution 5 - Java

Hello+World is how a browser will encode form data (application/x-www-form-urlencoded) for a GET request and this is the generally accepted form for the query part of a URI.

http://host/path/?message=Hello+World

If you sent this request to a Java servlet, the servlet would correctly decode the parameter value. Usually the only time there are issues here is if the encoding doesn't match.
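As a rough illustration of the decoding side, java.net.URLDecoder is used here as a stand-in for what a servlet container does with a query parameter (that equivalence is an assumption, and the class and method names are made up). Both spellings of the space decode to the same value, and a charset mismatch is what produces garbage. Assumes Java 10 or later:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QueryDecodeDemo {
    // Hypothetical helper: decode one raw query parameter value with the given charset.
    static String decodeParam(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) {
        // '+' and '%20' both decode to a space.
        System.out.println(decodeParam("Hello+World", StandardCharsets.UTF_8));   // Hello World
        System.out.println(decodeParam("Hello%20World", StandardCharsets.UTF_8)); // Hello World
        // Encoded as UTF-8 but decoded as ISO-8859-1: two wrong characters instead of one e-acute.
        System.out.println(decodeParam("%C3%A9", StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```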

Strictly speaking, there is no requirement in the HTTP or URI specs that the query part be encoded using application/x-www-form-urlencoded key-value pairs; the query part just needs to be in the form the web server accepts. In practice, this is unlikely to be an issue.

It would generally be incorrect to use this encoding for other parts of the URI (the path for example). In that case, you should use the encoding scheme as described in RFC 3986.

http://host/Hello%20World


Solution 6 - Java

I have just been struggling with this on Android too, and stumbled upon Uri.encode(String, String). It is specific to Android (android.net.Uri), but it might be useful to some.

static String encode(String s, String allow)

https://developer.android.com/reference/android/net/Uri.html#encode(java.lang.String, java.lang.String)

Solution 7 - Java

The other answers present either a manual string replacement, URLEncoder (which actually encodes for the HTML form format), Apache's abandoned URIUtil, or Guava's UrlEscapers. The last one is fine, except it doesn't provide a decoder.

Apache Commons Codec (not Commons Lang) provides URLCodec, which both encodes and decodes. Note, however, that it implements the www-form-urlencoded scheme, so it also encodes a space as + rather than %20.

String encoded = new URLCodec().encode(str);
String decoded = new URLCodec().decode(encoded);

If you are already using Spring, you can also opt to use its UriUtils class.

Solution 8 - Java

If you are using Jetty, then org.eclipse.jetty.util.URIUtil will solve the issue.

String encoded_string = URIUtil.encodePath(not_encoded_string).toString();

Solution 9 - Java

If you want to encode URI path components, you can also use standard JDK functions, e.g.

public static String encodeURLPathComponent(String path) {
	try {
		return new URI(null, null, path, null).toASCIIString();
	} catch (URISyntaxException e) {
		// do some error handling
	}
	return "";
}

The URI class can also be used to encode different parts of or whole URIs.
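For a whole URI, the multi-argument java.net.URI constructors quote illegal characters in each component separately. A minimal sketch; the class name, the build helper, and the host name are placeholders, not part of the original answer:

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriPartsDemo {
    // Hypothetical helper: assemble a URI from an unencoded path and query.
    static String build(String path, String query) {
        try {
            // Arguments: scheme, authority, path, query, fragment
            return new URI("http", "host", path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Spaces are illegal in both components, so both become %20.
        System.out.println(build("/Hello World", "message=Hello World"));
        // prints http://host/Hello%20World?message=Hello%20World
    }
}
```

Keep in mind that this only quotes characters that are illegal in the given component. A character such as + or & is legal in a query, so it is passed through unchanged rather than escaped.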

Solution 10 - Java

"+" is correct. If you really need %20, then replace the pluses yourself afterwards.

Warning: This answer is heavily disputed (+8 vs. -6), so take this with a grain of salt.

Solution 11 - Java

This worked for me

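// encode(...) returns a String, not a URLEncoder; recent Tomcat versions take the charset as a second argument
String encoded = new org.apache.catalina.util.URLEncoder().encode("MY URL", java.nio.charset.StandardCharsets.UTF_8);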

Solution 12 - Java

Although quite old, nevertheless a quick response:

Spring provides UriUtils, which lets you specify which part of a URI you are encoding, e.g.

encodePathSegment
encodePort
encodeFragment
encodeUriVariables
....

I use them because we are already using Spring, i.e. no additional library is required!

Solution 13 - Java

It's not a one-liner, but you can use:

URL url = new URL("https://some-host.net/dav/files/selling_Rosetta Stone Case Study.png.aes");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
System.out.println(uri.toString());

This will give you an output:

https://some-host.net/dav/files/selling_Rosetta%20Stone%20Case%20Study.png.aes

Solution 14 - Java

I was already using Feign, so its UriUtils was available to me, but Spring's UriUtils was not.

<!-- https://mvnrepository.com/artifact/io.github.openfeign/feign-core -->
<dependency>
    <groupId>io.github.openfeign</groupId>
    <artifactId>feign-core</artifactId>
    <version>11.8</version>
</dependency>

My Feign test code:

import feign.template.UriUtils;

System.out.println(UriUtils.encode("Hello World"));

Outputs:

Hello%20World

As the class name suggests, it encodes URIs and not URLs, but the OP asked about URIs and not URLs.

System.out.println(UriUtils.encode("https://some-host.net/dav/files/selling_Rosetta Stone Case Study.png.aes"));

Outputs:

https%3A%2F%2Fsome-host.net%2Fdav%2Ffiles%2Fselling_Rosetta%20Stone%20Case%20Study.png.aes

Solution 15 - Java

Try below approach:

Add a new dependency

<!-- https://mvnrepository.com/artifact/org.apache.tomcat/tomcat-catalina -->
<dependency>
	<groupId>org.apache.tomcat</groupId>
	<artifactId>tomcat-catalina</artifactId>
	<version>10.0.13</version>
</dependency>

Now do as follows:

String str = "Hello+World"; // For "Hello World", decoder is not required
// import java.net.URLDecoder;
String newURL = URLDecoder.decode(str, StandardCharsets.UTF_8);
// import org.apache.catalina.util.URLEncoder;
System.out.println(URLEncoder.DEFAULT.encode(newURL, StandardCharsets.UTF_8));

You'll get the output as:

Hello%20World

Solution 16 - Java

Check out the java.net.URI class.

Solution 17 - Java

Use MyUrlEncode.URLencoding(String url, String enc) to handle the problem:

import java.io.UnsupportedEncodingException;
import java.util.BitSet;

public class MyUrlEncode {
	static BitSet dontNeedEncoding = null;
	static final int caseDiff = ('a' - 'A');
	static {
		dontNeedEncoding = new BitSet(256);
		int i;
		for (i = 'a'; i <= 'z'; i++) {
		    dontNeedEncoding.set(i);
		}
		for (i = 'A'; i <= 'Z'; i++) {
		    dontNeedEncoding.set(i);
		}
		for (i = '0'; i <= '9'; i++) {
		    dontNeedEncoding.set(i);
		}
		dontNeedEncoding.set('-');
		dontNeedEncoding.set('_');
		dontNeedEncoding.set('.');
		dontNeedEncoding.set('*');
		dontNeedEncoding.set('&');
		dontNeedEncoding.set('=');
	}
	public static String char2Unicode(char c) {
		if(dontNeedEncoding.get(c)) {
			return String.valueOf(c);
		}
		StringBuffer resultBuffer = new StringBuffer();
		resultBuffer.append("%");
		char ch = Character.forDigit((c >> 4) & 0xF, 16);
		    if (Character.isLetter(ch)) {
			ch -= caseDiff;
	    }
	    resultBuffer.append(ch);
		    ch = Character.forDigit(c & 0xF, 16);
		    if (Character.isLetter(ch)) {
			ch -= caseDiff;
	    }
		 resultBuffer.append(ch);
		return resultBuffer.toString();
	}
	private static String URLEncoding(String url,String enc) throws UnsupportedEncodingException {
		StringBuffer stringBuffer = new StringBuffer();
		if(!dontNeedEncoding.get('/')) {
			dontNeedEncoding.set('/');
		}
		if(!dontNeedEncoding.get(':')) {
			dontNeedEncoding.set(':');
		}
		byte [] buff = url.getBytes(enc);
		for (int i = 0; i < buff.length; i++) {
			stringBuffer.append(char2Unicode((char)buff[i]));
		}
		return stringBuffer.toString();
	}
	private static String URIEncoding(String uri , String enc) throws UnsupportedEncodingException { // encode the request parameters
		StringBuffer stringBuffer = new StringBuffer();
		if(dontNeedEncoding.get('/')) {
			dontNeedEncoding.clear('/');
		}
		if(dontNeedEncoding.get(':')) {
			dontNeedEncoding.clear(':');
		}
		byte [] buff = uri.getBytes(enc);
		for (int i = 0; i < buff.length; i++) {
			stringBuffer.append(char2Unicode((char)buff[i]));
		}
		return stringBuffer.toString();
	}
	
	public static String URLencoding(String url , String enc) throws UnsupportedEncodingException {
		int index = url.indexOf('?');
		StringBuffer result = new StringBuffer();
		if(index == -1) {
			result.append(URLEncoding(url, enc));
		}else {
			result.append(URLEncoding(url.substring(0 , index),enc));
			result.append("?");
			result.append(URIEncoding(url.substring(index+1),enc));
		}
		return result.toString();
	}

}

Solution 18 - Java

> Am I using the wrong method? What is the correct method I should be using?

Yes. According to its specification, java.net.URLEncoder.encode wasn't made for converting " " to "%20":

> The space character " " is converted into a plus sign "+".

Even though this is not the correct method, you can adapt its output: System.out.println(java.net.URLEncoder.encode("Hello World", "UTF-8").replaceAll("\\+", "%20")); Have a nice day =).

Solution 19 - Java

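Use the character set "ISO-8859-1" for URLEncoder. Note that the charset only changes how non-ASCII characters are turned into bytes; a space is still encoded as +.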
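A quick way to see what the charset argument of URLEncoder actually changes is to encode the same string with two charsets. This is only a sketch, assuming Java 10 or later; the class and method names are made up:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetEffectDemo {
    // Hypothetical helper: form-encode with an explicit charset.
    static String encodeWith(String s, Charset charset) {
        return URLEncoder.encode(s, charset);
    }

    public static void main(String[] args) {
        String input = "Hello W\u00f6rld"; // contains o-umlaut (U+00F6)
        // The space is '+' in both cases; only the bytes for the umlaut differ.
        System.out.println(encodeWith(input, StandardCharsets.ISO_8859_1)); // Hello+W%F6rld
        System.out.println(encodeWith(input, StandardCharsets.UTF_8));      // Hello+W%C3%B6rld
    }
}
```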

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Cheok Yan Cheng | View Question on Stackoverflow
Solution 1 - Java | dogbane | View Answer on Stackoverflow
Solution 2 - Java | pyb | View Answer on Stackoverflow
Solution 3 - Java | axtavt | View Answer on Stackoverflow
Solution 4 - Java | fmucar | View Answer on Stackoverflow
Solution 5 - Java | McDowell | View Answer on Stackoverflow
Solution 6 - Java | Chrispix | View Answer on Stackoverflow
Solution 7 - Java | Benny Bottema | View Answer on Stackoverflow
Solution 8 - Java | gourab ghosh | View Answer on Stackoverflow
Solution 9 - Java | MrTux | View Answer on Stackoverflow
Solution 10 - Java | Daniel | View Answer on Stackoverflow
Solution 11 - Java | Hitesh Kumar | View Answer on Stackoverflow
Solution 12 - Java | LeO | View Answer on Stackoverflow
Solution 13 - Java | tchudyk | View Answer on Stackoverflow
Solution 14 - Java | rjdkolb | View Answer on Stackoverflow
Solution 15 - Java | Chandan Kumar | View Answer on Stackoverflow
Solution 16 - Java | Fredrik Widerberg | View Answer on Stackoverflow
Solution 17 - Java | IloveIniesta | View Answer on Stackoverflow
Solution 18 - Java | Pregunton | View Answer on Stackoverflow
Solution 19 - Java | Akhil Sikri | View Answer on Stackoverflow