How to find out if a string has already been URL encoded?

Java, UTF-8, URL Encoding

Java Problem Overview


How can I check whether a string has already been encoded?

For example, if I encode TEST==, I get TEST%3D%3D. If I encode that result again, I get TEST%253D%253D, so I need to know beforehand whether the string is already encoded.

I have encoded parameters stored, and I need to search for them. I don't know whether the input parameters will arrive encoded or not, so I have to determine whether to encode or decode them before searching.
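
The double-encoding effect described in the question can be reproduced with a small Java sketch. It assumes Java 10 or later (for the `URLEncoder.encode(String, Charset)` overload); the class name `DoubleEncodingDemo` is made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {
    // Encodes a value as application/x-www-form-urlencoded using UTF-8.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String once = encode("TEST==");  // "=" becomes %3D
        String twice = encode(once);     // "%" becomes %25, so %3D turns into %253D
        System.out.println(once);
        System.out.println(twice);
    }
}
```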

Java Solutions


Solution 1 - Java

Decode the string and compare the result to the original. If they differ, the original was encoded; if they are identical, it was not. This says nothing about whether the decoded result is itself still encoded (i.e. the original was encoded more than once), which makes it a good task for recursion.

I hope one can't write a quine in urlencode, or this algorithm would get stuck.

Exception: when a string contains a "+" character, the URL decoder replaces it with a space even if the string is not URL-encoded, so the comparison reports a false positive.
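
A minimal Java sketch of the decode-and-compare idea, written as a loop rather than recursion. It assumes Java 10+ and UTF-8; the class name, method names and the `MAX_ROUNDS` guard are hypothetical. Note that it inherits the "+" caveat: a raw "a+b" is reported as encoded.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class RepeatedDecoder {
    // Arbitrary upper bound so that a pathological input cannot loop forever.
    private static final int MAX_ROUNDS = 10;

    // True if decoding changes the string, i.e. it contains %XX escapes or "+".
    static boolean looksEncoded(String s) {
        try {
            return !URLDecoder.decode(s, StandardCharsets.UTF_8).equals(s);
        } catch (IllegalArgumentException e) {
            // Malformed escape such as "%zz" or a trailing "%": treat as raw text.
            return false;
        }
    }

    // Decodes repeatedly until the string stops changing (or the guard is hit).
    static String fullyDecode(String s) {
        String current = s;
        for (int i = 0; i < MAX_ROUNDS && looksEncoded(current); i++) {
            current = URLDecoder.decode(current, StandardCharsets.UTF_8);
        }
        return current;
    }
}
```

With the question's example, `fullyDecode("TEST%253D%253D")` peels off both layers and yields `TEST==`.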

Solution 2 - Java

Use a regular expression to check whether your string contains illegal characters (i.e. characters that cannot appear in a URL-encoded string, such as whitespace).
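
One way to sketch this in Java is a whitelist pattern: the string may consist only of the characters that `java.net.URLEncoder` leaves untouched, "+" (an encoded space), and `%XX` escapes. The class and method names are hypothetical, and a match only means the string could be encoded; a plain word such as "hello" matches as well.

```java
import java.util.regex.Pattern;

class EncodedPattern {
    // Letters, digits, ".", "_", "*", "+", "-" or a percent sign followed by two hex digits.
    private static final Pattern FORM_ENCODED =
            Pattern.compile("(?:[A-Za-z0-9._*+-]|%[0-9A-Fa-f]{2})*");

    // False means the string contains a character that cannot appear in
    // URLEncoder output, so it is definitely not (fully) encoded.
    static boolean couldBeEncoded(String s) {
        return FORM_ENCODED.matcher(s).matches();
    }
}
```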

Solution 3 - Java

Try decoding the URL. If the resulting string is shorter than the original, the original was already encoded. Otherwise you can safely encode it: either it is not encoded, or encoding leaves it unchanged, so encoding again will not produce a wrong URL. Below is sample pseudocode (inspired by Ruby):

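    # Returns the encoded URL for any given URL after determining whether it is already encoded.
    # Note: URI.escape and URI.unescape were removed in Ruby 3.0, so treat this as pseudocode.
    def escape(url)
      unescaped_url = URI.unescape(url)
      if unescaped_url.length < url.length
        url               # decoding shortened it, so it was already encoded
      else
        URI.escape(url)   # not encoded yet, or encoding leaves it unchanged
      end
    end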
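
Since the question is about Java, the same length comparison might look like the sketch below. It assumes Java 10+ and `URLEncoder`/`URLDecoder` (form encoding) rather than Ruby's URI escaping; the class name `EncodeOnce` is made up. One consequence of the length test: a "+" decodes to a space of the same length, so "a+b" is treated as raw and its plus sign gets escaped.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class EncodeOnce {
    // Returns the value unchanged if decoding shortens it, otherwise encodes it.
    static String escape(String value) {
        String decoded;
        try {
            decoded = URLDecoder.decode(value, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            decoded = value; // malformed escape: treat the input as raw text
        }
        if (decoded.length() < value.length()) {
            return value; // at least one %XX escape was present
        }
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```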

Solution 4 - Java

You can't know for sure unless your strings conform to a certain pattern or you keep track of which strings you have encoded. As you noted yourself, a string that is already encoded can be encoded again, so you can't be 100% sure just by looking at the string itself.
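
"Keeping track" could mean recording the encoding state at the point where a value enters the system, instead of guessing later. The sketch below is one possible shape, assuming Java 10+; the `TrackedParam` class and its factory methods are hypothetical. The caller states whether the input is raw or encoded, and the value is always stored decoded.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class TrackedParam {
    private final String raw; // always held in decoded form

    private TrackedParam(String raw) {
        this.raw = raw;
    }

    // Use when the caller knows the value has not been encoded.
    static TrackedParam fromRaw(String raw) {
        return new TrackedParam(raw);
    }

    // Use when the caller knows the value has been encoded exactly once.
    static TrackedParam fromEncoded(String encoded) {
        return new TrackedParam(URLDecoder.decode(encoded, StandardCharsets.UTF_8));
    }

    String raw() {
        return raw;
    }

    String encoded() {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }
}
```

Searching can then compare `encoded()` against the stored values without any guessing.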

Solution 5 - Java

Check your URL for suspicious characters[1]. List of candidates:

WHITE_SPACE, ", <, >, {, }, |, \, ^, ~, [, ], . and `

I use:

// Returns false if the URL contains a character that is unsafe in raw form
// (so it still needs encoding), true otherwise.
private static boolean isAlreadyEncoded(String passedUrl) {
        return !passedUrl.matches(".*[\\ \"\\<\\>\\{\\}|\\\\^~\\[\\]].*");
}

For the actual encoding I proceed with:

https://stackoverflow.com/a/49796882/1485527

Note: Even if your URL doesn't contain unsafe characters, you might still want to apply, e.g., Punycode encoding to the host name. So there is still plenty of room for additional checks.


[1] A list of candidates can be found in the section "unsafe" of the URL spec at Page 2. In my understanding '%' or '#' should be left out in the encoding check, since these characters can occur in encoded URLs as well.
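
For the Punycode step mentioned in the note above, the JDK's `java.net.IDN` class can convert an internationalized host name to its ASCII form. This is only a sketch of that single step; the class name `HostNameEncoder` is made up, and extracting the host from a full URL is left out.

```java
import java.net.IDN;

class HostNameEncoder {
    // Converts a host name to its ASCII-compatible (Punycode) form.
    // Pure-ASCII host names are returned unchanged.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        // \u00FC is the letter "u" with an umlaut.
        System.out.println(toAsciiHost("b\u00FCcher.example"));
    }
}
```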

Solution 6 - Java

Using Spring UriComponentsBuilder:

import java.net.URI;
import org.springframework.web.util.UriComponentsBuilder;

private URI getProperlyEncodedUri(String uriString) {
    try {
        return URI.create(uriString);
    } catch (IllegalArgumentException e) {
        return UriComponentsBuilder.fromUriString(uriString).build().toUri();
    }
}

Solution 7 - Java

If you want to be sure that a string is encoded correctly (if it needs to be encoded at all), just decode it and encode it again.

Pseudocode:

correctly_encoded_string = encode(decode(input_string))

An already encoded string will remain untouched. An unencoded string will be encoded. A string with only URL-allowed characters will remain untouched too. Note that a raw string that happens to contain a valid escape sequence (e.g. a literal %3D) will be treated as already encoded.
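
In Java, the decode-then-encode normalization might be sketched as follows, assuming Java 10+ and UTF-8; the class name `EncodingNormalizer` is hypothetical. A malformed escape such as a trailing "%" makes `URLDecoder.decode` throw `IllegalArgumentException`, which this sketch deliberately does not catch.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class EncodingNormalizer {
    // Strips one layer of encoding (if any) and applies exactly one layer again.
    static String normalize(String input) {
        String decoded = URLDecoder.decode(input, StandardCharsets.UTF_8);
        return URLEncoder.encode(decoded, StandardCharsets.UTF_8);
    }
}
```

Because only one layer is removed, a string that was encoded twice comes back unchanged: `normalize("TEST%253D%253D")` still returns `TEST%253D%253D`.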

Solution 8 - Java

According to the spec (https://www.rfc-editor.org/rfc/rfc3986), every URI MUST start with a scheme followed by a colon (:).

Since a colon is required as the delimiter between the scheme and the rest of the URI, any string that contains a colon has not been encoded as a whole (encoding the entire URI would turn the colon into %3A).

(This assumes you will not be given an incomplete URI with no scheme.)

So you can test whether the string contains a colon. If it does not, URL-decode it. If the decoded string contains a colon, the original string was URL-encoded. If it still does not, check whether the decoded string differs from its input: if it does, URL-decode again and repeat; if it does not, the string is not a valid URI.

You can make this loop simpler if you know what schemes you can expect.
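
The colon-based loop could be sketched in Java like this, assuming Java 10+ and that the whole URI (including the colon after the scheme) was passed through the encoder. The class name, method name and the `MAX_ROUNDS` guard are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

class SchemeProbe {
    // Arbitrary limit on how many encoding layers are peeled off.
    private static final int MAX_ROUNDS = 10;

    // Decodes until a colon appears. Returns the decoded URI, or empty if
    // decoding stops changing the string before any colon shows up.
    static Optional<String> decodeUntilScheme(String input) {
        String current = input;
        for (int i = 0; i < MAX_ROUNDS; i++) {
            if (current.contains(":")) {
                return Optional.of(current);
            }
            String decoded;
            try {
                decoded = URLDecoder.decode(current, StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                return Optional.empty(); // malformed escape sequence
            }
            if (decoded.equals(current)) {
                return Optional.empty(); // nothing left to decode, still no scheme
            }
            current = decoded;
        }
        return Optional.empty();
    }
}
```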

Solution 9 - Java

Thanks to this answer, I wrote a JavaScript function that encodes the URL exactly once with encodeURI. You can call it to make sure the URL is encoded just once, without needing to know whether it is already encoded.

ES6:

var getUrlEncoded = sURL => {
    if (decodeURI(sURL) === sURL) return encodeURI(sURL)
    return getUrlEncoded(decodeURI(sURL))
}

Pre ES6:

var getUrlEncoded = function(sURL) {
    if (decodeURI(sURL) === sURL) return encodeURI(sURL)
    return getUrlEncoded(decodeURI(sURL))
}

Here are some tests so you can see the URL is only encoded once:

getUrlEncoded("https://example.com/media/Screenshot27 UI Home.jpg")
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(encodeURI("https://example.com/media/Screenshot27 UI Home.jpg"))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(encodeURI(encodeURI("https://example.com/media/Screenshot27 UI Home.jpg")))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(decodeURI("https://example.com/media/Screenshot27 UI Home.jpg"))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(decodeURI(decodeURI("https://example.com/media/Screenshot27 UI Home.jpg")))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Trick | View Question on Stackoverflow
Solution 1 - Java | SF. | View Answer on Stackoverflow
Solution 2 - Java | Roman | View Answer on Stackoverflow
Solution 3 - Java | amit_saxena | View Answer on Stackoverflow
Solution 4 - Java | flybywire | View Answer on Stackoverflow
Solution 5 - Java | jschnasse | View Answer on Stackoverflow
Solution 6 - Java | subject47 | View Answer on Stackoverflow
Solution 7 - Java | esergion | View Answer on Stackoverflow
Solution 8 - Java | Luke Mlsna | View Answer on Stackoverflow
Solution 9 - Java | Alberto | View Answer on Stackoverflow