What characters are valid in a URL?


Problem Overview


I'm trying to remove the non-URL parts of a big string. Most of the regexes I found look like [A-Za-z0-9-_.!~*'()], but a URL can contain more characters than that, for example http://127.0.0.1:8080/test?v=123#this

So what is the current set of characters that are valid in a URL?

Solutions


Solution 1 - Html

All the gory details can be found in the current RFC on the topic: RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax)
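RFC 3986 sorts the permitted characters into unreserved characters (section 2.3) and reserved characters (section 2.2, split into "gen-delims" and "sub-delims"), with % set aside for percent-encoding (section 2.1). As a rough sketch, assuming Python 3, those classes can be written down as sets. The names URI_CHARS, BAD_PERCENT and has_only_uri_chars are hypothetical, and the check only covers which characters appear and whether each % starts a valid %XX sequence, not whether every character sits in a position the RFC allows.

```python
import re

# Character classes from RFC 3986 (section 2.2 reserved, section 2.3 unreserved).
ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
DIGIT = "0123456789"
UNRESERVED = set(ALPHA + DIGIT + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
# '%' may only appear as the start of a percent-encoded octet such as %20.
URI_CHARS = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}

# Matches a '%' that is NOT followed by two hex digits (an invalid escape).
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_only_uri_chars(s: str) -> bool:
    """Return True if every character of s may appear somewhere in a URI.

    Hypothetical helper: it checks the character repertoire and the
    percent-encoding syntax only, not the position rules of the RFC.
    """
    return all(c in URI_CHARS for c in s) and not BAD_PERCENT.search(s)


if __name__ == "__main__":
    print(len(URI_CHARS))  # 85
    print(has_only_uri_chars("http://127.0.0.1:8080/test?v=123#this"))  # True
    print(has_only_uri_chars("http://example.com/a b"))  # False: space
    print(has_only_uri_chars("http://example.com/100%"))  # False: bare %
```

The 85 characters are the 66 unreserved ones (letters, digits, - . _ ~), the 7 gen-delims, the 11 sub-delims and %.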

Based on a related answer, you are looking at a list like this: A-Z, a-z, 0-9, plus the 23 characters - . _ ~ : / ? # [ ] @ ! $ & ' ( ) * + , ; % and = (shown separated by spaces). Everything else must be percent-encoded (URL-encoded). Also, some of these characters may only appear in very specific spots in a URI and must be percent-encoded outside of those spots; for example, % may only be used to introduce a percent-encoded byte, as in %20. The RFC has all of these specifics.
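To apply that character list to the original goal of pulling URLs out of a larger string, here is a minimal Python 3.9+ sketch. It assumes the URLs you care about start with http:// or https://, because matching on allowed characters alone would also pick up ordinary words. URL_PATTERN and extract_urls are hypothetical names; urllib.parse.quote is the standard-library function for percent-encoding.

```python
import re
from urllib.parse import quote

# Characters RFC 3986 allows in a URI (unreserved, gen-delims, sub-delims, '%').
# Inside the character class, '-', '[' and ']' are escaped so they are literal.
URI_CHAR_CLASS = r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]"

# Assumption: URLs in the text begin with http:// or https://.
URL_PATTERN = re.compile(r"https?://" + URI_CHAR_CLASS + r"+")


def extract_urls(text: str) -> list[str]:
    """Return the http(s) URL-like substrings found in text (hypothetical helper)."""
    return URL_PATTERN.findall(text)


text = "See http://127.0.0.1:8080/test?v=123#this and more text"
print(extract_urls(text))
# ['http://127.0.0.1:8080/test?v=123#this']

# Characters outside the allowed set must be percent-encoded; quote() encodes
# them as UTF-8 bytes and leaves '/' alone by default.
print(quote("/test path/ü"))  # /test%20path/%C3%BC
```

One caveat of this approach: characters such as ")" and "." are valid in URLs, so a URL at the end of a sentence like "(see http://example.com/a)." will be captured with the trailing ")." attached and may need trimming afterwards.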

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | blez | View Question on Stackoverflow
Solution 1 - Html | ckittel | View Answer on Stackoverflow