Is a colon `:` safe for friendly-URL use?

Tags: Url, Gwt, Special Characters, Friendly Url

Url Problem Overview


We are designing a URL system that will specify application sections as words separated by slashes. Specifically, this is in GWT, so the relevant parts of the URL will be in the hash (which will be interpreted by a controller layer on the client-side):

http://site/gwturl#section1/section2

Some sections may need additional attributes, which we'd like to specify with a :, so that the section parts of the URL are unambiguous. The code would split first on /, then on :, like this:

http://site/gwturl#user:45/comments
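The split described above might look like the following. This is a minimal sketch in plain Java, not the asker's actual controller code: the class name `FragmentSections`, the `Section` type, and the choice to skip empty pieces are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of GWT): parses a fragment such as
// "user:45/comments" into sections with optional attributes.
final class FragmentSections {

    // One section of the fragment: a name plus an optional attribute after ':'.
    static final class Section {
        final String name;
        final String attribute; // null when the section has no ':' part

        Section(String name, String attribute) {
            this.name = name;
            this.attribute = attribute;
        }
    }

    // Split first on '/', then split each piece on its first ':'.
    static List<Section> parse(String fragment) {
        List<Section> sections = new ArrayList<>();
        for (String part : fragment.split("/")) {
            if (part.isEmpty()) {
                continue; // assumption: ignore empty pieces such as a leading '/'
            }
            int colon = part.indexOf(':');
            if (colon < 0) {
                sections.add(new Section(part, null));
            } else {
                sections.add(new Section(part.substring(0, colon),
                                         part.substring(colon + 1)));
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        // Prints "user -> 45" and then "comments -> null".
        for (Section s : parse("user:45/comments")) {
            System.out.println(s.name + " -> " + s.attribute);
        }
    }
}
```

Note that this only works if the fragment arrives with a literal `:`; an encoded `user%3A45` would come out as a single section name, which is exactly the concern of the question.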

Of course, we are doing this for URL-friendliness, so we'd like to make sure that none of these special characters will be URL-encoded by browsers or any other system, leaving us with a URL like this:

http://site/gwturl#user%3A45/comments <--- BAD

Is using the colon in this way safe (by which I mean won't be automatically encoded) for browsers, bookmarking systems, even Javascript or Java code?

Url Solutions


Solution 1 - Url

I recently wrote a URL encoder, so this is pretty fresh in my mind.

> http://site/gwturl#user:45/comments

All the characters in the fragment part (user:45/comments) are perfectly legal for RFC 3986 URIs.

The relevant parts of the ABNF:

fragment      = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

Apart from these restrictions, the fragment part has no defined structure beyond the one your application gives it. The scheme, http, only says that you don't send this part to the server.
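To make the grammar concrete, here is a rough sketch that transcribes the `fragment` rule into a Java regular expression and checks the example against it. The class and method names are made up for illustration, and the regex is a hand transcription of the ABNF above rather than anything taken from a library.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical checker: tests a string against the RFC 3986 fragment rule.
final class FragmentGrammarCheck {

    // fragment = *( pchar / "/" / "?" )
    // The character class covers unreserved, sub-delims, ":", "@", "/" and "?";
    // the alternative covers pct-encoded ("%" HEXDIG HEXDIG).
    static final Pattern FRAGMENT = Pattern.compile(
            "(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*");

    static boolean isValidFragment(String fragment) {
        return FRAGMENT.matcher(fragment).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidFragment("user:45/comments"));   // true
        System.out.println(isValidFragment("user%3A45/comments")); // true
        System.out.println(isValidFragment("user 45"));            // false: raw space

        // java.net.URI leaves the literal colon in the fragment untouched.
        URI uri = URI.create("http://site/gwturl#user:45/comments");
        System.out.println(uri.getRawFragment()); // user:45/comments
    }
}
```

Both the literal and the percent-encoded forms are legal fragments; the grammar permits the colon but does not by itself stop other software from encoding it.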


EDIT:

D'oh!

Despite my assertions about the URI spec, irreputable provides the correct answer when he points out that the HTML 4 spec restricts element names/identifiers.

Note that identifier rules are changing in HTML 5. URI restrictions will still apply (at time of writing, there are some unresolved issues around HTML 5's use of URIs).

Solution 2 - Url

MediaWiki and other wiki engines use colons in their URLs to designate namespaces, with apparently no major problems.

eg http://en.wikipedia.org/wiki/Template:Welcome

Solution 3 - Url

In addition to McDowell's analysis of the URI standard, remember that the fragment must also be a valid HTML anchor name. According to http://www.w3.org/TR/html4/types.html#type-name:

> ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

So you are in luck: ":" is explicitly allowed. Nobody should "%"-escape it, not only because "%" is an illegal character there, but also because the fragment must match the anchor name character by character, so no agent should tamper with it in any way.
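The quoted HTML 4 rule can be written as a one-line pattern. The following is only an illustrative sketch (the class name `Html4NameToken` is made up) showing that a token containing ":" passes while its percent-escaped form does not:

```java
import java.util.regex.Pattern;

// Hypothetical checker for HTML 4 ID and NAME tokens as quoted above.
final class Html4NameToken {

    // A letter first, then any number of letters, digits, '-', '_', ':' and '.'.
    static final Pattern NAME = Pattern.compile("[A-Za-z][A-Za-z0-9\\-_:.]*");

    static boolean isNameToken(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNameToken("user:45"));   // true: ':' is allowed
        System.out.println(isNameToken("user%3A45")); // false: '%' is not allowed
    }
}
```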

However, you have to test it. Web standards are not strictly followed, and sometimes the standards conflict. For example, HTTP/1.1 (RFC 2616) does not allow a query string in the request URL, while HTML constructs one when submitting a form with the GET method. Whatever is implemented in the real world wins at the end of the day.

Solution 4 - Url

I wouldn't count on it. It'll likely get url encoded as %3A by many user-agents.

Solution 5 - Url

From URLEncoder javadoc:

> For more information about HTML form encoding, consult the HTML specification.
>
> When encoding a String, the following rules apply:
>
> * The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
> * The special characters ".", "-", "*", and "_" remain the same.
> * The space character " " is converted into a plus sign "+".
> * All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.

That is, `:` is not on `URLEncoder`'s list of safe characters, so it will be encoded as `%3A`.
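A short sketch of what this means in practice, assuming standard Java SE on the server side: `URLEncoder` applies HTML form encoding, so passing a whole fragment through it escapes both `:` and `/`. For contrast, the example also builds the URL with the multi-argument `java.net.URI` constructor, which leaves those characters alone in the fragment. The class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo comparing form encoding with URI construction.
final class ColonEncodingDemo {

    // HTML form encoding: ':' and '/' are both outside the safe set.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // URI construction: quotes only characters that are illegal in a fragment.
    static String buildUri(String fragment) {
        try {
            return new URI("http", "site", "/gwturl", fragment).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("user:45/comments")); // user%3A45%2Fcomments
        System.out.println(buildUri("user:45/comments"));   // http://site/gwturl#user:45/comments
    }
}
```

So whether the colon survives in Java code depends on which API builds the URL, not on the colon itself.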

Solution 6 - Url

I don't see Firefox or IE8 encoding some of the Wikipedia URLs that include the character.

Solution 7 - Url

Google also uses colons.

In this specification, they use colons for the custom method names.

Solution 8 - Url

Colons are used as the split between username and password if a protocol requires authentication.

Solution 9 - Url

Colon isn't safe. See here

Solution 10 - Url

It is not a safe character. When it appears right after the domain name, it is used to indicate which port you connect to.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Nicole | View Question on Stackoverflow
Solution 1 - Url | McDowell | View Answer on Stackoverflow
Solution 2 - Url | Paul Wray | View Answer on Stackoverflow
Solution 3 - Url | irreputable | View Answer on Stackoverflow
Solution 4 - Url | Asaph | View Answer on Stackoverflow
Solution 5 - Url | axtavt | View Answer on Stackoverflow
Solution 6 - Url | kprobst | View Answer on Stackoverflow
Solution 7 - Url | Sabfir | View Answer on Stackoverflow
Solution 8 - Url | JP Silvashy | View Answer on Stackoverflow
Solution 9 - Url | Bob | View Answer on Stackoverflow
Solution 10 - Url | RHicke | View Answer on Stackoverflow