What's a "canonical path"?
PathFilepathRelative PathAbsolute PathPath Problem Overview
So, an absolute path is a way to get to a certain file or location describing the full route to it, the full path, and it's OS dependent (the absolute paths for Windows and Linux, for example, are different). A relative path, on the other hand, is a route to a file or location which is described from the current location ..
(two dots) indicating a superior level in the directories tree. That has been clear to me for several years now.
When searching I've even seen that there are canonicalized files too! All I know is that CANONICAL means something like "according to the rules" or something.
Can somebody enlighten me in therms of theory about canonical stuff?
Path Solutions
Solution 1 - Path
The whole point of making anything "canonical" is so that you can compare two things. For example, both ../../here/bar/x
and ./test/../../bar/x
may refer to the same location, but you can't do a textual comparison on the two paths. However, if you turn them into their canonical representation, they both become ../bar/x
, and we see that they actually refer to the same thing.
In short, it is often the case that you have many ways of referring to one thing, and in that case you may be able to define a canonical representation which is unique and which allows you to get a handle on collections of such things.
(If you're looking for more examples, all of mathematics is full of "canonical" constructions for all sorts of objects, and very much with the same purpose in mind. Maybe this Wikipedia article can provide some additional directions.)
Solution 2 - Path
A good way to define a canonical path will be: the shortest absolute path
(short, in the meaning of string-length).
This is an example of the difference between an absolute path and a canonical path:
absolute path: C:\abc\..\abc\file.txt
canonical path: C:\abc\file.txt
Solution 3 - Path
What a canonical path is (or its difference from an absolute path) is system dependent.
Typically if a (full) path contains aliases, shortcuts or symbolic links the canonical path resolves all these into the actual directories they refer.
Example: if /bin/a
is a sym link, you can find it anywhere you request for an absolute path e.g. from java.io.File#getAbsolutePath while the real file (i.e. the actual target of the link) i.e. usr/local/bin/a
would be return as a canonical path e.g. from java.io.File#getCanonicalPath
Solution 4 - Path
The most issues with canonical paths occur when you are passing the name of a dir and not file. For file, if we are providing absolute path that is also the canonical path. But for dir it means omitting the last "/". For example, "/var/tmp/foo" is a canonical path while "/var/tmp/foo/" is not.
Solution 5 - Path
A good definition of a canonical path is given in the documentation of readlink
in GNU Coreutils. It is specified that 'Canonicalize mode' returns an equivalent path that doesn't have any of these things:
- hard links to self (.) and parent (..) directories
- repeated separators (/)
- symbolic links
The string length is irrelevant, as is demonstrated in the following example.
You can experiment with readlink -f
(canonicalize mode) or its preferred equivalent command realpath
to see the difference between an 'absolute path' and a 'canonical absolute path' for some programs on your system if you are running linux or are using GNU Coreutils.
I can get the path of 'java' on my system using which
$ which java
/usr/bin/java
This path, however, is actually a symbolic link to another symbolic link. This symbolic link chain can be displayed using namei
.
$ namei $(which java)
f: /usr/bin/java
d /
d usr
d bin
l java -> /etc/alternatives/java
d /
d etc
d alternatives
l java -> /usr/lib/jvm/java-17-openjdk-amd64/bin/java
d /
d usr
d lib
d jvm
d java-17-openjdk-amd64
d bin
- java
The canonical path can be found using the previously mentioned realpath
command.
$ realpath $(which java)
/usr/lib/jvm/java-17-openjdk-amd64/bin/java