wget: don't follow redirects

Tags: Linux, HTTP, Bash, Redirect, Wget

Linux Problem Overview


How do I prevent wget from following redirects?

Linux Solutions


Solution 1 - Linux

--max-redirect 0

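I haven't tried this myself, but according to the wget manual the value is the maximum number of redirections to follow (the default is 20), so 0 allows none rather than an unlimited number.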
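A minimal sketch of how this option might be wrapped in a script. The function name fetch_no_redirect and the optional WGET variable (used to substitute another binary) are hypothetical names introduced here, not part of wget. With the limit at 0, a redirect response should make wget stop and return a non-zero exit status instead of fetching the target.

```shell
# Hypothetical helper: run wget with the redirect limit set to zero.
# WGET is an assumed override variable so another binary can be swapped in;
# it defaults to the real wget.
fetch_no_redirect() {
  "${WGET:-wget}" --max-redirect=0 "$@"
}
```

For example, `fetch_no_redirect http://example.com/ || echo "redirected or failed"` would report when the URL could not be fetched directly. A non-zero status can also mean some other failure, so treat it as "not fetched" rather than proof of a redirect.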

Solution 2 - Linux

Use curl without -L instead of wget. Omitting that option when using curl prevents the redirect from being followed.

If you use curl -I <URL> then you'll get the headers instead of the redirect HTML.

If you use curl -IL <URL> then you'll get the headers for the URL, plus those for the URL you're redirected to.
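Building on curl -I, here is a rough sketch of how the redirect target could be read without following it. The helper name location_header is hypothetical. It assumes the headers arrive on standard input in the usual "Name: value" form with a space after the colon.

```shell
# Hypothetical helper: print the value of the first Location header found
# in HTTP response headers read from stdin.
# Header names are matched case-insensitively, and a trailing carriage
# return (headers end in CRLF) is stripped before printing.
location_header() {
  awk 'tolower($1) == "location:" { sub(/\r$/, ""); print $2; exit }'
}
```

A possible use is `curl -sI <URL> | location_header`, which prints nothing when the response carries no Location header.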

Solution 3 - Linux

Some versions of wget have a --max-redirect option, which limits how many redirects wget will follow.

Solution 4 - Linux

wget follows up to 20 redirects by default. However, when retrieving recursively (for example with --mirror), it does not span hosts. If you have asked wget to download example.com, it will not touch any resources at www.example.com. wget treats this as a request to span to another host and declines.

In short, you should probably be executing:

wget --mirror www.example.com

Rather than

wget --mirror example.com

Now let's say the owner of www.example.com has several subdomains at example.com and we are interested in all of them. How to proceed?

Try this:

wget --mirror --domains=example.com example.com

wget will now visit all subdomains of example.com, including m.example.com and www.example.com.

Solution 5 - Linux

In general, it is not a good idea to depend on a specific number of redirects.

For example, to download IntelliJ IDEA, the URL that is promised to always resolve to the latest version of the Community Edition for Linux is something like https://download.jetbrains.com/product?code=IIC&latest&distribution=linux, but if you visit that URL today, you are redirected twice before you reach the actual downloadable file. In the future you might be redirected three times, or not at all.

The way to solve this problem is to use the HTTP HEAD method. Here is how I solved it for IntelliJ IDEA:

# This is the starting URL.
URL="https://download.jetbrains.com/product?code=IIC&latest&distribution=linux"
echo "URL: $URL"

# Issue HEAD requests until the actual target is found.
# The result contains the target location, among some irrelevant stuff.
LOC=$(wget --no-verbose --method=HEAD --output-file - "$URL")
echo "LOC: $LOC"

# Extract the URL from the result, stripping the irrelevant stuff.
URL=$(cut "--delimiter= " --fields=4 <<< "$LOC")
echo "URL: $URL"

# Optional: download the actual file.
wget "$URL"
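To make the extraction step concrete, here is a small sketch run on a made-up log line. The sample line and its layout (date, time, "URL:", target, status) are assumptions inferred from the script's use of field 4. Real wget output may differ between versions, so check it before relying on the field index.

```shell
# Hypothetical sample of one --no-verbose log line (not real wget output).
LOC="2024-01-01 12:00:00 URL: https://example.com/files/app.tar.gz 200 OK"

# Split on single spaces: 1=date, 2=time, 3="URL:", 4=the target URL.
TARGET=$(printf '%s\n' "$LOC" | cut -d ' ' -f 4)
echo "$TARGET"
```

If wget fails and the log is empty, the extracted value will be empty too, so a script may want to test for that before attempting the download.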

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | flybywire | View Question on Stack Overflow
Solution 1 - Linux | Matt | View Answer on Stack Overflow
Solution 2 - Linux | Dennis Williamson | View Answer on Stack Overflow
Solution 3 - Linux | Pekka | View Answer on Stack Overflow
Solution 4 - Linux | Tim McNamara | View Answer on Stack Overflow
Solution 5 - Linux | Mike Nakis | View Answer on Stack Overflow