How to download an entire directory and subdirectories using wget?

RegexLinuxBashWget

Regex Problem Overview


I am trying to download the files for a project using wget, as the SVN server for that project isn't running anymore and I am only able to access the files through a browser. The base URLs for all the files is the same like

> http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/*

How can I use wget (or any other similar tool) to download all the files in this repository, where the "tzivi" folder is the root folder and there are several files and sub-folders (upto 2 or 3 levels) under it?

Regex Solutions


Solution 1 - Regex

You may use this in shell:

wget -r --no-parent http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/

The Parameters are:

-r     //recursive Download

and

--no-parent // Don´t download something from the parent directory

If you don't want to download the entire content, you may use:

-l1 just download the directory (tzivi in your case)

-l2 download the directory and all level 1 subfolders ('tzivi/something' but not 'tivizi/somthing/foo')  

And so on. If you insert no -l option, wget will use -l 5 automatically.

If you insert a -l 0 you´ll download the whole Internet, because wget will follow every link it finds.

Solution 2 - Regex

You can use this in a shell:

wget -r -nH --cut-dirs=7 --reject="index.html*" \
      http://abc.tamu.edu/projects/tzivi/repository/revisions/2/raw/tzivi/

The Parameters are:

-r recursively download

-nH (--no-host-directories) cuts out hostname 

--cut-dirs=X (cuts out X directories)

Solution 3 - Regex

This link just gave me the best answer:

$ wget --no-clobber --convert-links --random-wait -r -p --level 1 -E -e robots=off -U mozilla http://base.site/dir/

Worked like a charm.

Solution 4 - Regex

wget -r --no-parent URL --user=username --password=password

the last two options are optional if you have the username and password for downloading, otherwise no need to use them.

You can also see more options in the link https://www.howtogeek.com/281663/how-to-use-wget-the-ultimate-command-line-downloading-tool/

Solution 5 - Regex

use the command

wget -m www.ilanni.com/nexus/content/

Solution 6 - Regex

you can also use this command :

wget --mirror -pc --convert-links -P ./your-local-dir/ http://www.your-website.com

so that you get the exact mirror of the website you want to download

Solution 7 - Regex

try this working code (30-08-2021):

!wget --no-clobber --convert-links --random-wait -r -p --level 1 -E -e robots=off --adjust-extension -U mozilla "yourweb directory with in quotations"

Solution 8 - Regex

This works:

wget -m -np -c --no-check-certificate -R "index.html*" "https://the-eye.eu/public/AudioBooks/Edgar%20Allan%20Poe%20-%2"

Solution 9 - Regex

This will help

wget -m -np -c --level 0 --no-check-certificate -R"index.html*"http://www.your-websitepage.com/dir

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questioncode4funView Question on Stackoverflow
Solution 1 - Regexuser2936450View Answer on Stackoverflow
Solution 2 - RegexRajiv YadavView Answer on Stackoverflow
Solution 3 - RegexNelinton MedeirosView Answer on Stackoverflow
Solution 4 - RegexSarkar_lat_2016View Answer on Stackoverflow
Solution 5 - Regexlanni654321View Answer on Stackoverflow
Solution 6 - Regexbaobab33View Answer on Stackoverflow
Solution 7 - RegexAndroid CseView Answer on Stackoverflow
Solution 8 - RegexHiep LuongView Answer on Stackoverflow
Solution 9 - Regexgti3993View Answer on Stackoverflow