Multiple simultaneous downloads using Wget?

Tags: Download, Wget

Download Problem Overview


I'm using wget to download website content, but wget downloads the files one by one.

How can I make wget download using 4 simultaneous connections?

Download Solutions


Solution 1 - Download

Use aria2:

aria2c -x 16 [url]
#          |
#          |
#          |
#          ----> the number of connections 

http://aria2.sourceforge.net

I love it!

Solution 2 - Download

Wget does not support multiple socket connections in order to speed up download of files.

I think we can do a bit better than gmarian's answer.

The correct way is to use aria2.

aria2c -x 16 -s 16 [url]
#          |    |
#          |    |
#          |    |
#          ---------> the number of connections here

Official documentation:

> -x, --max-connection-per-server=NUM: The maximum number of connections to one server for each download. Possible Values: 1-16 Default: 1

> -s, --split=N: Download a file using N connections. If more than N URIs are given, first N URIs are used and remaining URLs are used for backup. If less than N URIs are given, those URLs are used more than once so that N connections total are made simultaneously. The number of connections to the same host is restricted by the --max-connection-per-server option. See also the --min-split-size option. Possible Values: 1-* Default: 5
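Going by the documentation above, a hypothetical invocation pulling the same file from two mirrors (placeholder URLs) with up to 16 connections would look like this:

aria2c -x 16 -s 16 http://mirror1.example.com/file.iso http://mirror2.example.com/file.iso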

Solution 3 - Download

Since GNU parallel was not mentioned yet, let me give another way:

cat url.list | parallel -j 8 wget -O {#}.html {}
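For context, {} is parallel's placeholder for the current input line (the URL) and {#} is the job's sequence number, so the pages are saved as 1.html, 2.html, and so on. A hypothetical url.list simply holds one URL per line:

https://www.example.com/page-a
https://www.example.com/page-b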

Solution 4 - Download

I found a (possible) solution:

> In the process of downloading a few thousand log files from one server to the next I suddenly had the need to do some serious multithreaded downloading in BSD, preferably with Wget as that was the simplest way I could think of handling this. A little looking around led me to this little nugget:
>
> wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url]
>
> Just repeat the wget -r -np -N [url] for as many threads as you need... Now, granted, this isn't pretty and there are surely better ways to do this, but if you want something quick and dirty it should do the trick...

Note: the option -N makes wget download only "newer" files, which means it won't overwrite or re-download files unless their timestamp changes on the server.
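A minimal sketch of the same trick written as a loop (THREADS and [url] are placeholders, not part of the original answer):

THREADS=4
for i in $(seq 1 "$THREADS"); do
    wget -r -np -N [url] &   # each iteration starts one backgrounded download
done
wait                         # block until every backgrounded wget has exited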

Solution 5 - Download

Another program that can do this is axel.

axel -n <NUMBER_OF_CONNECTIONS> URL

For basic HTTP auth:

axel -n <NUMBER_OF_CONNECTIONS> "https://user:password@domain.tld/path/file.ext"

Ubuntu man page.

Solution 6 - Download

A new (but not yet released) tool is [Mget][1]. It already has many of the options known from Wget and comes with a library that allows you to easily embed (recursive) downloading into your own application.

To answer your question:

mget --num-threads=4 [url]

UPDATE

Mget is now developed as [Wget2][2] with many bugs fixed and more features (e.g. HTTP/2 support).

--num-threads is now --max-threads.
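Assuming the renamed option behaves the same way, the Wget2 equivalent would presumably be:

wget2 --max-threads=4 [url]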

[1]: https://github.com/rockdaboot/mget "Mget"
[2]: https://gitlab.com/gnuwget/wget2 "Wget2"

Solution 7 - Download

I strongly suggest using httrack.

ex: httrack -v -w http://example.com/

It will mirror the site with 8 simultaneous connections by default. HTTrack has tons of options to play with. Have a look.

Solution 8 - Download

As other posters have mentioned, I'd suggest you have a look at aria2. From the Ubuntu man page for version 1.16.1:

> aria2 is a utility for downloading files. The supported protocols are HTTP(S), FTP, BitTorrent, and Metalink. aria2 can download a file from multiple sources/protocols and tries to utilize your maximum download bandwidth. It supports downloading a file from HTTP(S)/FTP and BitTorrent at the same time, while the data downloaded from HTTP(S)/FTP is uploaded to the BitTorrent swarm. Using Metalink's chunk checksums, aria2 automatically validates chunks of data while downloading a file like BitTorrent.

You can use the -x flag to specify the maximum number of connections per server (default: 1):

aria2c -x 16 [url] 

If the same file is available from multiple locations, you can choose to download from all of them. Use the -j flag to specify the maximum number of parallel downloads for every static URI (default: 5).

aria2c -j 5 [url] [url2]

Have a look at http://aria2.sourceforge.net/ for more information. For usage information, the man page is really descriptive and has a section on the bottom with usage examples. An online version can be found at http://aria2.sourceforge.net/manual/en/html/README.html.

Solution 9 - Download

wget can't download over multiple connections; instead, you can try another program like aria2.

Solution 10 - Download

Use

aria2c -x 10 -i websites.txt >/dev/null 2>/dev/null &

In websites.txt, put one URL per line, for example:

https://www.example.com/1.mp4
https://www.example.com/2.mp4
https://www.example.com/3.mp4
https://www.example.com/4.mp4
https://www.example.com/5.mp4

Solution 11 - Download

Try pcurl:

http://sourceforge.net/projects/pcurl/

It uses curl instead of wget and downloads in 10 segments in parallel.
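For illustration only, here is a rough sketch of the same idea: split a file into byte ranges and fetch them in parallel with curl. It assumes the server reports Content-Length and supports range requests; the URL argument, segment count, and part-file names are placeholders.

#!/bin/bash
url="$1"
segments=10
out="${url##*/}"

# ask the server for the total file size
size=$(curl -sIL "$url" | tr -d '\r' | awk 'tolower($1)=="content-length:" {print $2}' | tail -1)
chunk=$(( (size + segments - 1) / segments ))

for i in $(seq 0 $((segments - 1))); do
    start=$(( i * chunk ))
    end=$(( start + chunk - 1 ))
    [ "$end" -ge "$size" ] && end=$(( size - 1 ))
    curl -s -r "$start-$end" -o "$out.part$i" "$url" &   # one backgrounded curl per segment
done
wait

# stitch the segments back together in order, then clean up
for i in $(seq 0 $((segments - 1))); do cat "$out.part$i"; done > "$out"
rm -f "$out".part*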

Solution 12 - Download

They always say it depends, but when it comes to mirroring a website, the best tool is httrack. It is super fast and easy to work with. The only downside is its so-called support forum, but you can find your way using the official documentation. It has both a GUI and a CLI interface, and it supports cookies; just read the docs. This is the best. (Be careful with this tool: you can download the whole web onto your hard drive.)

httrack -c8 [url]

By default, the maximum number of simultaneous connections is limited to 8 to avoid server overload.

Solution 13 - Download

Use xargs to make wget work on multiple files in parallel:

#!/bin/bash

mywget()
{
    wget "$1"
}

export -f mywget

# run wget in parallel using 8 thread/connection
xargs -P 8 -n 1 -I {} bash -c "mywget '{}'" < list_urls.txt

Aria2 options: the right way to work with files smaller than 20 MB

aria2c -k 2M -x 10 -s 10 [url]

-k 2M splits the file into 2 MB chunks.

-k or --min-split-size has a default value of 20 MB; if you don't set this option and the file is under 20 MB, it will run in only a single connection, no matter what value of -x or -s you use.

Solution 14 - Download

You can use xargs

-P is the number of processes. For example, if you set -P 4, four links will be downloaded at the same time; if you set it to -P 0, xargs will launch as many processes as possible and all of the links will be downloaded.

cat links.txt | xargs -P 4 -I{} wget {}

Solution 15 - Download

I'm using GNU parallel:

cat listoflinks.txt | parallel --bar -j ${MAX_PARALLEL:-$(nproc)} wget -nv {}

  1. cat will pipe a list of line-separated URLs to parallel
  2. the --bar flag shows a progress bar for the parallel execution
  3. the MAX_PARALLEL env var sets the maximum number of parallel downloads; use it carefully. The default here is the current number of CPUs

> Tip: use --dry-run to see what will happen if you execute the command.
> cat listoflinks.txt | parallel --dry-run --bar -j ${MAX_PARALLEL} wget -nv {}

Solution 16 - Download

make can be parallelised easily (e.g., make -j 4). For example, here's a simple Makefile I'm using to download files in parallel using wget:

BASE=http://www.somewhere.com/path/to
FILES=$(shell awk '{printf "%s.ext\n", $$1}' filelist.txt)
LOG=download.log

all: $(FILES)
	echo $(FILES)

%.ext:
	wget -N -a $(LOG) $(BASE)/$@

.PHONY: all
default: all
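Hypothetical usage, assuming filelist.txt holds one base name per line (the Makefile appends .ext to each), to fetch up to four files at once:

make -j 4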

Solution 17 - Download

Consider using regular expressions or FTP globbing. That way you can start wget multiple times with different groups of filename starting characters, depending on their frequency of occurrence.

This is for example how I sync a folder between two NAS:

wget --recursive --level 0 --no-host-directories --cut-dirs=2 --no-verbose --timestamping --backups=0 --bind-address=10.0.0.10 --user=<ftp_user> --password=<ftp_password> "ftp://10.0.0.100/foo/bar/[0-9a-hA-H]*" --directory-prefix=/volume1/foo &
wget --recursive --level 0 --no-host-directories --cut-dirs=2 --no-verbose --timestamping --backups=0 --bind-address=10.0.0.11 --user=<ftp_user> --password=<ftp_password> "ftp://10.0.0.100/foo/bar/[!0-9a-hA-H]*" --directory-prefix=/volume1/foo &

The first wget syncs all files/folders starting with 0, 1, 2... F, G, H and the second thread syncs everything else.

This was the easiest way to sync between a NAS with one 10G ethernet port (10.0.0.100) and a NAS with two 1G ethernet ports (10.0.0.10 and 10.0.0.11). I bound the two wget threads to the different ethernet ports through --bind-address and ran them in parallel by putting & at the end of each line. That way I was able to copy huge files at 2x 100 MB/s = 200 MB/s in total.

Solution 18 - Download

Call wget for each link and set it to run in the background.

I tried this Python code

import subprocess

with open('links.txt', 'r') as f1:        # Opens links.txt file in read mode
  list_1 = f1.read().splitlines()         # Get every line in links.txt
for i in list_1:                          # Iterate over each link
  subprocess.Popen(['wget', i, '-bq'])    # Call wget with background + quiet mode

Parameters:

      b - Run in Background
      q - Quiet mode (No Output)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | jubo | View Question on Stackoverflow |
| Solution 1 - Download | gmarian | View Answer on Stackoverflow |
| Solution 2 - Download | thomas.han | View Answer on Stackoverflow |
| Solution 3 - Download | Nikolay Shmyrev | View Answer on Stackoverflow |
| Solution 4 - Download | SMUsamaShah | View Answer on Stackoverflow |
| Solution 5 - Download | Lord Loh. | View Answer on Stackoverflow |
| Solution 6 - Download | rockdaboot | View Answer on Stackoverflow |
| Solution 7 - Download | Rodrigo Bustos L. | View Answer on Stackoverflow |
| Solution 8 - Download | runejuhl | View Answer on Stackoverflow |
| Solution 9 - Download | user181677 | View Answer on Stackoverflow |
| Solution 10 - Download | David Corp | View Answer on Stackoverflow |
| Solution 11 - Download | Rumble | View Answer on Stackoverflow |
| Solution 12 - Download | pouya | View Answer on Stackoverflow |
| Solution 13 - Download | ewwink | View Answer on Stackoverflow |
| Solution 14 - Download | mirhossein | View Answer on Stackoverflow |
| Solution 15 - Download | Pratik Balar | View Answer on Stackoverflow |
| Solution 16 - Download | Paul Price | View Answer on Stackoverflow |
| Solution 17 - Download | mgutt | View Answer on Stackoverflow |
| Solution 18 - Download | Everest Ok | View Answer on Stackoverflow |