Pipe output of cat to cURL to download a list of files


Unix Problem Overview


I have a list of URLs in a file called urls.txt. Each line contains one URL. I want to download all of the files at once using cURL, but I can't seem to get the right one-liner.

I tried:

$ cat urls.txt | xargs -0 curl -O

But that only gives me the last file in the list.

Unix Solutions


Solution 1 - Unix

This works for me:

$ xargs -n 1 curl -O < urls.txt

I'm in FreeBSD. Your xargs may work differently.
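To see what -n 1 changes compared with the -0 from the question, it can help to do a dry run with echo in place of curl. This is only a sketch: the two example.com URLs and the temporary urls.txt are made up for illustration, and it assumes an xargs that supports -0 (GNU and BSD versions do).

```shell
# Dry run: echo stands in for curl, so nothing is downloaded.
workdir=$(mktemp -d)
printf '%s\n' 'http://example.com/a.txt' 'http://example.com/b.txt' > "$workdir/urls.txt"

# -n 1 runs the command once per whitespace-separated item,
# so this prints one "curl -O <url>" line per URL.
per_url=$(xargs -n 1 echo curl -O < "$workdir/urls.txt")
printf '%s\n' "$per_url"

# -0 splits the input only on NUL bytes. A newline-separated file has none,
# so the whole file arrives as one argument and the command runs once.
nul_runs=$(xargs -0 -n 1 sh -c 'echo run' sh < "$workdir/urls.txt" | wc -l)
echo "invocations with -0: $nul_runs"
```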

Note that this runs a separate curl process for each URL, one after another, which you may view as unnecessarily heavy. If you'd like to save some of that overhead, the following may work in bash:

$ mapfile -t urls < urls.txt
$ curl ${urls[@]/#/-O }

This saves your URL list to an array, then expands the array with a -O option in front of each element, so that curl downloads and saves every target. The curl command can take multiple URLs and fetch all of them, reusing the existing connection (HTTP/1.1), but it needs the -O option before each one in order to download and save each target. Note that characters within some URLs may need to be escaped to avoid interacting with your shell.

Or if you are using a POSIX shell rather than bash:

$ curl $(printf ' -O %s' $(cat urls.txt))

This relies on printf's behaviour of repeating the format pattern to exhaust the list of data arguments; not all stand-alone printfs will do this.
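The repetition is easy to check with placeholder arguments. The sketch below shows it first, then gives an alternative that builds the same "-O URL" list in a function's positional parameters, so each URL stays a single word even if it contains characters such as ? or [. The function name fetch_all and the CURL variable (an override for dry runs) are illustrative assumptions, not part of curl.

```shell
# printf reuses its format until the arguments run out.
printf ' -O %s' one two three; echo   # prints " -O one -O two -O three"

# fetch_all FILE: run a single curl for every URL listed in FILE.
# CURL is a hypothetical override so the command can be dry-run with echo.
fetch_all() {
  list=$1
  set --                       # clear this function's argument list
  while IFS= read -r url; do
    if [ -n "$url" ]; then
      set -- "$@" -O "$url"    # one -O per URL, URL kept as one word
    fi
  done < "$list"
  [ "$#" -gt 0 ] || return 0   # empty list: nothing to fetch
  # CURL is left unquoted on purpose so 'echo curl' splits into two words.
  ${CURL:-curl} "$@"
}
```

Calling `fetch_all urls.txt` would run the real download. Setting `CURL='echo curl'` first prints the command line instead of running it.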

Note that this non-xargs method also may bump up against system limits for very large lists of URLs. Research ARG_MAX and MAX_ARG_STRLEN if this is a concern.
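A rough way to judge whether a list is anywhere near that limit is to compare its size with the value reported by getconf. The sketch below is an estimate only: the function name fits_in_argv is made up, and the "half of ARG_MAX" margin is an arbitrary, conservative assumption, since the environment and per-argument overhead count against the same limit.

```shell
# fits_in_argv FILE: succeed if the URLs in FILE, each with a " -O " prefix,
# need less than half of ARG_MAX. This is an approximation, not an exact check.
fits_in_argv() {
  limit=$(getconf ARG_MAX)        # system limit for arguments plus environment
  bytes=$(wc -c < "$1")           # size of the URL list in bytes
  lines=$(wc -l < "$1")           # number of URLs (one per line)
  needed=$((bytes + 4 * lines))   # add 4 bytes per URL for " -O "
  [ "$needed" -lt $((limit / 2)) ]
}
```

If `fits_in_argv urls.txt` fails, falling back to the xargs form is the safer choice, because xargs splits the list into as many invocations as needed.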

Solution 2 - Unix

A very simple solution would be the following: If you have a file 'file.txt' like

url="http://www.google.de"
url="http://www.yahoo.de"
url="http://www.bing.de"

Then you can use curl and simply do

curl -K file.txt

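And curl will fetch all the URLs listed in your file.txt. As written, the responses go to standard output; add a line containing -O after each url line if you want each target saved to a file.

So if you have control over the format of your input file, this may be the simplest solution for you.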
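If the input is a plain list with one URL per line, it can be converted to this config format on the fly. A minimal sketch, assuming URLs that contain no double quotes or backslashes (those would need escaping in a curl config file); the function name urls_to_config is made up for illustration:

```shell
# urls_to_config FILE: print a curl config with one url entry per line of FILE.
# Each url is followed by -O so that the target is saved under its remote name.
urls_to_config() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue           # skip blank lines
    printf 'url = "%s"\n-O\n' "$url"
  done < "$1"
}
```

Since curl reads the config from standard input when the file name is given as -, the result can be piped straight in with `urls_to_config urls.txt | curl -K -`.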

Solution 3 - Unix

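Or you could just do this:

cat urls.txt | xargs curl --remote-name-all

A single -O applies to only one URL, so --remote-name-all is used here to save every URL that xargs passes to the same curl invocation. You only need the -I parameter of xargs when you want to insert the cat output in the middle of a command.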

Solution 4 - Unix

xargs -P 10 with curl

GNU xargs -P can run multiple curl processes in parallel. E.g. to run 10 processes:

xargs -P 10 -n 1 curl -O < urls.txt

This can speed up the download by up to 10x if your maximum download speed is not reached and the server does not throttle IPs, which is the most common scenario.

Just don't set -P too high or your RAM may be overwhelmed.

GNU parallel can achieve similar results.

The downside of those methods is that they don't use a single connection for all files, which is what curl does if you pass multiple URLs to it at once, as in:

curl -o out1.txt http://example.com/1 -o out2.txt http://example.com/2

as mentioned at https://serverfault.com/questions/199434/how-do-i-make-curl-use-keepalive-from-the-command-line

Maybe combining both methods would give the best results? But I imagine that parallelization is more important than keeping the connection alive.
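One way to combine the two is to give each curl process a batch of URLs, so that connections can be reused within a batch while several batches run in parallel. This is a rough sketch, assuming an xargs with -P and a curl that supports --remote-name-all. The function name fetch_batched, its default values, and the CURL variable (an override for dry runs) are illustrative assumptions.

```shell
# fetch_batched FILE [JOBS] [BATCH]: run JOBS curl processes in parallel,
# each given up to BATCH URLs so it can reuse connections within its batch.
# CURL is a hypothetical override for dry runs; it defaults to curl.
fetch_batched() {
  list=$1
  njobs=${2:-4}    # illustrative default, tune for your link and the server
  batch=${3:-10}   # illustrative default
  # --remote-name-all treats every URL in the batch as if it had its own -O.
  # CURL is left unquoted on purpose so 'echo curl' splits into two words.
  xargs -P "$njobs" -n "$batch" ${CURL:-curl} --remote-name-all < "$list"
}
```

For example, `fetch_batched urls.txt 10 20` would start up to 10 curl processes with 20 URLs each. Setting `CURL='echo curl'` beforehand prints the commands without downloading anything.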

See also: https://stackoverflow.com/questions/8634109/parallel-download-using-curl-command-line-utility

Solution 5 - Unix

Here is how I do it on a Mac (OSX), but it should work equally well on other systems:

What you need is a text file that contains your links for curl

like so:

    http://www.site1.com/subdirectory/file1-[01-15].jpg
    http://www.site1.com/subdirectory/file2-[01-15].jpg
    .
    .
    http://www.site1.com/subdirectory/file3287-[01-15].jpg

In this hypothetical case, the text file has 3287 lines, and curl expands the [01-15] range on each line into 15 picture URLs.
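To make the range notation concrete, the following sketch prints the URLs that one such line stands for. It only imitates the expansion that curl performs internally for a zero-padded two-digit range; the function name expand_range is made up for illustration.

```shell
# expand_range PREFIX FIRST LAST SUFFIX: print the URLs that a curl range
# such as PREFIX[01-15]SUFFIX stands for, using two-digit zero padding.
# Pass FIRST and LAST without leading zeros (1, not 01).
expand_range() {
  n=$2
  while [ "$n" -le "$3" ]; do
    printf '%s%02d%s\n' "$1" "$n" "$4"
    n=$((n + 1))
  done
}

# Lists file1-01.jpg through file1-15.jpg, one URL per line.
expand_range 'http://www.site1.com/subdirectory/file1-' 1 15 '.jpg'
```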

Let's say we save these links in a text file called testcurl.txt on the top level (/) of our hard drive.

Now we have to go into the terminal and enter the following command in the bash shell:

    while IFS= read -r url ; do curl -O "$url" ; done < /testcurl.txt

This reads the file one line at a time, so each URL reaches curl as a single argument. Make sure the flag (-O) is a capital letter O and NOT a zero.

With the -O flag, the remote filename is used for the saved file.

Happy downloading!

Solution 6 - Unix

As others have rightly mentioned:

-cat urls.txt | xargs -0 curl -O
+cat urls.txt | xargs -n1 curl -O

However, this paradigm is a very bad idea, especially if all of your URLs come from the same server -- you're not only going to be spawning another curl instance, but will also be establishing a new TCP connection for each request, which is highly inefficient, and even more so with the now ubiquitous https.

Please use this instead:

-cat urls.txt | xargs -n1 curl -O
+cat urls.txt | wget -i/dev/fd/0

Or, even simpler:

-cat urls.txt | wget -i/dev/fd/0
+wget -i/dev/fd/0 < urls.txt

Simplest yet:

-wget -i/dev/fd/0 < urls.txt
+wget -iurls.txt

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Finch | View Question on Stackoverflow
Solution 1 - Unix | ghoti | View Answer on Stackoverflow
Solution 2 - Unix | Dirk | View Answer on Stackoverflow
Solution 3 - Unix | user1101791 | View Answer on Stackoverflow
Solution 4 - Unix | Ciro Santilli Путлер Капут 六四事 | View Answer on Stackoverflow
Solution 5 - Unix | Stefan Gruenwald | View Answer on Stackoverflow
Solution 6 - Unix | cnst | View Answer on Stackoverflow