Bash script processing limited number of commands in parallel

Linux, Bash, Shell

Linux Problem Overview


I have a bash script that looks like this:

#!/bin/bash
wget LINK1 >/dev/null 2>&1
wget LINK2 >/dev/null 2>&1
wget LINK3 >/dev/null 2>&1
wget LINK4 >/dev/null 2>&1
# ..
# ..
wget LINK4000 >/dev/null 2>&1

But processing each line serially, waiting for each command to finish before moving on to the next, is very time consuming. I want to process, for instance, 20 lines at once, and when those are finished, process the next 20.

I thought of appending & to send each command to the background and carry on, as in wget LINK1 >/dev/null 2>&1 &, but there are 4000 lines here. That means I would have performance issues, not to mention there is a limit to how many processes I should start at the same time, so this is not a good idea.

One solution I'm thinking of right now is to check whether any of the commands is still running; for instance, after every 20 lines I could add this loop:

while [ $(ps -ef | grep KEYWORD | grep -v grep | wc -l) -gt 0 ]; do
    sleep 1
done

Of course, in this case I would need to append & to the end of each line! But I feel this is not the right way to do it.

So how do I actually group each set of 20 lines together and wait for them to finish before moving on to the next 20? This script is dynamically generated, so I can do whatever math I want on it while it's being generated. But it does NOT have to use wget; that was just an example, so any wget-specific solution won't do me any good.

Linux Solutions


Solution 1 - Linux

Use the wait built-in:

process1 &
process2 &
process3 &
process4 &
wait
process5 &
process6 &
process7 &
process8 &
wait

For the above example, 4 processes process1 ... process4 would be started in the background, and the shell would wait until those are completed before starting the next set.

From the GNU manual:

> wait [jobspec or pid ...]
>
> Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
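Since the script in the question is generated anyway, the generator can simply emit a wait after every 20 background commands. A minimal sketch of what the generated output could look like (LINK1 ... LINK4000 stand in for the real URLs, as in the question):

wget LINK1 >/dev/null 2>&1 &
wget LINK2 >/dev/null 2>&1 &
# ... 17 more lines ...
wget LINK20 >/dev/null 2>&1 &
wait   # block until all 20 downloads above have exited
wget LINK21 >/dev/null 2>&1 &
# ... and so on, with a wait after every 20 lines

One caveat of this batch approach: each group runs only as fast as its slowest download, since the next 20 do not start until all of the previous 20 are done.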

Solution 2 - Linux

See parallel. Its syntax is similar to that of xargs, but it runs the commands in parallel.
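A minimal sketch, assuming GNU parallel is installed and assuming a hypothetical links.txt file with one URL per line:

# Keep at most 20 wget processes running; as soon as one finishes,
# parallel feeds it the next URL from standard input.
parallel -j 20 wget -q {} < links.txt

Unlike the batch-and-wait approach in Solution 1, parallel keeps all 20 slots busy, so a single slow download does not hold up the remaining slots.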

Solution 3 - Linux

In fact, xargs can run commands in parallel for you. There is a special -P max_procs command-line option for that. See man xargs.
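A minimal sketch under the same assumption of a hypothetical links.txt file with one URL per line (-P is available in GNU xargs and modern BSD xargs):

# -n 1 passes one URL per wget invocation; -P 20 keeps up to
# 20 invocations running in parallel at any time.
xargs -n 1 -P 20 wget -q < links.txt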

Solution 4 - Linux

You can run 20 processes in the background and then use the command:

wait

Your script will block there and continue once all of your background jobs have finished.
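As a sketch, here is the same idea in loop form rather than as a generated flat script, again assuming a hypothetical links.txt file with one URL per line:

#!/bin/bash
count=0
while read -r link; do
    wget "$link" >/dev/null 2>&1 &
    count=$((count + 1))
    # After launching 20 background jobs, block until all have exited.
    if [ "$count" -eq 20 ]; then
        wait
        count=0
    fi
done < links.txt
wait   # pick up the final, possibly partial batch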

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | AL-Kateb | View Question on Stackoverflow |
| Solution 1 - Linux | devnull | View Answer on Stackoverflow |
| Solution 2 - Linux | choroba | View Answer on Stackoverflow |
| Solution 3 - Linux | Vader B | View Answer on Stackoverflow |
| Solution 4 - Linux | Binpix | View Answer on Stackoverflow |