python multithreading wait till all threads finished

Python, Multithreading

Python Problem Overview


This may have been asked in a similar context but I was unable to find an answer after about 20 minutes of searching, so I will ask.

I have written two Python scripts (let's say scriptA.py and scriptB.py).

In scriptB I want to call scriptA multiple times with different arguments. Each run takes about an hour (it's a huge script that does lots of stuff, don't worry about it), and I want to run scriptA with all the different arguments simultaneously, but I need to wait till ALL of them are done before continuing. My code:

import subprocess

#setup
do_setup()

#run scriptA
subprocess.call(scriptA + argumentsA)
subprocess.call(scriptA + argumentsB)
subprocess.call(scriptA + argumentsC)

#finish
do_finish()

I want to run all the subprocess.call() calls at the same time, and then wait till they are all done. How should I do this?

I tried to use threading like the example here:

from threading import Thread
import subprocess

def call_script(args):
    subprocess.call(args)
    
#run scriptA   
t1 = Thread(target=call_script, args=(scriptA + argumentsA,))
t2 = Thread(target=call_script, args=(scriptA + argumentsB,))
t3 = Thread(target=call_script, args=(scriptA + argumentsC,))
t1.start()
t2.start()
t3.start()

But I do not think this is right.

How do I know they have all finished running before going to my do_finish()?

Python Solutions


Solution 1 - Python

Put the threads in a list and then use the join() method:

threads = []

t = Thread(...)
threads.append(t)

# ...repeat as often as necessary...

# Start all threads
for x in threads:
    x.start()

# Wait for all of them to finish
for x in threads:
    x.join()
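A minimal runnable sketch of this pattern, applied to the asker's scenario. The `python -c` commands here are stand-ins for the real `scriptA + argumentsX` invocations:

```python
import subprocess
import sys
from threading import Thread

def call_script(args):
    subprocess.call(args)

# Each entry stands in for "scriptA + argumentsX" from the question.
jobs = [
    [sys.executable, "-c", "print('A done')"],
    [sys.executable, "-c", "print('B done')"],
    [sys.executable, "-c", "print('C done')"],
]

threads = []
for job in jobs:
    threads.append(Thread(target=call_script, args=(job,)))

# Start all threads
for t in threads:
    t.start()

# Wait for all of them to finish
for t in threads:
    t.join()

# Only reached after every subprocess has returned; safe to call do_finish() here
```

Note the trailing comma in `args=(job,)`: `args` must be a tuple of arguments, so a one-argument call needs a one-element tuple.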

Solution 2 - Python

You need to use the join() method of the Thread object at the end of the script.

t1 = Thread(target=call_script, args=(scriptA + argumentsA,))
t2 = Thread(target=call_script, args=(scriptA + argumentsB,))
t3 = Thread(target=call_script, args=(scriptA + argumentsC,))

t1.start()
t2.start()
t3.start()

t1.join()
t2.join()
t3.join()

Thus the main thread will wait till t1, t2 and t3 finish execution.

Solution 3 - Python

Since Python 3.2 there has been a new approach to reach the same result, which I personally prefer to the traditional thread create/start/join: the concurrent.futures package: https://docs.python.org/3/library/concurrent.futures.html

Using a ThreadPoolExecutor the code would be:

from concurrent.futures import ThreadPoolExecutor
import time
    
def call_script(ordinal, arg):
    print('Thread', ordinal, 'argument:', arg)
    time.sleep(2)
    print('Thread', ordinal, 'Finished')
    
args = ['argumentsA', 'argumentsB', 'argumentsC']
    
with ThreadPoolExecutor(max_workers=2) as executor:
    ordinal = 1
    for arg in args:
        executor.submit(call_script, ordinal, arg)
        ordinal += 1
print('All tasks have finished')

The output of the previous code is something like:

Thread 1 argument: argumentsA
Thread 2 argument: argumentsB
Thread 1 Finished
Thread 2 Finished
Thread 3 argument: argumentsC
Thread 3 Finished
All tasks have finished

One of the advantages is that you can control the throughput by setting the maximum number of concurrent workers. Exiting the `with` block implicitly waits for all submitted tasks to complete.
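If you also need each task's return value (or to see its exception), submit() gives you Future objects you can collect. A sketch under the same setup, where call_script is a trivial stand-in for the real work:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def call_script(arg):
    # stand-in for the real work; returns a value so there is something to collect
    return arg.upper()

args = ['argumentsA', 'argumentsB', 'argumentsC']

with ThreadPoolExecutor(max_workers=2) as executor:
    # map each Future back to the argument that produced it
    futures = {executor.submit(call_script, a): a for a in args}
    results = {}
    for fut in as_completed(futures):
        results[futures[fut]] = fut.result()  # re-raises here if the task failed
```

as_completed() yields futures in completion order, so slow tasks don't block you from processing fast ones.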

Solution 4 - Python

I prefer using a list comprehension based on an input list:

inputs = [scriptA + argumentsA, scriptA + argumentsB, ...]
threads = [Thread(target=call_script, args=(i,)) for i in inputs]
[t.start() for t in threads]
[t.join() for t in threads]
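Filled out into a runnable sketch; the worker and inputs are placeholders for the real subprocess calls:

```python
from threading import Thread

results = {}

def call_script(name):
    # stand-in for subprocess.call(scriptA + arguments)
    results[name] = len(name)

inputs = ["argumentsA", "argumentsB", "argumentsC"]
threads = [Thread(target=call_script, args=(i,)) for i in inputs]
[t.start() for t in threads]
[t.join() for t in threads]
# every thread has finished here, so results is fully populated
```

The `args=(i,)` tuple matters: `args=(i)` is just `i` with parentheses, which Thread would unpack into multiple arguments.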

Solution 5 - Python

You can have a class like the one below, to which you can add any number of functions or console scripts you want to execute in parallel fashion, start the execution, and wait for all jobs to complete.

from multiprocessing import Process
       
class ProcessParallel(object):
    """
    Runs the given functions in parallel.
    """
    def __init__(self, *jobs):
        self.jobs = jobs
        self.processes = []

    def fork_processes(self):
        """
        Creates the process objects for the given function delegates.
        """
        for job in self.jobs:
            proc = Process(target=job)
            self.processes.append(proc)

    def start_all(self):
        """
        Starts all the function processes together.
        """
        for proc in self.processes:
            proc.start()

    def join_all(self):
        """
        Waits until all the functions have finished executing.
        """
        for proc in self.processes:
            proc.join()
            
  
def two_sum(a=2, b=2):
    return a + b
    
def multiply(a=2, b=2):
    return a * b
    
          
# How to run:
if __name__ == '__main__':
    # note: two_sum and multiply can be replaced with any Python console
    # scripts which you want to run in parallel..
    procs = ProcessParallel(two_sum, multiply)
    # Create all the process objects
    procs.fork_processes()
    # Start process execution
    procs.start_all()
    # Wait until all the processes have finished
    procs.join_all()

Solution 6 - Python

I just came across the same problem, where I needed to wait for all the threads which were created using a for loop. I tried out the following piece of code. It may not be the perfect solution, but I thought it would be a simple one to test:

for t in threading.enumerate():
    try:
        t.join()
    except RuntimeError as err:
        if 'cannot join current thread' in err.args[0]:
            continue
        else:
            raise

Solution 7 - Python

From the threading module documentation

> There is a “main thread” object; this corresponds to the initial thread of control in the Python program. It is not a daemon thread.

> There is the possibility that “dummy thread objects” are created. These are thread objects corresponding to “alien threads”, which are threads of control started outside the threading module, such as directly from C code. Dummy thread objects have limited functionality; they are always considered alive and daemonic, and cannot be join()ed. They are never deleted, since it is impossible to detect the termination of alien threads.

So, to catch those two cases when you are not interested in keeping a list of the threads you create:

import threading as thrd


def alter_data(data, index):
    data[index] *= 2


data = [0, 2, 6, 20]

for i, value in enumerate(data):
    thrd.Thread(target=alter_data, args=[data, i]).start()

for thread in thrd.enumerate():
    if thread.daemon:
        continue
    try:
        thread.join()
    except RuntimeError as err:
        if 'cannot join current thread' in err.args[0]:
            # catches the main thread
            continue
        else:
            raise

Whereupon:

>>> print(data)
[0, 4, 12, 40]

Solution 8 - Python

Maybe something like the following. Note that this only makes sense if your worker threads were created as daemon threads; the main thread is never daemonic, so it is skipped automatically:

for t in threading.enumerate():
    if t.daemon:
        t.join()
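A runnable sketch of that idea, with the workers explicitly created as daemon threads so the daemon check selects exactly them. The timeout is an added safety net (not part of the original snippet) in case some unrelated long-lived daemon thread from a library shows up in enumerate():

```python
import threading
import time

def work(delay):
    time.sleep(delay)          # stand-in for real work

# Daemon workers, so the daemon check below selects exactly these threads
workers = [threading.Thread(target=work, args=(d,), daemon=True)
           for d in (0.01, 0.02, 0.03)]
for w in workers:
    w.start()

# The main thread is never daemonic, so it is skipped automatically.
for t in threading.enumerate():
    if t.daemon:
        t.join(timeout=1.0)
```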

Solution 9 - Python

Using join alone can give a false positive about whether a thread has finished. As said in the docs:

> When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call isAlive() after join() to decide whether a timeout happened – if the thread is still alive, the join() call timed out.

and an illustrative piece of code:

threads = []
for name in some_data:
    new = threading.Thread(
        target=self.some_func,
        args=(name,)
    )
    threads.append(new)
    new.start()

over_threads = iter(threads)
curr_th = next(over_threads)
while True:
    curr_th.join(timeout=60)  # without a timeout, the is_alive() check below is pointless
    if curr_th.is_alive():
        continue              # join() timed out; keep waiting on the same thread
    try:
        curr_th = next(over_threads)
    except StopIteration:
        break
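The docs' point can be seen in isolation with a single thread: join(timeout) returns None whether or not the thread finished, so only is_alive() tells you which happened. The sleep durations below are arbitrary illustrative values:

```python
import threading
import time

t = threading.Thread(target=time.sleep, args=(0.5,))
t.start()

t.join(timeout=0.05)           # returns None either way
timed_out = t.is_alive()       # True: the thread is still sleeping, so join() timed out

t.join()                       # no timeout: blocks until the thread actually finishes
finished = not t.is_alive()    # True
```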

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
------------ | --------------- | ---------------------------------
Question | Inbar Rose | View Question on Stackoverflow
Solution 1 - Python | Aaron Digulla | View Answer on Stackoverflow
Solution 2 - Python | Maksim Skurydzin | View Answer on Stackoverflow
Solution 3 - Python | Roberto | View Answer on Stackoverflow
Solution 4 - Python | Adam Matan | View Answer on Stackoverflow
Solution 5 - Python | PBD | View Answer on Stackoverflow
Solution 6 - Python | Omkar | View Answer on Stackoverflow
Solution 7 - Python | berna1111 | View Answer on Stackoverflow
Solution 8 - Python | jno | View Answer on Stackoverflow
Solution 9 - Python | xillmera | View Answer on Stackoverflow