How can we use tqdm in a parallel execution with joblib?
Python Problem Overview
I want to run a function in parallel, and wait until all parallel nodes are done, using joblib. Like in the example:
from math import sqrt
from joblib import Parallel, delayed
Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
However, I want the execution to be shown in a single progress bar, as with tqdm, showing how many jobs have been completed.
How would you do that?
Python Solutions
Solution 1 - Python
Just put range(10) inside tqdm(...)! It probably seemed too good to be true, but it really works (on my machine):
from math import sqrt
from joblib import Parallel, delayed
from tqdm import tqdm
result = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in tqdm(range(100000)))
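When the tasks come from a generator rather than a range, tqdm cannot infer the length, so the bar shows only a running counter. A minimal sketch, assuming the number of tasks is known in advance (the names n and tasks are illustrative, not from the answer): pass total explicitly. Note that a bar wrapped around the input advances as joblib pulls tasks from the iterator, that is, on dispatch rather than on completion.

```python
from math import sqrt

from joblib import Parallel, delayed
from tqdm import tqdm

n = 1000  # assumed to be known ahead of time

# A generator has no len(), so tqdm needs total= to draw a full bar.
tasks = (delayed(sqrt)(i ** 2) for i in range(n))

# The bar ticks each time joblib consumes a task from the generator.
result = Parallel(n_jobs=2)(tqdm(tasks, total=n))
```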
Solution 2 - Python
I've created pqdm, a parallel tqdm wrapper built on concurrent.futures, to make this convenient. Give it a try!
To install
pip install pqdm
and use
from pqdm.processes import pqdm
# If you want threads instead:
# from pqdm.threads import pqdm
args = [1, 2, 3, 4, 5]
# args = range(1,6) would also work
def square(a):
    return a*a
result = pqdm(args, square, n_jobs=2)
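To illustrate the general idea of pairing concurrent.futures with tqdm, here is a rough sketch. It is not pqdm's actual implementation; run_with_progress is a hypothetical helper written for this example, and it uses threads so that it runs anywhere without pickling concerns.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

from tqdm import tqdm


def square(a):
    return a * a


def run_with_progress(args, func, n_jobs=2):
    """Hypothetical helper: run func over args in a thread pool with a bar."""
    results = [None] * len(args)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Remember each future's input position so results keep input order.
        futures = {pool.submit(func, a): i for i, a in enumerate(args)}
        # as_completed yields futures as they finish, so the bar tracks completion.
        for fut in tqdm(as_completed(futures), total=len(futures)):
            results[futures[fut]] = fut.result()
    return results


result = run_with_progress([1, 2, 3, 4, 5], square)
```

Because the bar wraps as_completed rather than the inputs, it advances only when a task has actually finished.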
Solution 3 - Python
This modifies nth's great answer to add a flag that turns tqdm on or off, and to let you specify the total ahead of time so that the progress bar fills in correctly.
from tqdm.auto import tqdm
from joblib import Parallel

class ProgressParallel(Parallel):
    def __init__(self, use_tqdm=True, total=None, *args, **kwargs):
        self._use_tqdm = use_tqdm
        self._total = total
        super().__init__(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        with tqdm(disable=not self._use_tqdm, total=self._total) as self._pbar:
            return Parallel.__call__(self, *args, **kwargs)

    def print_progress(self):
        if self._total is None:
            self._pbar.total = self.n_dispatched_tasks
        self._pbar.n = self.n_completed_tasks
        self._pbar.refresh()
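A possible usage sketch for this class follows. The class is repeated so that the snippet runs on its own; the sqrt task is borrowed from the question, and total=10 is chosen to match the ten inputs. It assumes a joblib version in which Parallel still calls print_progress() as tasks complete.

```python
from math import sqrt

from joblib import Parallel, delayed
from tqdm.auto import tqdm


class ProgressParallel(Parallel):
    def __init__(self, use_tqdm=True, total=None, *args, **kwargs):
        self._use_tqdm = use_tqdm
        self._total = total
        super().__init__(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        with tqdm(disable=not self._use_tqdm, total=self._total) as self._pbar:
            return Parallel.__call__(self, *args, **kwargs)

    def print_progress(self):
        # Without a fixed total, fall back to the number of dispatched tasks.
        if self._total is None:
            self._pbar.total = self.n_dispatched_tasks
        self._pbar.n = self.n_completed_tasks
        self._pbar.refresh()


# With a bar and a known total:
shown = ProgressParallel(n_jobs=2, total=10)(
    delayed(sqrt)(i ** 2) for i in range(10)
)

# Same computation with the bar switched off:
silent = ProgressParallel(use_tqdm=False, n_jobs=2)(
    delayed(sqrt)(i ** 2) for i in range(10)
)
```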
Solution 4 - Python
Solutions that simply wrap the iterable passed to joblib.Parallel() do not truly monitor the progress of execution: the bar advances when tasks are dispatched, not when they finish. Instead, I suggest subclassing Parallel and overriding the print_progress() method, as follows:
import joblib
from tqdm.auto import tqdm

class ProgressParallel(joblib.Parallel):
    def __call__(self, *args, **kwargs):
        with tqdm() as self._pbar:
            return joblib.Parallel.__call__(self, *args, **kwargs)

    def print_progress(self):
        self._pbar.total = self.n_dispatched_tasks
        self._pbar.n = self.n_completed_tasks
        self._pbar.refresh()
Solution 5 - Python
Here's a possible workaround:
import random
import time

from joblib import Parallel, delayed
from tqdm import tqdm

def func(x):
    time.sleep(random.randint(1, 10))
    return x

def text_progessbar(seq, total=None):
    # Plain-text progress: print one status line per item pulled from seq.
    step = 1
    tick = time.time()
    for item in seq:
        time_diff = time.time() - tick
        avg_time = time_diff / step
        total_str = 'of %d' % total if total else ''
        print('step', step, '%.2f' % time_diff,
              'avg: %.2f sec/iter' % avg_time, total_str)
        step += 1
        yield item

# Map each bar type to a factory that wraps the task iterator.
all_bar_funcs = {
    'tqdm': lambda args: lambda x: tqdm(x, **args),
    'txt': lambda args: lambda x: text_progessbar(x, **args),
    'False': lambda args: iter,
    'None': lambda args: iter,
}

def ParallelExecutor(use_bar='tqdm', **joblib_args):
    def aprun(bar=use_bar, **tq_args):
        def tmp(op_iter):
            if str(bar) in all_bar_funcs:
                bar_func = all_bar_funcs[str(bar)](tq_args)
            else:
                raise ValueError("Value %s not supported as bar type" % bar)
            return Parallel(**joblib_args)(bar_func(op_iter))
        return tmp
    return aprun

aprun = ParallelExecutor(n_jobs=5)
a1 = aprun(total=25)(delayed(func)(i ** 2 + j) for i in range(5) for j in range(5))
a2 = aprun(total=16)(delayed(func)(i ** 2 + j) for i in range(4) for j in range(4))
a2 = aprun(bar='txt')(delayed(func)(i ** 2 + j) for i in range(4) for j in range(4))
a2 = aprun(bar=None)(delayed(func)(i ** 2 + j) for i in range(4) for j in range(4))
Solution 6 - Python
If your problem consists of many parts, you could split the parts into k subgroups, run each subgroup in parallel, and update the progress bar in between, resulting in k updates of the progress.
This is demonstrated in the following example from the documentation.
>>> with Parallel(n_jobs=2) as parallel:
...     accumulator = 0.
...     n_iter = 0
...     while accumulator < 1000:
...         results = parallel(delayed(sqrt)(accumulator + i ** 2)
...                            for i in range(5))
...         accumulator += sum(results)  # synchronization barrier
...         n_iter += 1
https://pythonhosted.org/joblib/parallel.html#reusing-a-pool-of-workers
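The subgroup idea can be combined with a manually updated tqdm bar. The following is a minimal sketch under stated assumptions: the inputs are a list of 100 numbers, k = 10 is an arbitrary choice, and the names items, chunk_size, and pbar are illustrative.

```python
from math import sqrt

from joblib import Parallel, delayed
from tqdm import tqdm

items = list(range(100))
k = 10  # number of subgroups, chosen arbitrarily for this sketch
chunk_size = -(-len(items) // k)  # ceiling division

results = []
# Reuse one pool of workers across all subgroups.
with Parallel(n_jobs=2) as parallel, tqdm(total=len(items)) as pbar:
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Each call blocks until the whole subgroup has finished.
        results.extend(parallel(delayed(sqrt)(i ** 2) for i in chunk))
        # Advance the bar by the number of items just completed.
        pbar.update(len(chunk))
```

A larger k gives finer-grained updates, at the cost of more synchronization points between subgroups.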