What does the delayed() function do (when used with joblib in Python)

Python, Multiprocessing, Joblib

Python Problem Overview


I've read through the documentation, but I don't understand what is meant by: "The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax."

I'm using it to iterate over the list I want to operate on (allImages) as follows:

from joblib import Parallel, delayed

def joblib_loop():
    return Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages)

This returns my HOG features, like I want (and with the speed gain using all my 8 cores), but I'm just not sure what it is actually doing.

My Python knowledge is alright at best, and it's very possible that I'm missing something basic. Any pointers in the right direction would be most appreciated.

Python Solutions


Solution 1 - Python

Perhaps things become clearer if we look at what would happen if instead we simply wrote

Parallel(n_jobs=8)(getHog(i) for i in allImages)

which, in this context, could be expressed more naturally as:

  1. Create a Parallel instance with n_jobs=8
  2. Create the list [getHog(i) for i in allImages]
  3. Pass that list to the Parallel instance

What's the problem? By the time the list gets passed to the Parallel object, all getHog(i) calls have already returned - so there's nothing left to execute in Parallel! All the work was already done in the main thread, sequentially.
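Here is a tiny illustration of the problem, a sketch using a toy stand-in for getHog that announces when it runs:

    def get_hog(image):                      # toy stand-in for getHog
        print(f"computing HOG for {image}")  # runs immediately, in the main process
        return f"hog({image})"

    all_images = ["img1", "img2"]

    # The comprehension calls get_hog right away, before Parallel ever sees the list:
    features = [get_hog(i) for i in all_images]
    print(features)   # ['hog(img1)', 'hog(img2)'] -- already computed, sequentially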

What we actually want is to tell Python what functions we want to call with what arguments, without actually calling them - in other words, we want to delay the execution.

This is what delayed conveniently allows us to do, with clear syntax. If we want to tell Python that we'd like to call foo(2, g=3) sometime later, we can simply write delayed(foo)(2, g=3). Returned is the tuple (foo, (2,), {'g': 3}), containing:

  • a reference to the function we want to call, e.g. foo
  • all arguments without a keyword (short: "args"), e.g. 2
  • all keyword arguments (short: "kwargs"), e.g. g=3
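To see this for yourself, here is a minimal sketch (foo is just a placeholder function):

    from joblib import delayed

    def foo(x, g=0):
        return x + g

    task = delayed(foo)(2, g=3)
    print(task)        # (<function foo at 0x...>, (2,), {'g': 3})

    # The call can be reconstructed later by unpacking the stored args and kwargs:
    func, args, kwargs = task
    print(func(*args, **kwargs))   # 5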

So, by writing Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages), instead of the above sequence, now the following happens:

  1. A Parallel instance with n_jobs=8 gets created

  2. The list

     [delayed(getHog)(i) for i in allImages]
    

    gets created, evaluating to

     [(getHog, (img1,), {}), (getHog, (img2,), {}), ... ]
    
  3. That list is passed to the Parallel instance

  4. The Parallel instance creates 8 workers (threads or processes, depending on the chosen backend) and distributes the tuples from the list to them

  5. Finally, each of those workers starts executing the tuples, i.e., they call the first element with the second and the third elements unpacked as arguments, tup[0](*tup[1], **tup[2]), turning the tuple back into the call we actually intended to do, getHog(img2). A sketch of this last step follows below.
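A minimal sketch of that last step without joblib, just to show what "turning the tuple back into a call" looks like (get_hog and all_images are toy stand-ins for the question's getHog and allImages):

    from concurrent.futures import ThreadPoolExecutor

    def get_hog(image):                 # toy stand-in for getHog
        return f"hog({image})"

    all_images = ["img1", "img2", "img3"]

    # Hand-built "delayed" tuples: (function, args, kwargs)
    tasks = [(get_hog, (img,), {}) for img in all_images]

    # Each worker turns a tuple back into the intended call
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda tup: tup[0](*tup[1], **tup[2]), tasks))

    print(results)   # ['hog(img1)', 'hog(img2)', 'hog(img3)']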

Solution 2 - Python

We need a loop to test a list of different model configurations. This is the main function that drives the grid search process and calls the score_model() function for each model configuration. We can dramatically speed up the grid search by evaluating model configurations in parallel. One way to do that is to use the Joblib library. We can define a Parallel object with the number of cores to use and set it to the number of cores detected in your hardware.

Define the executor:

    executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')

Then create a list of tasks to execute in parallel, which will be one call to the score_model() function for each model configuration we have.

Suppose we have:

    def score_model(data, n_test, cfg):
        ...

Define the list of tasks:

    tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)

We can then use the Parallel object to execute the list of tasks in parallel:

    scores = executor(tasks)
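Putting the pieces together, a minimal runnable sketch (score_model, data, n_test and cfg_list here are placeholders standing in for the real grid-search code):

    from multiprocessing import cpu_count
    from joblib import Parallel, delayed

    def score_model(data, n_test, cfg):
        # placeholder scoring; a real grid search would fit and evaluate
        # a model for the configuration cfg here
        return sum(data[:-n_test]) * cfg

    if __name__ == "__main__":   # guard needed by the multiprocessing backend on some platforms
        data = [1, 2, 3, 4, 5]
        n_test = 2
        cfg_list = [1, 2, 3]

        executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
        tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
        scores = executor(tasks)
        print(scores)   # [6, 12, 18]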

Solution 3 - Python

So what you want to be able to do is pile up a set of function calls and their arguments in such a way that you can pass them out efficiently to a scheduler/executor. delayed is a decorator that takes a function and its args and wraps them into an object that can be put in a list and popped out as needed. Dask has the same thing, which it uses in part to feed its graph scheduler.
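For comparison, a minimal sketch of the Dask equivalent (assuming dask is installed; inc and add are toy functions):

    from dask import delayed

    @delayed
    def inc(x):
        return x + 1

    @delayed
    def add(x, y):
        return x + y

    # Nothing runs yet; calling the decorated functions only builds a task graph
    total = add(inc(1), inc(2))
    print(total.compute())   # 5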

Solution 4 - Python

From the reference https://wiki.python.org/moin/ParallelProcessing: "The Parallel object creates a multiprocessing pool that forks the Python interpreter in multiple processes to execute each of the items of the list. The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax."

Another thing I would like to suggest: instead of explicitly defining the number of cores, we can generalize like this:

    import multiprocessing

    num_core = multiprocessing.cpu_count()
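The result can then be plugged into the Parallel call; a sketch reusing toy stand-ins for the question's getHog and allImages (n_jobs=-1 is an equivalent shortcut meaning "use all available cores"):

    import multiprocessing
    from joblib import Parallel, delayed

    num_core = multiprocessing.cpu_count()

    def get_hog(image):                 # toy stand-in for getHog
        return f"hog({image})"

    all_images = ["img1", "img2", "img3"]

    features = Parallel(n_jobs=num_core)(delayed(get_hog)(i) for i in all_images)
    print(features)   # ['hog(img1)', 'hog(img2)', 'hog(img3)']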

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        | Original Author | Original Content on Stackoverflow
Question            | orrymr          | View Question on Stackoverflow
Solution 1 - Python | Nearoo          | View Answer on Stackoverflow
Solution 2 - Python | Raj             | View Answer on Stackoverflow
Solution 3 - Python | user85779       | View Answer on Stackoverflow
Solution 4 - Python | vincent15       | View Answer on Stackoverflow