Is there a way to run multiple cells simultaneously in IPython notebook?

IPython, IPython Notebook

Ipython Problem Overview


One cell in my notebook executes for a long time, while the other CPUs in the machine are idle. Is it possible to run other cells in parallel?

Ipython Solutions


Solution 1 - Ipython

Yes. The documentation for ipyparallel (formerly IPython parallel) shows how to spawn multiple IPython kernels. After that, you are free to distribute the work across cores. Once set up, you can prefix cells with %%px0, %%px1, ... %%px999 to execute a cell on a specific engine, which in practice amounts to running cells in parallel. I would suggest having a look at Dask as well.

Solution 2 - Ipython

This does not answer your question directly, but I think it will help many people who have the same problem. You can easily move variables between notebooks: continue running the functions in a second notebook, then move the result back to the main notebook.

For example:

Notebook 1:

%store X
%store y

Notebook 2:

%store -r X
%store -r y

new_df = ...
%store new_df

Notebook 1:

%store -r new_df 

Solution 3 - Ipython

I want to introduce a library that has this feature and does not require tricks such as multiple notebooks.

Parsl: productive parallel programming in Python.

Configuration

import parsl
from parsl.app.app import python_app, bash_app
parsl.load()

As an example, I edited this snippet from parsl/parsl-tutorial.

# App that generates a random number after a delay
@python_app
def generate(limit,delay):
    from random import randint
    import time
    time.sleep(delay)
    return randint(1,limit)

# Generate 5 random numbers between 1 and 10
import time
st = time.time()
rand_nums = []
for i in range(5):
    rand_nums.append(generate(10, 1))

# Wait for all apps to finish and collect the results
outputs = [i.result() for i in rand_nums]
et = time.time()
print(f"Execution time: {et - st:.2f}")

# Print results
print(outputs)

Result:

Execution time: 3.00
[1, 6, 4, 8, 3]

> Note that the code takes 3 s to execute, not the 5 s that running the five 1-second calls one after another would take.

So what you can do is call the function (in this example, generate(...)) in a cell. The call returns immediately with a future object. When you later call .result() on that object, it will either:

  1. Block until the result is ready, if the app is still running.
  2. Return the result right away, if the app has completed.

Therefore, as long as you call .result() only in the last few cells, the function runs in the background while you keep working, and the result is guaranteed to be available by the time those last cells finish.
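
The launch-now, collect-later pattern described above can be sketched with the standard library alone. This is an analogy rather than Parsl itself: it uses concurrent.futures, whose Future objects offer the same done() and result() methods. The slow_task helper and the "cell" comments are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(delay):
    # Hypothetical stand-in for a long-running computation.
    time.sleep(delay)
    return "finished"


executor = ThreadPoolExecutor(max_workers=2)

# "Cell 1": submit() returns a future immediately; the task runs in the background.
future = executor.submit(slow_task, 0.5)

# "Cell 2": other work can happen here; done() checks progress without blocking.
print("still running:", not future.done())

# "Last cell": result() blocks until the task is finished, then returns its value.
value = future.result()
print(value)

executor.shutdown()
```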

Regarding data dependencies, Parsl handles them for you: when an app's input is the output of another app, Parsl waits for that data before running it, even though both functions are decorated with @python_app.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | user2808117 | View Question on Stack Overflow
Solution 1 - Ipython | Matt | View Answer on Stack Overflow
Solution 2 - Ipython | Yousef | View Answer on Stack Overflow
Solution 3 - Ipython | lowzhao | View Answer on Stack Overflow