Is there a way to run multiple cells simultaneously in IPython notebook?
Problem Overview
One cell in my notebook executes for a long time, while the other CPUs in the machine are idle. Is it possible to run other cells in parallel?
Ipython Solutions
Solution 1 - Ipython
Yes. See the documentation for ipyparallel (formerly IPython parallel), which shows how to spawn multiple IPython kernels. After that, you are free to distribute the work across cores, and you can prefix cells with %%px0, %%px1, ... %%px999 (once set up) to execute a cell on a specific engine, which in practice corresponds to running cells in parallel. I would suggest having a look at Dask as well.
Solution 2 - Ipython
This does not answer your question directly, but I think it will help a lot of people who have the same problem. You can easily move variables between notebooks, continue running your functions in another notebook, and then move the result back to the main notebook.
For example:
Notebook 1:
%store X
%store y
Notebook 2:
%store -r X
%store -r y
new_df = ...
%store new_df
Notebook 1:
%store -r new_df
Solution 3 - Ipython
I want to introduce a library that has this feature built in and does not require tricks such as multiple notebooks.
Parsl: productive parallel programming in Python.
Configuration
import parsl
from parsl.app.app import python_app, bash_app
parsl.load()
As an example, I edited this snippet from parsl/parsl-tutorial.
# App that generates a random number after a delay
@python_app
def generate(limit, delay):
    from random import randint
    import time
    time.sleep(delay)
    return randint(1, limit)

# Generate 5 random numbers between 1 and 10
import time
st = time.time()
rand_nums = []
for i in range(5):
    rand_nums.append(generate(10, 1))

# Wait for all apps to finish and collect the results
outputs = [i.result() for i in rand_nums]
et = time.time()
print(f"Execution time: {et - st:.2f}")

# Print results
print(outputs)
Result:
Execution time: 3.00
[1, 6, 4, 8, 3]
> Note that the code takes 3 s to execute, not the 5 s that five sequential 1 s calls would take.
So what you can do is call the function (generate(...) in this example) in one cell. The call returns a future object right away. When you call .result() on that object, it will either:
- Block until the result is ready, if the app is still running.
- Return the result immediately, if the app has completed.
Therefore, as long as you call .result() only in the last few cells, the app keeps running in the background while the cells in between execute, and the call in those last cells is guaranteed to return the value once it is ready.
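The submit-now, collect-later pattern described here is not specific to Parsl. As a minimal sketch, the standard library's concurrent.futures exposes a similar future object; the function slow_task and the "cell" comments are hypothetical stand-ins for notebook cells. Note that threads only help when the task releases the GIL (sleeping, I/O, many NumPy calls), so this is an illustration of the future semantics rather than a way to use idle cores for pure-Python number crunching.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(delay):
    """Stand-in for a long-running cell: sleep, then return a value."""
    time.sleep(delay)
    return f"slept {delay}s"

executor = ThreadPoolExecutor(max_workers=2)

# "Cell 1": submit the work; a Future comes back immediately
future = executor.submit(slow_task, 0.5)

# "Cell 2": other work runs while slow_task is still sleeping
other_result = sum(range(1000))

# "Last cell": result() blocks only if the task has not finished yet
value = future.result()
executor.shutdown()

print(other_result, value)
```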
Regarding data dependencies, Parsl handles them for you: if you pass the future returned by one app as an argument to another app, the second app waits for that data before it runs, even though both are decorated with @python_app.