R expand.grid() function in Python
Python Problem Overview
Is there a Python function similar to the expand.grid() function in R? Thanks in advance.
(EDIT) Below are the description of this R function and an example.
Create a Data Frame from All Combinations of Factors
Description:
Create a data frame from all combinations of the supplied vectors
or factors.
> x <- 1:3
> y <- 1:3
> expand.grid(x,y)
  Var1 Var2
1    1    1
2    2    1
3    3    1
4    1    2
5    2    2
6    3    2
7    1    3
8    2    3
9    3    3
(EDIT2) Below is an example with the rpy package. I would like to get the same output object, but without using R:
>>> from rpy import *
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> r.assign("a",a)
[1, 2, 3]
>>> r.assign("b",b)
[5, 7, 9]
>>> r("expand.grid(a,b)")
{'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}
EDIT 02/09/2012: I'm really lost with Python. Lev Levitsky's code given in his answer does not work for me:
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in expandgrid
NameError: global name 'itertools' is not defined
However, the itertools module seems to be installed (typing from itertools import * does not return any error message).
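A note on the NameError above: from itertools import * binds the module's public names (such as product) in the current namespace, but it does not bind the name itertools itself, so itertools.product(...) inside the function cannot be resolved. A minimal sketch of the fix, assuming the function body from the answer referred to above, is to import the module by name:

```python
import itertools  # binds the module name itself, so itertools.product resolves


def expandgrid(*itrs):
    # Cartesian product of all inputs, as a list of tuples
    product = list(itertools.product(*itrs))
    # One dict entry per input: Var1, Var2, ...
    return {'Var{}'.format(i + 1): [x[i] for x in product]
            for i in range(len(itrs))}


result = expandgrid([1, 2, 3], [5, 7, 9])
print(result)
```

Alternatively, keep the star-import and call product(...) directly instead of itertools.product(...).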
Python Solutions
Solution 1 - Python
Just use list comprehensions:
>>> [(x, y) for x in range(5) for y in range(5)]
[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
convert to numpy array if desired:
>>> import numpy as np
>>> x = np.array([(x, y) for x in range(5) for y in range(5)])
>>> x.shape
(25, 2)
I have tested for up to 10000 x 10000 and performance of python is comparable to that of expand.grid in R. Using a tuple (x, y) is about 40% faster than using a list [x, y] in the comprehension.
OR...
Around 3x faster with np.meshgrid and much less memory intensive.
%timeit np.array(np.meshgrid(range(10000), range(10000))).reshape(2, 100000000).T
1 loops, best of 3: 736 ms per loop
in R:
> system.time(expand.grid(1:10000, 1:10000))
user system elapsed
1.991 0.416 2.424
Keep in mind that R has 1-based arrays whereas Python is 0-based.
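To see what the meshgrid one-liner does on a small input, here is a minimal sketch using the a and b vectors from the question (the variable names are only illustrative). With NumPy's default indexing the first input varies fastest, which matches the row order of R's expand.grid:

```python
import numpy as np

a = [1, 2, 3]
b = [5, 7, 9]

# meshgrid returns one 2-D array per input; stack them, flatten each
# one to 1-D, then transpose so that every row is one (a, b) pair.
grid = np.array(np.meshgrid(a, b)).reshape(2, -1).T
print(grid.tolist())
```

Using reshape(2, -1) lets NumPy work out the number of combinations, so the length does not have to be hard-coded as in the timing example.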
Solution 2 - Python
product from itertools is the key to your solution. It produces a Cartesian product of the inputs.
from itertools import product
import pandas as pd  # needed for pd.DataFrame below

def expand_grid(dictionary):
    return pd.DataFrame([row for row in product(*dictionary.values())],
                        columns=dictionary.keys())
dictionary = {'color': ['red', 'green', 'blue'],
              'vehicle': ['car', 'van', 'truck'],
              'cylinders': [6, 8]}
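>>> expand_grid(dictionary)
    color  cylinders vehicle
0     red          6     car
1     red          6     van
2     red          6   truck
3     red          8     car
4     red          8     van
5     red          8   truck
6   green          6     car
7   green          6     van
8   green          6   truck
9   green          8     car
10  green          8     van
11  green          8   truck
12   blue          6     car
13   blue          6     van
14   blue          6   truck
15   blue          8     car
16   blue          8     van
17   blue          8   truck
Note that this output comes from an older Python where dict ordering was arbitrary. On Python 3.7+ dicts keep insertion order, so the columns come out as color, vehicle, cylinders, with cylinders varying fastest.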
Solution 3 - Python
Here's an example that gives output similar to what you need:
import itertools

def expandgrid(*itrs):
    product = list(itertools.product(*itrs))
    return {'Var{}'.format(i+1): [x[i] for x in product] for i in range(len(itrs))}
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
{'Var1': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Var2': [5, 7, 9, 5, 7, 9, 5, 7, 9]}
The difference is related to the fact that in itertools.product the rightmost element advances on every iteration. You can tweak the function by sorting the product list smartly if it's important.
EDIT (by S. Laurent)
To have the same as R:
import itertools

def expandgrid(*itrs):  # https://stackoverflow.com/a/12131385/1100107
    """
    Cartesian product. Reversal is for compatibility with R.
    """
    product = list(itertools.product(*reversed(itrs)))
    return [[x[i] for x in product] for i in range(len(itrs))][::-1]
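The version above returns a list of columns rather than the dict shown in the question. If the Var1/Var2 keys matter, a minimal sketch that keeps R's ordering and returns a dict could look like this (expandgrid_r_order is a hypothetical name, not part of either answer):

```python
import itertools


def expandgrid_r_order(*itrs):
    """Cartesian product as a dict of columns, first input varying fastest."""
    n = len(itrs)
    # Reversing the inputs makes the original first input the rightmost
    # one, which is the one itertools.product advances on every step.
    rows = list(itertools.product(*reversed(itrs)))
    # After the reversal, input i sits at position n - 1 - i in each row.
    return {'Var{}'.format(i + 1): [row[n - 1 - i] for row in rows]
            for i in range(n)}


grid = expandgrid_r_order([1, 2, 3], [5, 7, 9])
print(grid)
```

For the inputs above this gives the same dict as the rpy example in the question.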
Solution 4 - Python
The pandas documentation defines an expand_grid function:
def expand_grid(data_dict):
    """Create a dataframe from every combination of given values."""
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())
For this code to work, you will need the following two imports:
import itertools
import pandas as pd
The output is a pandas.DataFrame, which is the most comparable object in Python to an R data.frame.
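A short usage sketch of that function, with made-up column names (height and weight are only illustrative). As a precaution, this sketch materializes the rows and column names as lists before handing them to pandas; that is a small change of mine, not part of the documented recipe:

```python
import itertools

import pandas as pd


def expand_grid(data_dict):
    """Create a DataFrame from every combination of the given values."""
    # Materialized as lists here; the recipe passes the iterators directly.
    rows = list(itertools.product(*data_dict.values()))
    return pd.DataFrame.from_records(rows, columns=list(data_dict.keys()))


df = expand_grid({"height": [60, 70], "weight": [100, 140, 180]})
print(df)
```

The last key varies fastest here, which is the opposite of R's expand.grid, where the first vector varies fastest.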
Solution 5 - Python
I've wondered this for a while and I haven't been satisfied with the solutions put forward so far, so I came up with my own, which is considerably simpler (but probably slower). The function uses numpy.meshgrid to make the grid, then flattens the grids into 1d arrays and puts them together:
def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y)  # create the actual grid
    xG = xG.flatten()  # make the grid 1d
    yG = yG.flatten()  # same
    return pd.DataFrame({'x': xG, 'y': yG})  # return a dataframe
For example:
import numpy as np
import pandas as pd

p, q = np.linspace(1, 10, 10), np.linspace(1, 10, 10)

def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y)  # create the actual grid
    xG = xG.flatten()  # make the grid 1d
    yG = yG.flatten()  # same
    return pd.DataFrame({'x': xG, 'y': yG})

print(expand_grid(p, q).head(n=20))
I know this is an old post, but I thought I'd share my simple version!
Solution 6 - Python
From the above solutions, I did this
import itertools
import pandas as pd
a = [1,2,3]
b = [4,5,6]
ab = list(itertools.product(a,b))
abdf = pd.DataFrame(ab,columns=("a","b"))
and the following is the output
   a  b
0  1  4
1  1  5
2  1  6
3  2  4
4  2  5
5  2  6
6  3  4
7  3  5
8  3  6
Solution 7 - Python
The ParameterGrid class from scikit-learn does the same as expand.grid (from R). Example:
from sklearn.model_selection import ParameterGrid
param_grid = {'a': [1,2,3], 'b': [5,7,9]}
expanded_grid = ParameterGrid(param_grid)
You can access the content by converting it to a list:
list(expanded_grid)
output:
[{'a': 1, 'b': 5}, {'a': 1, 'b': 7}, {'a': 1, 'b': 9}, {'a': 2, 'b': 5}, {'a': 2, 'b': 7}, {'a': 2, 'b': 9}, {'a': 3, 'b': 5}, {'a': 3, 'b': 7}, {'a': 3, 'b': 9}]
Accessing the elements by index:
list(expanded_grid)[1]
You get something like this:
{'a': 1, 'b': 7}
Just adding some usage...you can use a list of dicts like the one printed above to pass to a function with **kwargs. Example:
def f(a,b): return((a+b, a-b))
list(map(lambda x: f(**x), list(expanded_grid)))
Output:
[(6, -4), (8, -6), (10, -8), (7, -3), (9, -5), (11, -7), (8, -2), (10, -4), (12, -6)]
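If you only need the list of dicts and would rather not depend on scikit-learn, the same shape can be sketched with itertools.product. This is not scikit-learn's implementation, and dict_grid is a hypothetical helper name; keys are sorted here so that the last one in alphabetical order varies fastest, as in the output above:

```python
from itertools import product


def dict_grid(params):
    """Return one dict per combination of the values in `params`."""
    keys = sorted(params)  # fixed key order, independent of insertion order
    value_lists = [params[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def f(a, b):
    return (a + b, a - b)


combos = dict_grid({'a': [1, 2, 3], 'b': [5, 7, 9]})
results = [f(**kwargs) for kwargs in combos]
print(results)
```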
Solution 8 - Python
Here's another version which returns a pandas.DataFrame:
import itertools as it
import pandas as pd

def expand_grid(*args, **kwargs):
    columns = []
    lst = []
    if args:
        # positional inputs get integer column names (Python 3: range, not xrange)
        columns += range(len(args))
        lst += args
    if kwargs:
        # keyword inputs keep their names (Python 3: keys()/values())
        columns += kwargs.keys()
        lst += kwargs.values()
    return pd.DataFrame(list(it.product(*lst)), columns=columns)

print(expand_grid([0,1], [1,2,3]))
print(expand_grid(a=[0,1], b=[1,2,3]))
print(expand_grid([0,1], b=[1,2,3]))
Solution 9 - Python
pyjanitor's expand_grid() is arguably the most natural solution, especially if you come from an R background.
To use it, set the others argument to a dictionary. The items in the dictionary can have different lengths and types. The return value is a pandas DataFrame.
import janitor as jn

jn.expand_grid(others = {
    'x': range(0, 4),
    'y': ['a', 'b', 'c'],
    'z': [False, True]
})
Solution 10 - Python
Have you tried product from itertools? Quite a bit easier to use than some of these methods in my opinion (with the exception of pandas and meshgrid). Keep in mind that this setup pulls all the items from the iterator into a list and then converts it to an ndarray, so be careful with higher dimensions. For higher-dimensional grids, drop np.asarray(list(points)) unless you want to run out of memory; you can then refer to the iterator for specific combinations. I highly recommend meshgrid for this though:
# Generate square grid from axis
from itertools import product
import numpy as np

a = np.array(list(range(3))) + 1  # axis with offset for 0 base index to 1
points = product(a, repeat=2)  # all ordered pairs (i, j), including i == j
np.asarray(list(points))  # convert to ndarray
And I get the following output from this:
array([[1, 1],
       [1, 2],
       [1, 3],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 1],
       [3, 2],
       [3, 3]])
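For the memory caveat mentioned above, a minimal sketch of working with the iterator directly rather than building the full array; the axis values are the same as in the example, and islice is just one way to take a few combinations lazily:

```python
from itertools import islice, product

axis = [1, 2, 3]

# product is lazy: no combination is built until it is requested
points = product(axis, repeat=2)

# Take only the first four combinations; the rest are never materialized
first_four = list(islice(points, 4))
print(first_four)
```

The iterator is consumed as you go, so create a fresh product(...) if you need to walk the grid again.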
Solution 11 - Python
Here is a solution for an arbitrary number of heterogeneous column types. It's based on numpy.meshgrid. Thomas Browne's answer works for homogeneous column types. Nate's answer works for two columns.
import pandas as pd
import numpy as np

def expand_grid(*xi, columns=None):
    """Expand 1-D arrays xi into a pd.DataFrame
    where each row is a unique combination of the xi.

    Args:
        x1, ..., xn (array_like): 1D-arrays to expand.
        columns (list, optional): Column names for the output
            DataFrame.

    Returns:
        Given vectors `x1, ..., xn` with lengths `Ni = len(xi)`
        a pd.DataFrame of shape (prod(Ni), n) where rows are:
        x1[0], x2[0], ..., xn-1[0], xn[0]
        x1[1], x2[0], ..., xn-1[0], xn[0]
        ...
        x1[N1 -1], x2[0], ..., xn-1[0], xn[0]
        x1[0], x2[1], ..., xn-1[0], xn[0]
        x1[1], x2[1], ..., xn-1[0], xn[0]
        ...
        x1[N1 - 1], x2[N2 - 1], ..., xn-1[Nn-1 - 1], xn[Nn - 1]
    """
    if columns is None:
        columns = pd.RangeIndex(0, len(xi))
    elif columns is not None and len(columns) != len(xi):
        raise ValueError(
            " ".join(["Expecting", str(len(xi)), "columns but",
                      str(len(columns)), "provided instead."])
        )
    return pd.DataFrame({
        coln: arr.flatten() for coln, arr in zip(columns, np.meshgrid(*xi))
    })