Convert Pandas dataframe to Sparse SciPy Matrix directly
Python Problem Overview
I am creating a matrix from a Pandas dataframe as follows:
dense_matrix = df.to_numpy(dtype=bool).astype(int)  # df.as_matrix and np.int no longer exist in current pandas/NumPy
And then into a sparse matrix with:
sparse_matrix = scipy.sparse.csr_matrix(dense_matrix)
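For reference, here is a minimal self-contained sketch of this dense-then-sparse route, written for current pandas and NumPy. The toy DataFrame is an assumption for illustration only.

```python
import numpy as np
import pandas as pd
import scipy.sparse

# Hypothetical toy frame; most entries are zero.
df = pd.DataFrame({"a": [0, 3, 0], "b": [0, 0, 5], "c": [1, 0, 0]})

# Dense step: nonzero -> True -> 1 (built-in int instead of the removed np.int).
dense_matrix = df.to_numpy(dtype=bool).astype(int)

# Sparse step: CSR keeps only the nonzero entries of the dense array.
sparse_matrix = scipy.sparse.csr_matrix(dense_matrix)

print(sparse_matrix.nnz)  # 3
```

The full dense array exists in memory before the CSR matrix is built, which is exactly the step the question wants to skip.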
Is there any way to go from a df straight to a sparse matrix?
Thanks in advance.
Python Solutions
Solution 1 - Python
df.values is already a NumPy array (newer pandas recommends df.to_numpy()), so passing it directly avoids the extra copy that wrapping it in np.array makes.
scipy.sparse.csr_matrix(df.values)
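You only need the transpose, df.values.T, if you want the DataFrame's columns to become rows: df.values keeps the DataFrame's orientation, with rows on axis 0 and columns on axis 1. Note that this route still builds a dense array before converting it.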
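A quick orientation check, assuming a small integer DataFrame (the data and variable names are illustrative only): the CSR matrix has the same shape as the DataFrame, and transposing simply swaps rows and columns.

```python
import pandas as pd
import scipy.sparse

# Hypothetical 2-row, 3-column frame.
df = pd.DataFrame({"x": [1, 0], "y": [0, 0], "z": [0, 2]})

# df.values keeps the DataFrame layout: rows on axis 0, columns on axis 1.
m = scipy.sparse.csr_matrix(df.values)
print(m.shape)  # (2, 3), same as df.shape

# Transpose only if each column of df should become a row.
mt = scipy.sparse.csr_matrix(df.values.T)
print(mt.shape)  # (3, 2)
```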
Solution 2 - Python
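If every column of the DataFrame already has a pandas sparse dtype (the .sparse accessor, available since pandas 0.25), there is a way to do it without converting to dense en route: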
csr_sparse_matrix = df.sparse.to_coo().tocsr()
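A minimal sketch of the sparse-accessor route, assuming a toy frame that is first cast to pd.SparseDtype (the data is hypothetical). The .sparse accessor raises an error unless every column is sparse, and to_coo() requires a fill value of 0. If your data starts out dense, as here, the astype call still reads the dense columns; the saving comes when the frame is built with a sparse dtype from the start.

```python
import pandas as pd
import scipy.sparse

# Hypothetical dense frame; ideally the data would be created sparse.
df = pd.DataFrame({"a": [0, 1, 0], "b": [0, 0, 2]})

# Every column needs a SparseDtype; fill_value=0 means zeros are not stored.
sdf = df.astype(pd.SparseDtype("int64", fill_value=0))

# to_coo() returns a SciPy COO matrix; tocsr() converts it to CSR.
csr_sparse_matrix = sdf.sparse.to_coo().tocsr()
print(csr_sparse_matrix.shape)  # (3, 2)
print(csr_sparse_matrix.nnz)    # 2
```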