Convert Pandas dataframe to Sparse Numpy Matrix directly

Python, Numpy, Pandas, Scipy

Python Problem Overview


I am creating a matrix from a Pandas dataframe as follows:

dense_matrix = np.array(df.to_numpy(), dtype=bool).astype(int)

And then into a sparse matrix with:

sparse_matrix = scipy.sparse.csr_matrix(dense_matrix)

Is there any way to go from a df straight to a sparse matrix?

Thanks in advance.

Python Solutions


Solution 1 - Python

df.values is already a NumPy array, so you can pass it to the constructor directly instead of wrapping it in np.array, which makes an extra copy.

scipy.sparse.csr_matrix(df.values)

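df.values has shape (n_rows, n_columns): the DataFrame's rows are axis 0 and its columns are axis 1. Take the transpose first, like df.values.T, only if you want the DataFrame's columns to become the rows of the sparse matrix.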
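As a minimal sketch of this approach, the snippet below builds a small DataFrame (hypothetical example data, not from the question), applies the same nonzero-to-1 conversion the question uses, and passes the result to scipy.sparse.csr_matrix. Note that the data still passes through a dense array on the way.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical example data; any DataFrame of numeric or boolean columns works.
df = pd.DataFrame({"a": [0, 3, 0], "b": [0, 0, 5]})

# Nonzero entries become 1, matching the bool-then-int conversion in the question.
dense = df.values.astype(bool).astype(np.int64)

# Build the CSR matrix from the dense array; rows of df stay rows of the matrix.
sparse_matrix = sparse.csr_matrix(dense)

print(sparse_matrix.shape)  # (3, 2)
print(sparse_matrix.nnz)    # 2 stored nonzero entries
```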

Solution 2 - Python

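There is a way to do it without converting to dense en route, provided every column of df already has a sparse dtype (the .sparse accessor is available in pandas 0.25 and later):

csr_sparse_matrix = df.sparse.to_coo().tocsr()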
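A short sketch of the accessor route, assuming pandas 0.25 or later with SciPy installed. The example data and the names sdf and csr are hypothetical. The astype call here is only needed because the example starts from a dense frame; the approach pays off when the columns are sparse to begin with.

```python
import pandas as pd

# Hypothetical example data, dense to begin with.
df = pd.DataFrame({"a": [0, 3, 0], "b": [0, 0, 5]})

# The .sparse accessor only works when every column has a sparse dtype,
# so convert first; a fill_value of 0 means zeros are not stored.
sdf = df.astype(pd.SparseDtype("int64", fill_value=0))

# to_coo() returns a SciPy COO matrix, which is then converted to CSR.
csr = sdf.sparse.to_coo().tocsr()

print(csr.shape)  # (3, 2)
print(csr.nnz)    # 2 stored nonzero entries
```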

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user7289 | View Question on Stackoverflow
Solution 1 - Python | Dan Allan | View Answer on Stackoverflow
Solution 2 - Python | G. Cohen | View Answer on Stackoverflow