How to remove all rows in a numpy.ndarray that contain non-numeric values

PythonNumpy

Python Problem Overview


Basically, I'm doing some data analysis. I read in a dataset as a numpy.ndarray and some of the values are missing (either by just not being there, being NaN, or by being a string written "NA").

I want to clean out all rows containing any entry like this. How do I do that with a numpy ndarray?

Python Solutions


Solution 1 - Python

>>> a = np.array([[1,2,3], [4,5,np.nan], [7,8,9]])
array([[  1.,   2.,   3.],
       [  4.,   5.,  nan],
       [  7.,   8.,   9.]])

>>> a[~np.isnan(a).any(axis=1)]
array([[ 1.,  2.,  3.],
       [ 7.,  8.,  9.]])

and reassign this to a.

Explanation: np.isnan(a) returns a similar array with True where NaN, False elsewhere. .any(axis=1) reduces an m*n array to n with an logical or operation on the whole rows, ~ inverts True/False and a[ ] chooses just the rows from the original array, which have True within the brackets.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionzebraView Question on Stackoverflow
Solution 1 - PythoneumiroView Answer on Stackoverflow