load csv into 2D matrix with numpy for plotting

PythonArraysCsvNumpyReshape

Python Problem Overview


Given this CSV file:

"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12

I simply want to load it as a matrix/ndarray with 3 rows and 7 columns. However, for some reason, all I can get out of numpy is an ndarray with 3 rows (one per line) and no columns.

r = np.genfromtxt(fname,delimiter=',',dtype=None, names=True)
print r
print r.shape

[ (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291111964948.0)
 (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291113113366.0)
 (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291120650486.0)]
(3,)

I can manually iterate and hack it into the shape I want, but this seems silly. I just want to load it as a proper matrix so I can slice it across different dimensions and plot it, just like in matlab.

Python Solutions


Solution 1 - Python

Pure numpy

numpy.loadtxt(open("test.csv", "rb"), delimiter=",", skiprows=1)

Check out the loadtxt documentation.

You can also use python's csv module:

import csv
import numpy
reader = csv.reader(open("test.csv", "rb"), delimiter=",")
x = list(reader)
result = numpy.array(x).astype("float")

You will have to convert it to your favorite numeric type. I guess you can write the whole thing in one line:

result = numpy.array(list(csv.reader(open("test.csv", "rb"), delimiter=","))).astype("float")

Added Hint:

You could also use pandas.io.parsers.read_csv and get the associated numpy array which can be faster.

Solution 2 - Python

I think using dtype where there is a name row is confusing the routine. Try

>>> r = np.genfromtxt(fname, delimiter=',', names=True)
>>> r
array([[  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29111196e+12],
       [  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29111311e+12],
       [  6.11882430e+02,   9.08956010e+03,   5.13300000e+03,
          8.64075140e+02,   1.71537476e+03,   7.65227770e+02,
          1.29112065e+12]])
>>> r[:,0]    # Slice 0'th column
array([ 611.88243,  611.88243,  611.88243])

Solution 3 - Python

You can read a CSV file with headers into a NumPy structured array with np.genfromtxt. For example:

import numpy as np

csv_fname = 'file.csv'
with open(csv_fname, 'w') as fp:
    fp.write("""\
"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
""")

# Read the CSV file into a Numpy record array
r = np.genfromtxt(csv_fname, delimiter=',', names=True, case_sensitive=True)
print(repr(r))

which looks like this:

array([(611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29111196e+12),
       (611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29111311e+12),
       (611.88243, 9089.5601, 5133., 864.07514, 1715.37476, 765.22777, 1.29112065e+12)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<f8'), ('F', '<f8'), ('timestamp', '<f8')])

You can access a named column like this r['E']:

array([1715.37476, 1715.37476, 1715.37476])

Note: this answer previously used np.recfromcsv to read the data into a NumPy record array. While there was nothing wrong with that method, structured arrays are generally better than record arrays for speed and compatibility.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestiondgorissenView Question on Stackoverflow
Solution 1 - PythonKaveh_khView Answer on Stackoverflow
Solution 2 - PythonmtrwView Answer on Stackoverflow
Solution 3 - PythonMike TView Answer on Stackoverflow