Specifying data type in Pandas csv reader

Python, Pandas

Python Problem Overview


I am just getting started with Pandas and am reading in a CSV file using the read_csv() method. My difficulty is preventing pandas from converting my telephone numbers to large numbers instead of keeping them as strings. I defined a converter that left the numbers alone, but they were still converted to numbers. When I changed my converter to prepend a 'z' to the phone numbers, they stayed strings. Is there some way to keep them as strings without modifying the values of the fields?

Python Solutions


Solution 1 - Python

Since Pandas 0.11.0 you can use the dtype argument to explicitly specify the data type for each column:

import pandas
d = pandas.read_csv('foo.csv', dtype={'BAR': 'S10'})
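As a minimal sketch of the same idea applied to the phone-number case, the example below reads a small in-memory CSV twice: once with default type inference and once with dtype set to str for one column. The sample data and the column name phone are assumptions made up for illustration, and str is used here in place of 'S10' so that no fixed width is implied.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = "name,phone\nAlice,0123456789\nBob,5551234567\n"

# Default inference parses the phone column as integers,
# which drops the leading zero.
inferred = pd.read_csv(io.StringIO(csv_text))

# Requesting str for that column keeps the text exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"phone": str})

print(inferred["phone"].tolist())  # [123456789, 5551234567]
print(as_text["phone"].tolist())   # ['0123456789', '5551234567']
```

Passing dtype=str without a dict should apply the same treatment to every column, which may be convenient when no column needs numeric parsing.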

Solution 2 - Python

It looks like you can't prevent pandas from trying to convert numeric/boolean values in the CSV file. Take a look at the pandas source code for the IO parsers, in particular the functions _convert_to_ndarrays and _convert_types: https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py

You can always assign the type you want after you have read the file:

df.phone = df.phone.astype(str)
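One thing to keep in mind with converting after the read: the values have already been parsed as numbers by then, so anything lost during parsing (such as a leading zero) does not come back. The sketch below illustrates this with made-up sample data; the column name phone is an assumption.

```python
import io

import pandas as pd

# Hypothetical sample data; the first number starts with a zero.
csv_text = "name,phone\nAlice,0123456789\nBob,5551234567\n"

# Read with default inference, then cast the parsed integers to strings.
df = pd.read_csv(io.StringIO(csv_text))
df["phone"] = df["phone"].astype(str)

# The column now holds strings, but the leading zero is gone.
print(df["phone"].tolist())  # ['123456789', '5551234567']
```

If the original formatting matters, declaring the column type at read time is likely the safer choice.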

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Gardner | View Question on Stackoverflow
Solution 1 - Python | zero323 | View Answer on Stackoverflow
Solution 2 - Python | lbolla | View Answer on Stackoverflow