Convert floats to ints in Pandas?

Python Pandas Floating Point Integer Dataset

Python Problem Overview

I've been working with data imported from a CSV. Pandas changed some columns to float, so the numbers in these columns are now displayed as floating-point values. However, I need them to be displayed as integers, or at least without the decimal point. Is there a way to convert them to integers, or to hide the decimal part?

Python Solutions

Solution 1 - Python

To change how floats are displayed (the stored dtype stays float), set the float format:

import pandas as pd

df = pd.DataFrame(range(5), columns=['a'])
df.a = df.a.astype(float)
df

Out[33]:

          a
0 0.0000000
1 1.0000000
2 2.0000000
3 3.0000000
4 4.0000000

pd.options.display.float_format = '{:,.0f}'.format
df

Out[35]:

   a
0  0
1  1
2  2
3  3
4  4
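The display option above is global and only affects printing. As a minimal sketch (the column name and values are illustrative, not from the original answer), pd.option_context can scope the same format to a single block, and checking the dtype afterwards confirms that the data itself is still float:

```python
import pandas as pd

df = pd.DataFrame({"a": [0.0, 1.0, 2.7]})

# Apply the format only inside this block instead of changing it globally.
with pd.option_context("display.float_format", "{:,.0f}".format):
    shown = df.to_string()

# Only the rendering changed; the column is still float64 and still holds 2.7.
dtype_after = df["a"].dtype
print(shown)
print(dtype_after)
```

If the values need to be integers for later computation and not only for display, one of the astype-based solutions below is the better fit.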

Solution 2 - Python

Use the pandas.DataFrame.astype(<type>) function to manipulate column dtypes.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"))
>>> df
          A         B         C         D
0  0.542447  0.949988  0.669239  0.879887
1  0.068542  0.757775  0.891903  0.384542
2  0.021274  0.587504  0.180426  0.574300
>>> df[list("ABCD")] = df[list("ABCD")].astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0

EDIT:

To handle missing values:

>>> df
          A         B     C         D
0  0.475103  0.355453  0.66  0.869336
1  0.260395  0.200287   NaN  0.617024
2  0.517692  0.735613  0.18  0.657106
>>> df[list("ABCD")] = df[list("ABCD")].fillna(0.0).astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0
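The all-zero output above follows from astype(int) truncating toward zero instead of rounding. A small sketch with made-up values shows the difference, assuming rounding to the nearest integer is what you actually want:

```python
import pandas as pd

s = pd.Series([1.9, -1.9, 2.4])

# astype(int) drops the fractional part (truncation toward zero).
truncated = s.astype(int)

# Rounding first gives the nearest integer instead.
rounded = s.round().astype(int)

print(truncated.tolist())  # [1, -1, 2]
print(rounded.tolist())    # [2, -2, 2]
```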

Solution 3 - Python

Considering the following data frame:

>>> df = pd.DataFrame(10*np.random.rand(3, 4), columns=list("ABCD"))
>>> print(df)
...           A         B         C         D
... 0  8.362940  0.354027  1.916283  6.226750
... 1  1.988232  9.003545  9.277504  8.522808
... 2  1.141432  4.935593  2.700118  7.739108

Using a list of column names, change the type for multiple columns with applymap() (deprecated since pandas 2.1 in favor of DataFrame.map()):

>>> cols = ['A', 'B']
>>> df[cols] = df[cols].applymap(np.int64)
>>> print(df)
...    A  B         C         D
... 0  8  0  1.916283  6.226750
... 1  1  9  9.277504  8.522808
... 2  1  4  2.700118  7.739108

Or for a single column with apply():

>>> df['C'] = df['C'].apply(np.int64)
>>> print(df)
...    A  B  C         D
... 0  8  0  1  6.226750
... 1  1  9  9  8.522808
... 2  1  4  2  7.739108
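Because applymap() is deprecated in newer pandas releases, the element-wise approach can be written so that it runs on both old and new versions. The following is a sketch with illustrative data; the hasattr check is just one assumed way of doing the feature detection:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [8.36, 1.98], "B": [0.35, 9.00], "C": [1.91, 9.27]})
cols = ["A", "B"]
subset = df[cols]

# DataFrame.map exists from pandas 2.1 on; fall back to applymap before that.
elementwise = subset.map if hasattr(subset, "map") else subset.applymap
converted = elementwise(np.int64)

# The vectorized astype gives the same values and is usually faster.
vectorized = subset.astype("int64")

print(converted)
print(vectorized)
```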

Solution 4 - Python

To convert all float columns to int:

>>> df = pd.DataFrame(np.random.rand(5, 4) * 10, columns=list('PQRS'))
>>> print(df)
... 	P	        Q	        R	        S
... 0	4.395994	0.844292	8.543430	1.933934
... 1	0.311974	9.519054	6.171577	3.859993
... 2	2.056797	0.836150	5.270513	3.224497
... 3	3.919300	8.562298	6.852941	1.415992
... 4	9.958550	9.013425	8.703142	3.588733

>>> float_col = df.select_dtypes(include=['float64']) # This will select float columns only
>>> # list(float_col.columns.values)

>>> for col in float_col.columns.values:
...     df[col] = df[col].astype('int64')

>>> print(df)
... 	P	Q	R	S
... 0	4	0	8	1
... 1	0	9	6	3
... 2	2	0	5	3
... 3	3	8	6	1
... 4	9	9	8	3
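The loop above can also be collapsed into a single astype call by building a column-to-dtype mapping from the selected float columns. This is a sketch with illustrative data; the extra "label" column is an assumption added only to show that non-float columns are left alone:

```python
import pandas as pd

df = pd.DataFrame({"P": [4.39, 0.31], "Q": [0.84, 9.51], "label": ["x", "y"]})

# Collect the names of the float64 columns only.
float_cols = df.select_dtypes(include=["float64"]).columns

# Convert them in one call; columns not in the mapping keep their dtype.
df = df.astype({col: "int64" for col in float_cols})

print(df.dtypes)
```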

Solution 5 - Python

This is a quick solution for converting several columns of a pandas.DataFrame from float to integer when the columns may also contain NaN values.

cols = ['col_1', 'col_2', 'col_3', 'col_4']
for col in cols:
   df[col] = df[col].apply(lambda x: int(x) if x == x else "")

I tried else x and else None, but the result still contained float numbers, so I used else "". The check x == x is False only for NaN, so it works as a NaN test.
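One consequence of the else "" trick is worth spelling out: the column ends up holding a mix of Python ints and strings, so its dtype becomes object and numeric operations on it stop working as expected. A small sketch with made-up values compares it with the nullable 'Int64' dtype, which keeps the column numeric:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Approach from this answer: NaN becomes an empty string, dtype becomes object.
as_object = s.apply(lambda x: int(x) if x == x else "")

# Nullable integer dtype: the missing value stays missing, the column stays numeric.
as_nullable = s.astype("Int64")

print(as_object.dtype)    # object
print(as_nullable.dtype)  # Int64
print(as_nullable.sum())  # 4
```

The object-dtype version can still be a reasonable choice when the column is only going to be displayed or exported.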

Solution 6 - Python

Expanding on the pandas.DataFrame.astype(<type>) method that @Ryan G mentioned, you can pass errors='ignore' so that only the columns that convert without raising an error are changed, which simplifies the syntax considerably. Ignoring errors calls for caution in general, but it is very handy for this task.

>>> df = pd.DataFrame(np.random.rand(3, 4), columns=list('ABCD'))
>>> df *= 10
>>> print(df)
...           A       B       C       D
... 0   2.16861 8.34139 1.83434 6.91706
... 1   5.85938 9.71712 5.53371 4.26542
... 2   0.50112 4.06725 1.99795 4.75698

>>> df['E'] = list('XYZ')
>>> df = df.astype(int, errors='ignore')  # astype returns a new DataFrame, so assign it back
>>> print(df)
...     A   B   C   D   E
... 0   2   8   1   6   X
... 1   5   9   5   4   Y
... 2   0   4   1   4   Z

From the pandas.DataFrame.astype docs:

> errors : {‘raise’, ‘ignore’}, default ‘raise’
>
> Control raising of exceptions on invalid data for provided dtype.
>
> - raise : allow exceptions to be raised
> - ignore : suppress exceptions. On error return original object
>
> New in version 0.20.0.

Solution 7 - Python

The columns that need to be converted to int can also be specified in a dictionary, as below:

df = df.astype({'col1': 'int', 'col2': 'int', 'col3': 'int'})
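The dictionary form also allows a different target type per column, which helps when only some columns contain missing values. The sketch below reuses the col1/col2/col3 names from the line above with made-up data, and assumes col2 has a NaN and col3 should stay float:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, 2.0],
    "col2": [3.0, np.nan],
    "col3": [0.5, 1.5],
})

# col1 has no missing values, so a plain NumPy integer works.
# col2 contains NaN, so it needs the nullable 'Int64' dtype.
# col3 is not listed in the mapping and therefore stays float64.
df = df.astype({"col1": "int64", "col2": "Int64"})

print(df.dtypes)
```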

Solution 8 - Python

>>> import pandas as pd
>>> right = pd.DataFrame({'C': [1.002, 2.003], 'D': [1.009, 4.55], 'key': ['K0', 'K1']})
>>> print(right)
       C      D key
0  1.002  1.009  K0
1  2.003  4.550  K1
>>> right['C'] = right.C.astype(int)
>>> print(right)
   C      D key
0  1  1.009  K0
1  2  4.550  K1

Solution 9 - Python

Use `'Int64'` for NaN support

astype(int) and astype('int64') cannot handle missing values (numpy int)
astype('Int64') can handle missing values (pandas int)

df['A'] = df['A'].astype('Int64') # capital I

This assumes you want to keep the missing values as missing (they are shown as <NA>). If you plan to impute them, you could fillna first, as Ryan suggested.

Examples of `'Int64'` (capital `I`)

If the floats are already rounded, just use astype:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [99.0, np.nan, 42.0]})

df['A'] = df['A'].astype('Int64')
#       A
# 0    99
# 1  <NA>
# 2    42

If the floats are not rounded yet, round before astype:

df = pd.DataFrame({'A': [3.14159, np.nan, 1.61803]})

df['A'] = df['A'].round().astype('Int64')
#       A
# 0     3
# 1  <NA>
# 2     2

To read int+NaN data from a file, use dtype='Int64' to avoid the need for converting at all:

import io

csv = io.StringIO('''
id,rating
foo,5
bar,
baz,2
''')

df = pd.read_csv(csv, dtype={'rating': 'Int64'})
#     id  rating
# 0  foo       5
# 1  bar    <NA>
# 2  baz       2

Notes

'Int64' is an alias for Int64Dtype:

df['A'] = df['A'].astype(pd.Int64Dtype()) # same as astype('Int64')

Sized/signed aliases are available:

alias	lower bound	upper bound
`'Int8'`	-128	127
`'Int16'`	-32,768	32,767
`'Int32'`	-2,147,483,648	2,147,483,647
`'Int64'`	-9,223,372,036,854,775,808	9,223,372,036,854,775,807
`'UInt8'`	0	255
`'UInt16'`	0	65,535
`'UInt32'`	0	4,294,967,295
`'UInt64'`	0	18,446,744,073,709,551,615
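The bounds in the table match those of the NumPy integer types with the same width and signedness, so they can be looked up programmatically instead of memorized. A minimal sketch, assuming the lowercase form of each alias is the matching NumPy dtype name:

```python
import numpy as np

aliases = ["Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"]

# np.iinfo reports the representable range of a NumPy integer dtype.
bounds = {
    alias: (int(np.iinfo(alias.lower()).min), int(np.iinfo(alias.lower()).max))
    for alias in aliases
}

for alias, (low, high) in bounds.items():
    print(f"{alias}: {low:,} to {high:,}")
```

Checking the range of your data against these bounds before choosing a smaller alias avoids surprises with values that do not fit.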

Solution 10 - Python

The question says that the data comes from a CSV, so I think options for doing the conversion while the data is read, rather than afterwards, are relevant to the topic.

When importing spreadsheets or CSV files into a DataFrame, integer-only columns are commonly converted to float, because Excel stores all numerical values as floats and because of how the underlying libraries work.

When the file is read with read_excel or read_csv, there are a couple of options that avoid the conversion after import:

The dtype parameter accepts a dictionary of column names and target types, like dtype = {"my_column": "Int64"}
The converters parameter can be used to pass a function that does the conversion, for example replacing missing values with 0: converters = {"my_column": lambda x: int(x) if x else 0}
The convert_float parameter will convert "integral floats to int (i.e., 1.0 –> 1)", but take care with corner cases like NaNs. This parameter is only available in read_excel, and it was deprecated in pandas 1.3 and is no longer present in pandas 2.x
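As a sketch of the first two options with read_csv, the example below reads the same small in-memory CSV twice. The column names and data are made up for illustration; with converters, read_csv passes each cell to the function as a string, so an empty cell arrives as "":

```python
import io

import pandas as pd

raw = "id,count\nfoo,5\nbar,\nbaz,2\n"

# Option 1: nullable integer dtype, the empty cell becomes <NA>.
with_dtype = pd.read_csv(io.StringIO(raw), dtype={"count": "Int64"})

# Option 2: a converter that maps the empty string to 0.
with_converter = pd.read_csv(
    io.StringIO(raw),
    converters={"count": lambda x: int(x) if x else 0},
)

print(with_dtype)
print(with_converter)
```

The dtype route keeps the information that a value was missing, while the converter route silently replaces it, so which one fits depends on what a 0 would mean in your data.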

Several alternatives for converting an existing DataFrame have been given in other answers, but since v1.0.0 pandas has an interesting function for these cases: convert_dtypes, which will "Convert columns to best possible dtypes using dtypes supporting pd.NA."

For example:

In [3]: import numpy as np                                                                                                                                                                                         

In [4]: import pandas as pd                                                                                                                                                                                        

In [5]: df = pd.DataFrame( 
   ...:     { 
   ...:         "a": pd.Series([1, 2, 3], dtype=np.dtype("int64")), 
   ...:         "b": pd.Series([1.0, 2.0, 3.0], dtype=np.dtype("float")), 
   ...:         "c": pd.Series([1.0, np.nan, 3.0]), 
   ...:         "d": pd.Series([1, np.nan, 3]), 
   ...:     } 
   ...: )                                                                                                                                                                                                          

In [6]: df                                                                                                                                                                                                         
Out[6]: 
   a    b    c    d
0  1  1.0  1.0  1.0
1  2  2.0  NaN  NaN
2  3  3.0  3.0  3.0

In [7]: df.dtypes                                                                                                                                                                                                  
Out[7]: 
a      int64
b    float64
c    float64
d    float64
dtype: object

In [8]: converted = df.convert_dtypes()                                                                                                                                                                            

In [9]: converted.dtypes                                                                                                                                                                                           
Out[9]: 
a    Int64
b    Int64
c    Int64
d    Int64
dtype: object

In [10]: converted                                                                                                                                                                                                 
Out[10]: 
   a  b     c     d
0  1  1     1     1
1  2  2  <NA>  <NA>
2  3  3     3     3

Solution 11 - Python

Although there are many options here, you can also convert the types of specific columns using a dictionary:

Data = pd.read_csv('Your_Data.csv')

Data_2 = Data.astype({"Column a":"int32", "Column_b": "float64", "Column_c": "int32"})

print(Data_2.dtypes) # Check the dtypes of the columns

This is a useful and very fast way to change the data types of specific columns for quick data analysis.

Content Type	Original Author	Original Content on Stackoverflow
Question	MJP	View Question on Stackoverflow
Solution 1 - Python	EdChum	View Answer on Stackoverflow
Solution 2 - Python	Ryan G	View Answer on Stackoverflow
Solution 3 - Python	user4322543	View Answer on Stackoverflow
Solution 4 - Python	Suhas_Pote	View Answer on Stackoverflow
Solution 5 - Python	enri	View Answer on Stackoverflow
Solution 6 - Python	aebmad	View Answer on Stackoverflow
Solution 7 - Python	prashanth	View Answer on Stackoverflow
Solution 8 - Python	user8051244	View Answer on Stackoverflow
Solution 9 - Python	tdy	View Answer on Stackoverflow
Solution 10 - Python	Francisco Puga	View Answer on Stackoverflow
Solution 11 - Python	Fellipe Alcantara	View Answer on Stackoverflow
