Convert floats to ints in Pandas?


Python Problem Overview


I've been working with data imported from a CSV. Pandas changed some columns to float, so the numbers in those columns are now displayed as floating-point values. However, I need them displayed as integers, or at least without the decimal part. Is there a way to convert them to integers, or to hide the decimal part?

Python Solutions


Solution 1 - Python

To change how floats are displayed, set the float format option:

import pandas as pd

df = pd.DataFrame(range(5), columns=['a'])
df.a = df.a.astype(float)
df

Out[33]:

          a
0 0.0000000
1 1.0000000
2 2.0000000
3 3.0000000
4 4.0000000

pd.options.display.float_format = '{:,.0f}'.format
df

Out[35]:

   a
0  0
1  1
2  2
3  3
4  4
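
Note that the display option changes only how floats are printed; the column itself stays float64. A minimal sketch, assuming you want the formatting limited to one block of code rather than set globally, uses pd.option_context (the variable names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [0.0, 1.0, 2.0]})

# Apply the integer-style float format only inside this block.
with pd.option_context("display.float_format", "{:,.0f}".format):
    formatted = repr(df)

# The data itself is untouched: the column is still float64.
dtype_after = df["a"].dtype

print(formatted)
print(dtype_after)
```

Outside the with block the previous display setting is restored, so other output is not affected.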

Solution 2 - Python

Use the pandas.DataFrame.astype(<type>) method to change column dtypes. Note that casting to int drops the fractional part, which is why every value below becomes 0.

>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"))
>>> df
          A         B         C         D
0  0.542447  0.949988  0.669239  0.879887
1  0.068542  0.757775  0.891903  0.384542
2  0.021274  0.587504  0.180426  0.574300
>>> df[list("ABCD")] = df[list("ABCD")].astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0

EDIT:

To handle missing values:

>>> df
          A         B     C         D
0  0.475103  0.355453  0.66  0.869336
1  0.260395  0.200287   NaN  0.617024
2  0.517692  0.735613  0.18  0.657106
>>> df[list("ABCD")] = df[list("ABCD")].fillna(0.0).astype(int)
>>> df
   A  B  C  D
0  0  0  0  0
1  0  0  0  0
2  0  0  0  0
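
Because astype(int) truncates rather than rounds, it can be worth rounding first when the nearest integer is wanted. A small sketch with made-up values, which also shows that a NaN left in the column makes the cast fail:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.9, 1.5, -0.9, 2.5])

truncated = s.astype(int)        # drops the fractional part (toward zero)
rounded = s.round().astype(int)  # rounds first; half-way cases go to the even neighbour

print(truncated.tolist())  # [0, 1, 0, 2]
print(rounded.tolist())    # [1, 2, -1, 2]

# A NaN cannot be stored in a NumPy integer column, so astype(int) raises.
with_nan = pd.Series([1.0, np.nan])
try:
    with_nan.astype(int)
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True

print(nan_cast_failed)
```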

Solution 3 - Python

Considering the following data frame:

>>> df = pd.DataFrame(10*np.random.rand(3, 4), columns=list("ABCD"))
>>> print(df)
...           A         B         C         D
... 0  8.362940  0.354027  1.916283  6.226750
... 1  1.988232  9.003545  9.277504  8.522808
... 2  1.141432  4.935593  2.700118  7.739108

Using a list of column names, change the type for multiple columns with applymap() (deprecated since pandas 2.1 in favor of DataFrame.map()):

>>> cols = ['A', 'B']
>>> df[cols] = df[cols].applymap(np.int64)
>>> print(df)
...    A  B         C         D
... 0  8  0  1.916283  6.226750
... 1  1  9  9.277504  8.522808
... 2  1  4  2.700118  7.739108

Or for a single column with apply():

>>> df['C'] = df['C'].apply(np.int64)
>>> print(df)
...    A  B  C         D
... 0  8  0  1  6.226750
... 1  1  9  9  8.522808
... 2  1  4  2  7.739108
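
Since applymap() is deprecated in newer pandas releases, the following sketch picks DataFrame.map() when it is available and falls back to applymap() otherwise. It also compares the element-wise result with a plain astype() cast, which avoids calling a function per cell. The sample values are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [8.36, 1.99, 1.14], "B": [0.35, 9.00, 4.94]})

# DataFrame.map exists in pandas >= 2.1; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    elementwise = df.map(np.int64)
else:
    elementwise = df.applymap(np.int64)

# The vectorized cast produces the same values.
vectorized = df.astype(np.int64)

same_values = bool((elementwise.values == vectorized.values).all())
print(same_values)
```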

Solution 4 - Python

To convert all float columns to int:

>>> df = pd.DataFrame(np.random.rand(5, 4) * 10, columns=list('PQRS'))
>>> print(df)
... 	P	        Q	        R	        S
... 0	4.395994	0.844292	8.543430	1.933934
... 1	0.311974	9.519054	6.171577	3.859993
... 2	2.056797	0.836150	5.270513	3.224497
... 3	3.919300	8.562298	6.852941	1.415992
... 4	9.958550	9.013425	8.703142	3.588733

>>> float_col = df.select_dtypes(include=['float64']) # This will select float columns only
>>> # list(float_col.columns.values)

>>> for col in float_col.columns.values:
...     df[col] = df[col].astype('int64')

>>> print(df)
... 	P	Q	R	S
... 0	4	0	8	1
... 1	0	9	6	3
... 2	2	0	5	3
... 3	3	8	6	1
... 4	9	9	8	3
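
The same idea can be written without an explicit loop by passing a dict to astype(). This is only a sketch; the extra string column named label is hypothetical and is there to show that non-float columns are left alone:

```python
import pandas as pd

df = pd.DataFrame({"P": [4.39, 0.31], "Q": [0.84, 9.51], "label": ["x", "y"]})

# Collect the float columns, then cast them all in one astype() call.
float_cols = df.select_dtypes(include=["float64"]).columns
df = df.astype({col: "int64" for col in float_cols})

print(df.dtypes)
```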

Solution 5 - Python

This is a quick solution for converting several columns of a pandas.DataFrame from float to integer when the columns may also contain NaN values.

cols = ['col_1', 'col_2', 'col_3', 'col_4']
for col in cols:
   df[col] = df[col].apply(lambda x: int(x) if x == x else "")

I tried else x and else None, but the result still contained floats, so I used else "".
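
One consequence worth knowing: mixing integers with "" turns the column into a generic object column rather than an integer one, so numeric operations on it may no longer work as expected. A short sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# x == x is False only for NaN, so missing entries become "".
as_object = s.apply(lambda x: int(x) if x == x else "")

print(as_object.dtype)     # object
print(as_object.tolist())  # [1, '', 3]
```

Returning x or None for the missing entries puts a NaN back into the result, which is why pandas falls back to float in those cases.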

Solution 6 - Python

Expanding on @Ryan G's use of the pandas.DataFrame.astype(<type>) method, you can pass errors='ignore' to convert only those columns that do not produce an error, which notably simplifies the syntax. Caution is warranted when ignoring errors, but for this task it comes in very handy.

>>> df = pd.DataFrame(np.random.rand(3, 4), columns=list('ABCD'))
>>> df *= 10
>>> print(df)
...           A       B       C       D
... 0   2.16861 8.34139 1.83434 6.91706
... 1   5.85938 9.71712 5.53371 4.26542
... 2   0.50112 4.06725 1.99795 4.75698

>>> df['E'] = list('XYZ')
>>> df = df.astype(int, errors='ignore')  # astype returns a new DataFrame, so assign it back
>>> print(df)
...     A   B   C   D   E
... 0   2   8   1   6   X
... 1   5   9   5   4   Y
... 2   0   4   1   4   Z

From the pandas.DataFrame.astype docs:

    errors : {'raise', 'ignore'}, default 'raise'

    Control raising of exceptions on invalid data for provided dtype.

      • raise : allow exceptions to be raised
      • ignore : suppress exceptions. On error return original object

    New in version 0.20.0.

Solution 7 - Python

The columns that need to be converted to int can also be listed in a dictionary, as below:

df = df.astype({'col1': 'int', 'col2': 'int', 'col3': 'int'})
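
As a self-contained illustration (the column names and values are made up), only the columns named in the dictionary change; the rest keep their dtype, and a key that is not a column raises a KeyError:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1.0, 2.0], "col2": [3.7, 4.2], "other": [0.5, 1.5]})

# Only col1 and col2 are cast; "other" stays float64.
df = df.astype({"col1": "int64", "col2": "int64"})
print(df.dtypes)

# Keys must be existing column names.
try:
    df.astype({"missing": "int64"})
    missing_key_failed = False
except KeyError:
    missing_key_failed = True

print(missing_key_failed)
```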

Solution 8 - Python

>>> import pandas as pd
>>> right = pd.DataFrame({'C': [1.002, 2.003], 'D': [1.009, 4.55], 'key': ['K0', 'K1']})
>>> print(right)
           C      D key
    0  1.002  1.009  K0
    1  2.003  4.550  K1
>>> right['C'] = right.C.astype(int)
>>> print(right)
       C      D key
    0  1  1.009  K0
    1  2  4.550  K1

Solution 9 - Python

Use 'Int64' for NaN support

  • astype(int) and astype('int64') cannot handle missing values (numpy int)
  • astype('Int64') can handle missing values (pandas int)
df['A'] = df['A'].astype('Int64') # capital I

This assumes you want to keep missing values as missing (they are shown as <NA>). If you plan to impute them, you could fillna first as Ryan suggested.


Examples of 'Int64' (capital I)

  1. If the floats are already rounded, just use astype:

    df = pd.DataFrame({'A': [99.0, np.nan, 42.0]})
    
    df['A'] = df['A'].astype('Int64')
    #       A
    # 0    99
    # 1  <NA>
    # 2    42
    
  2. If the floats are not rounded yet, round before astype:

    df = pd.DataFrame({'A': [3.14159, np.nan, 1.61803]})
    
    df['A'] = df['A'].round().astype('Int64')
    #       A
    # 0     3
    # 1  <NA>
    # 2     2
    
  3. To read int+NaN data from a file, use dtype='Int64' to avoid the need for converting at all:

    csv = io.StringIO('''
    id,rating
    foo,5
    bar,
    baz,2
    ''')
    
    df = pd.read_csv(csv, dtype={'rating': 'Int64'})
    #     id  rating
    # 0  foo       5
    # 1  bar    <NA>
    # 2  baz       2
    

Notes

  • 'Int64' is an alias for Int64Dtype:

    df['A'] = df['A'].astype(pd.Int64Dtype()) # same as astype('Int64')
    
  • Sized/signed aliases are available:

    alias     lower bound                 upper bound
    'Int8'    -128                        127
    'Int16'   -32,768                     32,767
    'Int32'   -2,147,483,648              2,147,483,647
    'Int64'   -9,223,372,036,854,775,808  9,223,372,036,854,775,807
    'UInt8'   0                           255
    'UInt16'  0                           65,535
    'UInt32'  0                           4,294,967,295
    'UInt64'  0                           18,446,744,073,709,551,615
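
The bounds above can be reproduced with np.iinfo. This sketch assumes each nullable alias has the same range as the NumPy integer type of the same width, so it looks up the lowercase NumPy name:

```python
import numpy as np

ALIASES = ["Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"]

bounds = {}
for alias in ALIASES:
    # 'Int8' -> NumPy 'int8', 'UInt16' -> NumPy 'uint16', and so on.
    info = np.iinfo(np.dtype(alias.lower()))
    bounds[alias] = (int(info.min), int(info.max))

for alias, (low, high) in bounds.items():
    print(f"{alias:<8}{low:>28,}{high:>28,}")
```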

Solution 10 - Python

The question explains that the data comes from a CSV, so I think it is relevant to show options for making the conversion when the data is read rather than afterwards.

When importing spreadsheets or CSV files into a DataFrame, columns that contain only integers are commonly converted to float, both because Excel stores all numerical values as floats and because of how the underlying libraries work.

When the file is read with read_excel or read_csv, there are a couple of options that avoid converting after the import:

  • parameter dtype accepts a dictionary of column names and target types, like dtype = {"my_column": "Int64"}
  • parameter converters can be used to pass a function that makes the conversion, for example replacing missing values with 0: converters = {"my_column": lambda x: int(x) if x else 0}
  • parameter convert_float will convert "integral floats to int (i.e., 1.0 –> 1)", but take care with corner cases like NaNs. This parameter is only available in read_excel, and it is deprecated since pandas 1.3
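
A minimal sketch of the first two options with read_csv, using an in-memory CSV. The column names rating and votes and the data are hypothetical:

```python
import io

import pandas as pd

CSV_TEXT = "id,rating,votes\nfoo,5,10\nbar,,3\nbaz,2,\n"

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    # Nullable integer dtype: the empty rating is kept as <NA>.
    dtype={"rating": "Int64"},
    # Converters receive the raw cell text; an empty string becomes 0.
    converters={"votes": lambda x: int(x) if x else 0},
)

print(df)
print(df.dtypes)
```

The dtype option keeps the missing value as missing, while the converter replaces it, so the choice depends on whether an empty cell should count as 0.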

Several alternatives for converting an existing DataFrame have been given in other answers, but since v1.0.0 pandas has an interesting function for these cases: convert_dtypes, which will "Convert columns to best possible dtypes using dtypes supporting pd.NA."

For example:

In [3]: import numpy as np                                                                                                                                                                                         

In [4]: import pandas as pd                                                                                                                                                                                        

In [5]: df = pd.DataFrame( 
   ...:     { 
   ...:         "a": pd.Series([1, 2, 3], dtype=np.dtype("int64")), 
   ...:         "b": pd.Series([1.0, 2.0, 3.0], dtype=np.dtype("float")), 
   ...:         "c": pd.Series([1.0, np.nan, 3.0]), 
   ...:         "d": pd.Series([1, np.nan, 3]), 
   ...:     } 
   ...: )                                                                                                                                                                                                          

In [6]: df                                                                                                                                                                                                         
Out[6]: 
   a    b    c    d
0  1  1.0  1.0  1.0
1  2  2.0  NaN  NaN
2  3  3.0  3.0  3.0

In [7]: df.dtypes                                                                                                                                                                                                  
Out[7]: 
a      int64
b    float64
c    float64
d    float64
dtype: object

In [8]: converted = df.convert_dtypes()                                                                                                                                                                            

In [9]: converted.dtypes                                                                                                                                                                                           
Out[9]: 
a    Int64
b    Int64
c    Int64
d    Int64
dtype: object

In [10]: converted                                                                                                                                                                                                 
Out[10]: 
   a  b     c     d
0  1  1     1     1
1  2  2  <NA>  <NA>
2  3  3     3     3

Solution 11 - Python

Although there are many options here, you can also convert the dtype of specific columns using a dictionary:

Data = pd.read_csv('Your_Data.csv')

Data_2 = Data.astype({"Column a":"int32", "Column_b": "float64", "Column_c": "int32"})

print(Data_2.dtypes) # Check the dtypes of the columns

This is a useful and very fast way to change the dtype of specific columns for quick data analysis.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author      Original Content on Stackoverflow
Question              MJP                  View Question on Stackoverflow
Solution 1 - Python   EdChum               View Answer on Stackoverflow
Solution 2 - Python   Ryan G               View Answer on Stackoverflow
Solution 3 - Python   user4322543          View Answer on Stackoverflow
Solution 4 - Python   Suhas_Pote           View Answer on Stackoverflow
Solution 5 - Python   enri                 View Answer on Stackoverflow
Solution 6 - Python   aebmad               View Answer on Stackoverflow
Solution 7 - Python   prashanth            View Answer on Stackoverflow
Solution 8 - Python   user8051244          View Answer on Stackoverflow
Solution 9 - Python   tdy                  View Answer on Stackoverflow
Solution 10 - Python  Francisco Puga       View Answer on Stackoverflow
Solution 11 - Python  Fellipe Alcantara    View Answer on Stackoverflow