How to add an empty column to a dataframe?

PythonPandas

Python Problem Overview


What's the easiest way to add an empty column to a pandas DataFrame object? The best I've stumbled upon is something like

df['foo'] = df.apply(lambda _: '', axis=1)

Is there a less perverse method?

Python Solutions


Solution 1 - Python

If I understand correctly, assignment should fill:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> df
   A  B
0  1  2
1  2  3
2  3  4
>>> df["C"] = ""
>>> df["D"] = np.nan
>>> df
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN

Solution 2 - Python

To add to DSM's answer and building on this associated question, I'd split the approach into two cases:

  • Adding a single column: Just assign empty values to the new columns, e.g. df['C'] = np.nan

  • Adding multiple columns: I'd suggest using the .reindex(columns=[...]) method of pandas to add the new columns to the dataframe's column index. This also works for adding multiple new rows with .reindex(rows=[...]). Note that newer versions of Pandas (v>0.20) allow you to specify an axis keyword rather than explicitly assigning to columns or rows.

Here is an example adding multiple columns:

mydf = mydf.reindex(columns = mydf.columns.tolist() + ['newcol1','newcol2'])

or

mydf = mydf.reindex(mydf.columns.tolist() + ['newcol1','newcol2'], axis=1)  # version > 0.20.0

You can also always concatenate a new (empty) dataframe to the existing dataframe, but that doesn't feel as pythonic to me :)

Solution 3 - Python

an even simpler solution is:

df = df.reindex(columns = header_list)                

where "header_list" is a list of the headers you want to appear.

any header included in the list that is not found already in the dataframe will be added with blank cells below.

so if

header_list = ['a','b','c', 'd']

then c and d will be added as columns with blank cells

Solution 4 - Python

I like:

df['new'] = pd.Series(dtype='int')

# or use other dtypes like 'float', 'object', ...

If you have an empty dataframe, this solution makes sure that no new row containing only NaN is added.

Specifying dtype is not strictly necessary, however newer Pandas versions produce a DeprecationWarning if not specified.

Solution 5 - Python

Starting with v0.16.0, DF.assign() could be used to assign new columns (single/multiple) to a DF. These columns get inserted in alphabetical order at the end of the DF.

This becomes advantageous compared to simple assignment in cases wherein you want to perform a series of chained operations directly on the returned dataframe.

Consider the same DF sample demonstrated by @DSM:

df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
df
Out[18]:
   A  B
0  1  2
1  2  3
2  3  4

df.assign(C="",D=np.nan)
Out[21]:
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN

Note that this returns a copy with all the previous columns along with the newly created ones. In order for the original DF to be modified accordingly, use it like : df = df.assign(...) as it does not support inplace operation currently.

Solution 6 - Python

if you want to add column name from a list

df=pd.DataFrame()
a=['col1','col2','col3','col4']
for i in a:
    df[i]=np.nan

Solution 7 - Python

@emunsing's answer is really cool for adding multiple columns, but I couldn't get it to work for me in python 2.7. Instead, I found this works:

mydf = mydf.reindex(columns = np.append( mydf.columns.values, ['newcol1','newcol2'])

Solution 8 - Python

One can use df.insert(index_to_insert_at, column_header, init_value) to insert new column at a specific index.

cost_tbl.insert(1, "col_name", "") 

The above statement would insert an empty Column after the first column.

Solution 9 - Python

this will also work for multiple columns:

df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> df
   A  B
0  1  2
1  2  3
2  3  4

df1 = pd.DataFrame(columns=['C','D','E'])
df = df.join(df1, how="outer")

>>>df
	A	B	C	D	E
0	1	2	NaN	NaN	NaN
1	2	3	NaN	NaN	NaN
2	3	4	NaN	NaN	NaN

Then do whatever you want to do with the columns pd.Series.fillna(),pd.Series.map() etc.

Solution 10 - Python

The below code address the question "How do I add n number of empty columns to my existing dataframe". In the interest of keeping solutions to similar problems in one place, I am adding it here.

Approach 1 (to create 64 additional columns with column names from 1-64)

m = list(range(1,65,1)) 
dd=pd.DataFrame(columns=m)
df.join(dd).replace(np.nan,'') #df is the dataframe that already exists

Approach 2 (to create 64 additional columns with column names from 1-64)

df.reindex(df.columns.tolist() + list(range(1,65,1)), axis=1).replace(np.nan,'')

Solution 11 - Python

You can do

df['column'] = None #This works. This will create a new column with None type
df.column = None #This will work only when the column is already present in the dataframe 

Solution 12 - Python

If you have a list of columns that you want to be empty, you can use assign, then comprehension dict, then dict unpacking.

>>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> nan_cols_name = ["C","D","whatever"]
>>> df.assign(**{col:np.nan for col in nan_cols_name})

   A  B   C   D  whatever
0  1  2 NaN NaN       NaN
1  2  3 NaN NaN       NaN
2  3  4 NaN NaN       NaN

You can also unpack multiple dict in a dict that you unpack if you want different values for different columns.

df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
nan_cols_name = ["C","D","whatever"]
empty_string_cols_name = ["E","F","bad column with space"]
df.assign(**{
    **{col:np.nan for col in my_empy_columns_name}, 
    **{col:"" for col in empty_string_cols_name}
            }
         )

Solution 13 - Python

Sorry for I did not explain my answer really well at beginning. There is another way to add an new column to an existing dataframe. 1st step, make a new empty data frame (with all the columns in your data frame, plus a new or few columns you want to add) called df_temp 2nd step, combine the df_temp and your data frame.

df_temp = pd.DataFrame(columns=(df_null.columns.tolist() + ['empty']))
df = pd.concat([df_temp, df])

It might be the best solution, but it is another way to think about this question.

the reason of I am using this method is because I am get this warning all the time:

: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["empty1"], df["empty2"] = [np.nan, ""]

great I found the way to disable the Warning

pd.options.mode.chained_assignment = None 

Solution 14 - Python

The reason I was looking for such a solution is simply to add spaces between multiple DFs which have been joined column-wise using the pd.concat function and then written to excel using xlsxwriter.

df[' ']=df.apply(lambda _: '', axis=1)
df_2 = pd.concat([df,df1],axis=1)                #worked but only once. 
# Note: df & df1 have the same rows which is my index. 
#
df_2[' ']=df_2.apply(lambda _: '', axis=1)       #didn't work this time !!?     
df_4 = pd.concat([df_2,df_3],axis=1)

I then replaced the second lambda call with

df_2['']=''                                 #which appears to add a blank column
df_4 = pd.concat([df_2,df_3],axis=1)

The output I tested it on was using xlsxwriter to excel. Jupyter blank columns look the same as in excel although doesnt have xlsx formatting. Not sure why the second Lambda call didnt work.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionkjoView Question on Stackoverflow
Solution 1 - PythonDSMView Answer on Stackoverflow
Solution 2 - PythonemunsingView Answer on Stackoverflow
Solution 3 - PythonlianaView Answer on Stackoverflow
Solution 4 - PythonCarstenView Answer on Stackoverflow
Solution 5 - PythonNickil MaveliView Answer on Stackoverflow
Solution 6 - PythonJoy MazumderView Answer on Stackoverflow
Solution 7 - Pythonedge-caseView Answer on Stackoverflow
Solution 8 - PythonUsman AhmadView Answer on Stackoverflow
Solution 9 - PythonTalisView Answer on Stackoverflow
Solution 10 - PythonmoysView Answer on Stackoverflow
Solution 11 - PythonBharath_RajaView Answer on Stackoverflow
Solution 12 - PythonAdrien PacificoView Answer on Stackoverflow
Solution 13 - Python吴思位View Answer on Stackoverflow
Solution 14 - PythonJP1View Answer on Stackoverflow