Insert a row to pandas dataframe


Python Problem Overview


I have a dataframe:

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])

   A  B  C
0  5  6  7
1  7  8  9

[2 rows x 3 columns]

and I need to add a first row [2, 3, 4] to get:

   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

I've tried the append() and concat() functions but can't find the right way to do it.

How can I add/insert a series into a dataframe?

Python Solutions


Solution 1 - Python

Just assign the row to a particular index using loc:

 df.loc[-1] = [2, 3, 4]  # adding a row
 df.index = df.index + 1  # shifting index
 df = df.sort_index()  # sorting by index

And you get, as desired:

    A  B  C
 0  2  3  4
 1  5  6  7
 2  7  8  9

See the pandas documentation, Indexing: Setting with enlargement.
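The same setting-with-enlargement idea can be stretched to insert at any position, not only at the top. A minimal sketch, assuming df has the default 0..n-1 integer index; the helper name `insert_row_at` and the half-step label trick are illustrative choices, not a pandas API:

```python
import pandas as pd

def insert_row_at(df, pos, values):
    """Hypothetical helper: insert `values` as a row before position `pos`.

    Assumes df has the default 0..n-1 integer index.
    """
    out = df.copy()  # leave the caller's frame untouched
    # A label halfway between pos - 1 and pos sorts into the right slot.
    out.loc[pos - 0.5] = values
    # Sort by label, then renumber the rows 0..n-1.
    return out.sort_index().reset_index(drop=True)

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
top = insert_row_at(df, 0, [2, 3, 4])     # same result as the three lines above
middle = insert_row_at(df, 1, [2, 3, 4])  # new row between the two existing ones
```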

Solution 2 - Python

Not sure how you were calling concat(), but it should work as long as both objects are of the same type. Maybe the issue is that you need to convert your new row into a dataframe first? Using the df that you defined, the following works for me:

df2 = pd.DataFrame([[2, 3, 4]], columns=['A', 'B', 'C'])
pd.concat([df2, df], ignore_index=True)  # ignore_index renumbers the rows 0, 1, 2
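Since the question starts from a Series, one way to do that conversion is `Series.to_frame().T`. A minimal sketch, assuming the row's values line up with df's columns (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# The new row as a Series whose index matches df's column labels.
new_row = pd.Series([2, 3, 4], index=df.columns)

# to_frame() gives a one-column frame; .T turns it into a one-row frame.
row_df = new_row.to_frame().T

# Put the one-row frame first and renumber the index 0..n-1.
result = pd.concat([row_df, df], ignore_index=True)
```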


Solution 3 - Python

One way to achieve this is

>>> pd.concat([pd.DataFrame(np.array([[2, 3, 4]]), columns=['A', 'B', 'C']), df], ignore_index=True)
Out[330]: 
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

Generally, it's easiest to append dataframes, not series. In your case, since you want the new row to be on top (at the starting index), and there is no pd.prepend() function, I first create the new dataframe and then append your old one.

ignore_index discards the old index of your dataframe, so the result is renumbered 0, 1, 2 instead of carrying the duplicated labels 0, 0, 1 (your old rows would otherwise restart at index 0).

Typical disclaimer, as I never tire of repeating: appending rows is quite an inefficient operation. If you care about performance and can first create a dataframe with the correct (longer) index and then just insert the additional row into it, you should definitely do that. See:

>>> index = np.array([0, 1, 2])
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[0:1] = [list(s1), list(s2)]
>>> df2
Out[336]: 
     A    B    C
0    5    6    7
1    7    8    9
2  NaN  NaN  NaN
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[1:] = [list(s1), list(s2)]

So far, we have what you had as df:

>>> df2
Out[339]: 
     A    B    C
0  NaN  NaN  NaN
1    5    6    7
2    7    8    9

But now you can easily insert the row as follows. Since the space was preallocated, this is more efficient.

>>> df2.loc[0] = np.array([2, 3, 4])
>>> df2
Out[341]: 
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9
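Preallocating with `pd.DataFrame(columns=..., index=...)` leaves the columns with object dtype, because the frame starts out empty (all NaN). A minimal sketch of the same preallocate-then-fill idea done in a NumPy array instead, so the numeric dtype survives; it assumes all columns share one numeric dtype, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Existing rows (the question's data) and the row to put on top.
existing = np.array([[5, 6, 7], [7, 8, 9]])
new_row = np.array([2, 3, 4])

# Preallocate the full-size block once, then fill it slice by slice.
block = np.empty((len(existing) + 1, existing.shape[1]), dtype=existing.dtype)
block[0] = new_row
block[1:] = existing

# Build the DataFrame a single time from the filled block.
df2 = pd.DataFrame(block, columns=["A", "B", "C"])
```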

Solution 4 - Python

Testing a few answers, pd.concat() turned out to be the most efficient for large dataframes.

Comparing the list-based and dict-based variants, the list was faster here, but for small dataframes a dict should be no problem and is somewhat more readable.


1st - pd.concat() + list

%%timeit
df = pd.DataFrame(columns=['a', 'b'])
for i in range(10000):
    df = pd.concat([pd.DataFrame([[1,2]], columns=df.columns), df], ignore_index=True)

4.88 s ± 47.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2nd - DataFrame.append() + dict

%%timeit
df = pd.DataFrame(columns=['a', 'b'])
for i in range(10000):
    df = df.append({'a': 1, 'b': 2}, ignore_index=True)

10.2 s ± 41.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

3rd - pd.DataFrame().loc + index operations

%%timeit
df = pd.DataFrame(columns=['a','b'])
for i in range(10000):
    df.loc[-1] = [1,2]
    df.index = df.index + 1
    df = df.sort_index()

17.5 s ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
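All three timed variants rebuild the frame on every iteration. A common alternative, sketched here as an assumption rather than one of the benchmarked answers (and not timed), is to collect the rows in a plain Python list and construct the DataFrame once at the end:

```python
import pandas as pd

rows = []
for i in range(10000):
    # Collect plain lists; no DataFrame is built inside the loop.
    rows.append([i, 2 * i])

# Prepending in a loop would leave the last row on top, so reverse once here.
rows.reverse()

# Build the DataFrame a single time at the end.
df = pd.DataFrame(rows, columns=["a", "b"])
```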

Solution 5 - Python

I put together a short function that allows for a little more flexibility when inserting a row:

def insert_row(idx, df, df_insert):
    dfA = df.iloc[:idx]  # rows before position idx
    dfB = df.iloc[idx:]  # rows from position idx onward

    df = pd.concat([dfA, df_insert, dfB]).reset_index(drop=True)

    return df

which could be further shortened to:

def insert_row(idx, df, df_insert):
    return pd.concat([df.iloc[:idx], df_insert, df.iloc[idx:]]).reset_index(drop=True)

Then you could use something like:

df = insert_row(2, df, df_new)

where 2 is the position in df at which you want to insert df_new (a DataFrame with the same columns as df).
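If the new row is a plain list rather than a DataFrame, it can be wrapped first. A minimal sketch of the same split-and-recombine idea, using pd.concat (DataFrame.append was removed in pandas 2.0); the name `insert_values` is hypothetical:

```python
import pandas as pd

def insert_values(idx, df, values):
    """Hypothetical helper: insert a list of values as a row at position idx."""
    # Wrap the list as a one-row DataFrame with matching column labels.
    row = pd.DataFrame([values], columns=df.columns)
    # Rows before idx, then the new row, then the rest; renumber 0..n-1.
    return pd.concat([df.iloc[:idx], row, df.iloc[idx:]], ignore_index=True)

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
df = insert_values(0, df, [2, 3, 4])
```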

Solution 6 - Python

We can use numpy.insert. This has the advantage of flexibility: you only need to specify the position at which to insert.

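import numpy as np
import pandas as pd

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)], columns=["A", "B", "C"])

# Insert [2, 3, 4] as a new row at position 0, keeping the column labels.
pd.DataFrame(np.insert(df.values, 0, values=[2, 3, 4], axis=0), columns=df.columns)

   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9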

In np.insert(df.values, 0, values=[2, 3, 4], axis=0), the 0 is the position at which the new values are inserted, and axis=0 makes them a new row.
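np.insert works on a plain NumPy array, so the column labels have to be passed back in when rebuilding the DataFrame. A minimal sketch of inserting at an arbitrary position (the helper name `insert_row_np` is hypothetical). Keep in mind that with mixed column dtypes, df.values upcasts to a common dtype (object, if strings are involved), so this route suits all-numeric frames best:

```python
import numpy as np
import pandas as pd

def insert_row_np(df, pos, values):
    """Hypothetical helper: insert `values` as a row before position `pos`."""
    # np.insert returns a new array; axis=0 inserts a row.
    data = np.insert(df.to_numpy(), pos, values, axis=0)
    # Rebuild with the original column labels (the index becomes 0..n-1).
    return pd.DataFrame(data, columns=df.columns)

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
print(insert_row_np(df, 1, [2, 3, 4]))
```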

Solution 7 - Python

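It is pretty simple to add a row to a pandas DataFrame:

  1. Create a regular Python dictionary with the same column names as your DataFrame;

  2. Call the .append() method of your DataFrame instance and pass in your dictionary (this adds the row at the end);

  3. Add ignore_index=True right after your dictionary (appending a dict raises a TypeError without it).

Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, use pd.concat() instead.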
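On pandas 2.0 and later, the same dictionary-based steps can be written with pd.concat. A minimal sketch, assuming the dictionary keys match the column names; the dictionary is wrapped in a list so it becomes a one-row DataFrame, and that frame goes first to put the row on top as the question asks:

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Step 1: a dictionary keyed by the DataFrame's column names.
new_row = {"A": 2, "B": 3, "C": 4}

# Steps 2-3: wrap it as a one-row DataFrame and concatenate with
# ignore_index=True; putting it first inserts the row on top.
df = pd.concat([pd.DataFrame([new_row]), df], ignore_index=True)
```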

Solution 8 - Python

This might seem overly simple, but it's incredible that a simple insert-new-row function isn't built in. I've read a lot about appending a new df to the original, but I'm wondering if this would be faster.

df = pd.DataFrame(columns=["A", "B", "C"])  # start from an empty df
df.loc[0] = [2, 3, 4]
i = len(df)  # next unused label, assuming the index is 0..n-1
df.loc[i] = [5, 6, 7]

Solution 9 - Python

Below is one way to add a row to the end of a pandas dataframe without sorting or resetting the index:

import pandas as pd

df = pd.DataFrame(columns=['a','b','c'])

def insert(df, row):
    insert_loc = df.index.max()

    if pd.isna(insert_loc):
        df.loc[0] = row
    else:
        df.loc[insert_loc + 1] = row

insert(df,[2,3,4])
insert(df,[8,9,0])
print(df)

Solution 10 - Python

concat() seems to be considerably faster than inserting a row with loc and then shifting and sorting the index (roughly 2.6 times faster in the timings below). In case anyone wonders about the speed of the two top approaches:

In [x]: %%timeit
     ...: df = pd.DataFrame(columns=['a','b'])
     ...: for i in range(10000):
     ...:     df.loc[-1] = [1,2]
     ...:     df.index = df.index + 1
     ...:     df = df.sort_index()

17.1 s ± 705 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [y]: %%timeit
     ...: df = pd.DataFrame(columns=['a', 'b'])
     ...: for i in range(10000):
     ...:     df = pd.concat([pd.DataFrame([[1,2]], columns=df.columns), df])

6.53 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Solution 11 - Python

It just occurred to me that the T (transpose) attribute may be a valid choice. Transposing avoids the somewhat misleading df.loc[-1] = [2, 3, 4] (as @flow2k mentioned), and it suits more general situations, such as inserting [2, 3, 4] before an arbitrary row, which is hard to achieve with concat() or append(). And there's no need to bear the trouble of defining and debugging a function.

a = df.T
a.insert(0,'anyName',value=[2,3,4])
# just give insert() any column name you want, we'll rename it.
a.rename(columns=dict(zip(a.columns,[i for i in range(a.shape[1])])),inplace=True)
# set inplace to a Boolean as you need.
df=a.T
df

   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

I guess this partly explains @MattCochrane's complaint about why pandas doesn't have a method to insert a row the way insert() does for columns.
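One caveat with the transpose route (it also applies to Solution 15): the transposed frame holds each original row as a column, so with mixed column dtypes everything is upcast, and strings force object dtype that survives the round trip. A small sketch showing this, with made-up example data:

```python
import pandas as pd

# Made-up example with one numeric and one text column.
df = pd.DataFrame({"num": [5, 7], "txt": ["x", "y"]})

a = df.T                          # each column of `a` mixes ints and strings
a.insert(0, "new", [2, "w"])      # insert the new row as a column
a.columns = range(a.shape[1])     # relabel the columns 0..n-1
result = a.T

print(df.dtypes)      # num is an integer dtype, txt is object
print(result.dtypes)  # both columns are now object
```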

Solution 12 - Python

You can simply append the row to the end of the DataFrame, and then adjust the index.

For instance:

df = pd.concat([df, pd.DataFrame([[2, 3, 4]], columns=df.columns)], ignore_index=True)
df.index = (df.index + 1) % len(df)
df = df.sort_index()

Or prepend directly with concat:

df = pd.concat([pd.DataFrame([[2, 3, 4]], columns=df.columns), df], ignore_index=True)
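The modulo step is the non-obvious part: after appending, the new row sits at label n-1, and (index + 1) % n sends it to 0 while shifting every other label up by one. A quick check of that arithmetic on the question's data, using pd.concat for the append step:

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Append at the end: labels are now 0, 1, 2 and the new row is label 2.
df = pd.concat([df, pd.DataFrame([[2, 3, 4]], columns=df.columns)], ignore_index=True)

# (0, 1, 2) + 1 = (1, 2, 3); modulo 3 gives (1, 2, 0).
df.index = (df.index + 1) % len(df)

# Sorting by label moves the new row (now label 0) to the top.
df = df.sort_index()
```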

Solution 13 - Python

Do as in the following example:

a_row = pd.Series([1, 2])

df = pd.DataFrame([[3, 4], [5, 6]])

row_df = pd.DataFrame([a_row])

df = pd.concat([row_df, df], ignore_index=True)

and the result is:

   0  1
0  1  2
1  3  4
2  5  6

Solution 14 - Python

Create an empty df with column names:

df = pd.DataFrame(columns = ["A", "B", "C"])

Insert new rows, each at the next integer label:

df.loc[len(df.index)] = [2, 3, 4]
df.loc[len(df.index)] = [5, 6, 7]
df.loc[len(df.index)] = [7, 8, 9]
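This relies on the index being exactly 0..n-1. If rows were dropped earlier, len(df.index) can collide with an existing label, and .loc then overwrites that row instead of adding one. A small sketch of that pitfall, starting from the same three rows:

```python
import pandas as pd

# The same frame as built above.
df = pd.DataFrame([[2, 3, 4], [5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Drop the first row: the labels are now 1 and 2, but len(df.index) is 2.
df = df.drop(index=0)

# Label 2 already exists, so this overwrites [7, 8, 9] instead of adding a row.
df.loc[len(df.index)] = [1, 1, 1]
```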

Solution 15 - Python

Given that a pandas dataframe is structured as a collection of Series (one per column), it is convenient to insert a column at any position. So one idea I came up with is to first transpose your data frame, insert a column, and transpose it back. You may also need to rename the index (row labels), like this:

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])
df = df.transpose()
df.insert(0, 2, [2,3,4])
df = df.transpose()
df.index = range(len(df))  # relabel the rows 0..n-1
df

   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

Solution 16 - Python

The simplest way to add a row to a pandas data frame is:

df.loc[<row label>] = [<one value per column>]

Example:

df.loc[9] = ['Pepe', 33, 'Japan']

NB: the length of your list should match the number of columns in the data frame.
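Whether .loc adds a row depends on the label: an unused label enlarges the frame (at the end, not the top), while an existing label silently overwrites that row. A small sketch with made-up data in the spirit of the example above:

```python
import pandas as pd

df = pd.DataFrame([["Ana", 41, "Spain"]], columns=["name", "age", "country"])

df.loc[9] = ["Pepe", 33, "Japan"]  # label 9 is new: a row is added at the end
df.loc[9] = ["Pepe", 34, "Japan"]  # label 9 exists: that row is overwritten
```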

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author
Question | Meloun
Solution 1 - Python | Piotr Migdal
Solution 2 - Python | mgilbert
Solution 3 - Python | FooBar
Solution 4 - Python | kovashikawa
Solution 5 - Python | elPastor
Solution 6 - Python | Tai
Solution 7 - Python | Pepe
Solution 8 - Python | Aaron Melgar
Solution 9 - Python | Sagar Rathod
Solution 10 - Python | M. Viaz
Solution 11 - Python | Steven
Solution 12 - Python | Xinyi Li
Solution 13 - Python | Ehsan Akbari Tabar
Solution 14 - Python | Alessio Pan
Solution 15 - Python | Xin Niu
Solution 16 - Python | Pepe