How do I combine two dataframes?


Python Problem Overview


I'm using Pandas data frames. I have an initial data frame, say D. I extract two data frames from it like this:

A = D[D.label == k]
B = D[D.label != k]

I want to combine A and B so I can have them as one DataFrame, something like a union operation. The order of the data is not important. However, because A and B are extracted from D, they retain their indexes from D.
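For illustration, here is a minimal sketch of that setup, assuming a small hypothetical D with a numeric `label` column and `k = 1` (neither comes from the question). Because A and B together partition D, `pd.concat` puts every row back, and the index labels inherited from D come along. `verify_integrity=True` makes pandas raise a `ValueError` if any index label appears twice, which would signal that the pieces overlap.

```python
import pandas as pd

# Hypothetical stand-in for D; the column names and values are assumptions.
D = pd.DataFrame({"label": [1, 2, 1, 3], "value": [10, 20, 30, 40]})
k = 1

A = D[D.label == k]   # rows with index labels 0 and 2
B = D[D.label != k]   # rows with index labels 1 and 3

# Union of the two pieces; the index labels from D are kept.
# verify_integrity=True raises ValueError if an index label is duplicated.
combined = pd.concat([A, B], verify_integrity=True)

# Rows come out as A then B, so sort by index before comparing with D.
print(combined.sort_index().equals(D))  # True
```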

Python Solutions


Solution 1 - Python

> Deprecation Notice: DataFrame.append and Series.append were deprecated in v1.4.0 and removed in v2.0.0; use pd.concat instead (see Solution 2).

I believe you can use the append method

bigdata = data1.append(data2, ignore_index=True)

To keep the original indexes, just omit the ignore_index keyword.
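Since `append` no longer exists in pandas 2.0 and later, both variants can be written with `pd.concat`. A sketch, with hypothetical `data1` and `data2` given non-default indexes so the difference is visible:

```python
import pandas as pd

# Hypothetical inputs standing in for data1 and data2.
data1 = pd.DataFrame({"x": [1, 2]}, index=[10, 11])
data2 = pd.DataFrame({"x": [3, 4]}, index=[20, 21])

# Equivalent of data1.append(data2, ignore_index=True): a fresh 0..n-1 index.
bigdata = pd.concat([data1, data2], ignore_index=True)
print(bigdata.index.tolist())  # [0, 1, 2, 3]

# Equivalent of data1.append(data2): the original index labels are kept.
bigdata_keep = pd.concat([data1, data2])
print(bigdata_keep.index.tolist())  # [10, 11, 20, 21]
```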

Solution 2 - Python

You can also use pd.concat, which is particularly helpful when you are joining more than two dataframes:

bigdata = pd.concat([data1, data2], ignore_index=True, sort=False)
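A sketch of the multi-frame case, using three hypothetical frames whose columns only partly overlap. By default `pd.concat` does an outer join on the columns, so a column missing from one frame is filled with NaN in that frame's rows; `sort=False` keeps the columns in order of first appearance rather than sorting them.

```python
import pandas as pd

# Three hypothetical frames whose columns only partly overlap.
df1 = pd.DataFrame({"b": [1], "a": [2]})
df2 = pd.DataFrame({"a": [3], "c": [4]})
df3 = pd.DataFrame({"b": [5]})

# One call handles any number of frames; columns missing from a frame become NaN.
# sort=False keeps the columns in order of first appearance instead of sorting them.
bigdata = pd.concat([df1, df2, df3], ignore_index=True, sort=False)
print(bigdata.columns.tolist())  # ['b', 'a', 'c']
```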

Solution 3 - Python

Thought I'd add this here in case someone finds it useful. @ostrokach already mentioned how you can concatenate the data frames across rows, which is:

df_row_merged = pd.concat([df_a, df_b], ignore_index=True)

To concatenate across columns (side by side), you can use the following syntax:

df_col_merged = pd.concat([df_a, df_b], axis=1)
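Note that with `axis=1` the rows are aligned on their index labels, not on position. A small sketch with hypothetical frames that share only one label:

```python
import pandas as pd

# Hypothetical frames sharing some, but not all, index labels.
df_a = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
df_b = pd.DataFrame({"y": [3, 4]}, index=[1, 2])

# axis=1 places the columns side by side and aligns rows on index labels;
# labels present in only one frame get NaN in the other frame's columns.
df_col_merged = pd.concat([df_a, df_b], axis=1)
print(df_col_merged.index.tolist())   # [0, 1, 2]
print(df_col_merged.loc[1].tolist())  # [2.0, 3.0]

# join="inner" keeps only the labels present in every frame.
df_inner = pd.concat([df_a, df_b], axis=1, join="inner")
print(df_inner.index.tolist())  # [1]
```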

Solution 4 - Python

If you're working with big data and need to concatenate many datasets, calling concat repeatedly can become expensive.

If you don't want to create a new df each time, you can instead collect the DataFrames in a list and call concat only once:

frames = [df_A, df_B]  # Or perform operations on the DFs
result = pd.concat(frames)

This is pointed out in the pandas docs under "Concatenating objects" (at the bottom of the section):

> Note: It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
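A sketch of the two patterns side by side, assuming a hypothetical `load_chunk` function standing in for however each dataset is produced. Calling `concat` inside the loop re-copies everything accumulated so far on every iteration, while building a list (here with a list comprehension, as the docs suggest) and concatenating once copies each row only once. Both give the same result.

```python
import pandas as pd


def load_chunk(i):
    # Hypothetical loader standing in for reading one dataset (file, query, ...).
    return pd.DataFrame({"chunk": [i, i], "value": [2 * i, 2 * i + 1]})


# Slow pattern: every concat copies all rows accumulated so far.
slow = load_chunk(0)
for i in range(1, 3):
    slow = pd.concat([slow, load_chunk(i)], ignore_index=True)

# Preferred pattern: collect the frames first, then concatenate once.
frames = [load_chunk(i) for i in range(3)]
fast = pd.concat(frames, ignore_index=True)

print(fast.equals(slow))  # True
```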

Solution 5 - Python

If you want to update or replace the values of the first dataframe, df1, with the values of the second dataframe, df2, you can do it with the following steps:

Step 1: Set index of the first dataframe (df1)

df1 = df1.set_index('id')  # set_index returns a new DataFrame, so reassign it

Step 2: Set index of the second dataframe (df2)

df2 = df2.set_index('id')  # set_index returns a new DataFrame, so reassign it

Finally, update df1 (in place) with the values from df2:

df1.update(df2)
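Putting the steps together, here is a runnable sketch with hypothetical data (only the `id` column name comes from the steps above). `set_index` returns a new DataFrame rather than modifying `df1`, so its result has to be reassigned, whereas `update` modifies `df1` in place and returns `None`.

```python
import pandas as pd

# Hypothetical data; only the "id" key column name comes from the steps above.
df1 = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"id": [2, 3, 4], "score": [200.0, None, 400.0]})

# set_index returns a new DataFrame, so the result must be reassigned.
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# update works in place and returns None. It aligns on index and columns,
# only touches labels already in df1 (id 4 is ignored), and by default
# does not overwrite with NaN values from df2 (id 3 keeps 30.0).
df1.update(df2)
print(df1["score"].tolist())  # [10.0, 200.0, 30.0]

# Optional: turn "id" back into a regular column.
df1 = df1.reset_index()
```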

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | MKoosej | View Question on Stackoverflow |
| Solution 1 - Python | Joran Beasley | View Answer on Stackoverflow |
| Solution 2 - Python | ostrokach | View Answer on Stackoverflow |
| Solution 3 - Python | pelumi | View Answer on Stackoverflow |
| Solution 4 - Python | martin-martin | View Answer on Stackoverflow |
| Solution 5 - Python | Mohsin Mahmood | View Answer on Stackoverflow |