How do I combine two dataframes?
Python Problem Overview
I'm using Pandas data frames. I have an initial data frame, say D. I extract two data frames from it like this:
A = D[D.label == k]
B = D[D.label != k]
I want to combine A and B so I can have them as one DataFrame, something like a union operation. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.
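To make the setup concrete, here is a minimal sketch with a small, hypothetical D (the label and value columns and k = "x" are assumptions, not from the original). It shows that A and B keep D's index labels, and that because they partition D, concatenating them gives a unique index that sort_index() turns back into D's row order.

```python
import pandas as pd

# Hypothetical stand-in for D: a "label" column plus one value column
D = pd.DataFrame({"label": ["x", "y", "x", "z"], "value": [10, 20, 30, 40]})
k = "x"  # hypothetical label to split on

A = D[D.label == k]  # keeps D's labels for the matching rows
B = D[D.label != k]  # keeps D's labels for the remaining rows

print(A.index.tolist())  # [0, 2]
print(B.index.tolist())  # [1, 3]

# A and B partition D, so the combined index has no duplicate labels;
# sort_index() puts the rows back in D's original order
combined = pd.concat([A, B]).sort_index()
print(combined.equals(D))
```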
Python Solutions
Solution 1 - Python
> Deprecation Notice: DataFrame.append and Series.append were deprecated in v1.4.0 and removed in v2.0.0; use pd.concat instead.
I believe you can use the append method:
bigdata = data1.append(data2, ignore_index=True)  # pandas < 2.0 only
To keep their indexes, just don't use the ignore_index keyword.
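Since append no longer exists in pandas 2.x, here is a sketch of the pd.concat equivalents of both variants above, using two small hypothetical frames data1 and data2 with distinct index labels:

```python
import pandas as pd

# Hypothetical inputs with non-default index labels
data1 = pd.DataFrame({"a": [1, 2]}, index=[10, 11])
data2 = pd.DataFrame({"a": [3, 4]}, index=[20, 21])

# Equivalent of data1.append(data2, ignore_index=True): a fresh 0..n-1 index
renumbered = pd.concat([data1, data2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Equivalent of data1.append(data2): the original index labels are kept
kept = pd.concat([data1, data2])
print(kept.index.tolist())  # [10, 11, 20, 21]
```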
Solution 2 - Python
You can also use pd.concat, which is particularly helpful when you are joining more than two dataframes:
bigdata = pd.concat([data1, data2], ignore_index=True, sort=False)
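A small sketch of the multi-frame case, with three hypothetical frames whose columns only partly overlap. pd.concat takes any number of frames in one list, fills missing cells with NaN, and sort=False keeps the columns in order of first appearance instead of sorting them:

```python
import pandas as pd

# Three hypothetical frames whose columns only partly overlap
data1 = pd.DataFrame({"b": [1], "a": [2]})
data2 = pd.DataFrame({"a": [3], "c": [4]})
data3 = pd.DataFrame({"b": [5]})

# One call handles any number of frames; cells a frame lacks become NaN
bigdata = pd.concat([data1, data2, data3], ignore_index=True, sort=False)
print(bigdata.columns.tolist())  # column order of first appearance
print(bigdata)
```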
Solution 3 - Python
I thought I'd add this here in case someone finds it useful. @ostrokach already mentioned how you can merge the data frames across rows, which is:
df_row_merged = pd.concat([df_a, df_b], ignore_index=True)
To merge across columns, you can use the following syntax:
df_col_merged = pd.concat([df_a, df_b], axis=1)
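As a sketch with two hypothetical frames, axis=1 aligns rows on their index labels rather than by position, so a label present in only one frame produces NaN in the other frame's columns:

```python
import pandas as pd

# Hypothetical frames whose index labels only partly overlap
df_a = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
df_b = pd.DataFrame({"y": [3, 4]}, index=[1, 2])

# Side by side, matched on index labels (outer join on the row index)
df_col_merged = pd.concat([df_a, df_b], axis=1)
print(df_col_merged)
```

If the two frames carry unrelated index labels (as A and B from the question do), calling reset_index(drop=True) on each one first pairs the rows by position instead.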
Solution 4 - Python
If you're working with big data and need to concatenate multiple datasets, calling concat many times can get performance-intensive.
If you don't want to create a new df each time, you can instead collect the frames in a list and call concat only once:
frames = [df_A, df_B] # Or perform operations on the DFs
result = pd.concat(frames)
This is pointed out in the pandas docs under "Concatenating objects" (at the bottom of the section):
> Note: It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
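A minimal sketch of the two patterns, where load_chunk is a hypothetical stand-in for whatever produces each dataset. Both give the same result, but the loop copies the growing frame on every iteration, while the list-comprehension version concatenates (and copies) only once:

```python
import pandas as pd


def load_chunk(i: int) -> pd.DataFrame:
    """Hypothetical stand-in for reading one dataset (file, query, etc.)."""
    return pd.DataFrame({"chunk": [i, i], "value": [i * 10, i * 10 + 1]})


# Slow pattern: every pd.concat call copies all rows accumulated so far
slow = load_chunk(0)
for i in range(1, 3):
    slow = pd.concat([slow, load_chunk(i)], ignore_index=True)

# Preferred pattern: build the pieces with a list comprehension, concat once
frames = [load_chunk(i) for i in range(3)]
result = pd.concat(frames, ignore_index=True)
```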
Solution 5 - Python
If you want to update/replace the values of the first dataframe df1 with the values of the second dataframe df2, you can do it with the following steps:
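Step 1: Set the index of the first dataframe (df1). set_index returns a new DataFrame, so assign the result back:
df1 = df1.set_index('id')
Step 2: Set the index of the second dataframe (df2):
df2 = df2.set_index('id')
Finally, update the dataframe using the following snippet:
df1.update(df2)  # modifies df1 in place and returns None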
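A self-contained sketch of these steps, with hypothetical df1/df2 frames keyed by an id column. update aligns on both index and columns, and NaN values in df2 are skipped, so they do not overwrite existing values in df1:

```python
import pandas as pd

# Hypothetical frames sharing an "id" key column
df1 = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"id": [2, 3], "score": [99.0, None]})

# set_index returns a new frame, so reassign the result
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# In-place update: id 2 takes 99.0; id 3 keeps 30.0 because df2 has NaN there
df1.update(df2)

df1 = df1.reset_index()  # turn "id" back into a regular column
print(df1["score"].tolist())  # [10.0, 99.0, 30.0]
```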