Applying function with multiple arguments to create a new pandas column

Python, Pandas

Python Problem Overview


I want to create a new column in a pandas data frame by applying a function to two existing columns. Following this answer I've been able to create a new column when I only need one column as an argument:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})

def fx(x):
    return x * x

print(df)
df['newcolumn'] = df.A.apply(fx)
print(df)

However, I cannot figure out how to do the same thing when the function requires multiple arguments. For example, how do I create a new column by passing column A and column B to the function below?

def fxy(x, y):
    return x * y

Python Solutions


Solution 1 - Python

If you are able to rewrite your function, you can follow @greenAfrican's example (Solution 3). If you would rather keep the function as it is, wrap it in an anonymous function inside apply, like this:

>>> def fxy(x, y):
...     return x * y
...
>>> df['newcolumn'] = df.apply(lambda x: fxy(x['A'], x['B']), axis=1)
>>> df
    A   B  newcolumn
0  10  20        200
1  20  30        600
2  30  10        300
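
The same wrapper also lets you mix column values with a fixed argument. Here is a minimal sketch, assuming a made-up three-argument function `scaled_product` and a made-up constant `scale=2` (neither is part of the original question):

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

def scaled_product(x, y, scale):
    # Hypothetical function, used only for illustration.
    return x * y * scale

# With axis=1 the lambda receives one row at a time; it forwards the two
# column values plus a fixed constant to the unchanged function.
df["scaled"] = df.apply(
    lambda row: scaled_product(row["A"], row["B"], scale=2), axis=1
)
print(df["scaled"].tolist())  # [400, 1200, 600]
```

The function itself stays untouched; only the lambda knows which columns feed which arguments.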

Solution 2 - Python

Alternatively, you can call the underlying NumPy function directly:

>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300

Or, in the general case, vectorize an arbitrary function:

>>> def fx(x, y):
...     return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300
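
np.vectorize is most useful when the function cannot be written as plain column arithmetic, for example when it branches on scalar values. A small sketch, assuming a made-up function `larger_minus_smaller` (not from the original answer). Note that the NumPy documentation describes vectorize as a convenience that is essentially a for loop, so it should not be expected to run as fast as a true ufunc such as np.multiply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

def larger_minus_smaller(x, y):
    # Hypothetical function for illustration. The `if` works on scalars;
    # calling this directly on two whole Series would raise a ValueError.
    if x > y:
        return x - y
    return y - x

# np.vectorize calls the function once per pair of elements and
# returns a NumPy array, which pandas accepts as a new column.
df["diff"] = np.vectorize(larger_minus_smaller)(df["A"], df["B"])
print(df["diff"].tolist())  # [10, 10, 20]
```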

Solution 3 - Python

Multiplying the two columns directly solves the problem:

df['newcolumn'] = df.A * df.B

You could also write a function that takes a whole row and apply it row by row:

def fab(row):
    return row['A'] * row['B']
    
df['newcolumn'] = df.apply(fab, axis=1)
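
To check that both variants give the same values, here is a short self-contained sketch (the variable names `vectorized`, `row_wise` and `same` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

def fab(row):
    # Receives one row (a Series indexed by column name) per call.
    return row["A"] * row["B"]

vectorized = df["A"] * df["B"]      # whole-column arithmetic
row_wise = df.apply(fab, axis=1)    # one Python call per row

same = vectorized.tolist() == row_wise.tolist()
print(same)  # True
```

When the operation can be expressed as column arithmetic, the first form is usually the simpler choice, since it avoids a Python-level call for every row.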

Solution 4 - Python

If you need to create multiple columns at once:

  1. Create the dataframe:

     import pandas as pd
     df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
    
  2. Create the function:

     def fab(row):
         return row['A'] * row['B'], row['A'] + row['B']
    
  3. Assign the new columns:

     df['newcolumn'], df['newcolumn2'] = zip(*df.apply(fab, axis=1))
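
The three steps above can be put together as one runnable sketch. The second variant, using `result_type="expand"`, is an alternative that is not part of the original answer; the names `products`, `sums`, `expanded`, `product` and `total` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

def fab(row):
    # Returns two values per row: the product and the sum.
    return row["A"] * row["B"], row["A"] + row["B"]

# Variant 1 (as in the steps above): apply yields one tuple per row,
# and zip(*...) regroups the tuples into one sequence per new column.
products, sums = zip(*df.apply(fab, axis=1))
df["newcolumn"], df["newcolumn2"] = products, sums

# Variant 2 (an assumption, not from the answer): let pandas expand the
# tuples into a DataFrame, then name the resulting columns.
expanded = df[["A", "B"]].apply(fab, axis=1, result_type="expand")
expanded.columns = ["product", "total"]

print(df)
print(expanded)
```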
    

Solution 5 - Python

Another clean, dict-style syntax:

df["new_column"] = df.apply(lambda x: x["A"] * x["B"], axis=1)

Or, more simply:

df["new_column"] = df["A"] * df["B"]

Solution 6 - Python

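This unpacks the selected columns of each row into the function's positional arguments, so it also works when the function takes more than two arguments. The columns must be listed in the same order as the function's parameters: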

df['anothercolumn'] = df[['A', 'B']].apply(lambda x: fxy(*x), axis=1)
print(df)


    A   B  newcolumn  anothercolumn
0  10  20        100            200
1  20  30        400            600
2  30  10        900            300
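
For instance, with three arguments the same pattern might look like the sketch below. The column `C` and the function `fxyz` are made up for illustration and do not appear in the question.

```python
import pandas as pd

# "C" is a hypothetical third column added for this example.
df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10], "C": [1, 2, 3]})

def fxyz(x, y, z):
    # Hypothetical three-argument function.
    return x * y + z

# *row unpacks the row's values in column order: A -> x, B -> y, C -> z.
df["result"] = df[["A", "B", "C"]].apply(lambda row: fxyz(*row), axis=1)
print(df["result"].tolist())  # [201, 602, 303]
```

Reordering the list to `["C", "B", "A"]` would silently pass the values to different parameters, so the column order is worth double-checking.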

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author   Original Content on Stack Overflow
Question              Michael           View Question on Stack Overflow
Solution 1 - Python   Roman Pekar       View Answer on Stack Overflow
Solution 2 - Python   alko              View Answer on Stack Overflow
Solution 3 - Python   greenafrican      View Answer on Stack Overflow
Solution 4 - Python   toto_tico         View Answer on Stack Overflow
Solution 5 - Python   Surya             View Answer on Stack Overflow
Solution 6 - Python   user18475123      View Answer on Stack Overflow