I want to multiply two columns in a pandas DataFrame and add the result into a new column

PythonPython 2.7Pandas

Python Problem Overview


I'm trying to multiply two existing columns in a pandas Dataframe (orders_df) - Prices (stock close price) and Amount (stock quantities) and add the calculation to a new column called 'Value'. For some reason when I run this code, all the rows under the 'Value' column are positive numbers, while some of the rows should be negative. Under the Action column in the DataFrame there are seven rows with the 'Sell' string and seven with the 'Buy' string.

for i in orders_df.Action:
 if i  == 'Sell':
  orders_df['Value'] = orders_df.Prices*orders_df.Amount
 elif i == 'Buy':
  orders_df['Value'] = -orders_df.Prices*orders_df.Amount)

Please let me know what i'm doing wrong !

Python Solutions


Solution 1 - Python

I think an elegant solution is to use the where method (also see the API docs):

In [37]: values = df.Prices * df.Amount

In [38]: df['Values'] = values.where(df.Action == 'Sell', other=-values)

In [39]: df
Out[39]: 
   Prices  Amount Action  Values
0       3      57   Sell     171
1      89      42   Sell    3738
2      45      70    Buy   -3150
3       6      43   Sell     258
4      60      47   Sell    2820
5      19      16    Buy    -304
6      56      89   Sell    4984
7       3      28    Buy     -84
8      56      69   Sell    3864
9      90      49    Buy   -4410

Further more this should be the fastest solution.

Solution 2 - Python

You can use the DataFrame apply method:

order_df['Value'] = order_df.apply(lambda row: (row['Prices']*row['Amount']
                                               if row['Action']=='Sell'
                                               else -row['Prices']*row['Amount']),
                                   axis=1)

It is usually faster to use these methods rather than over for loops.

Solution 3 - Python

If we're willing to sacrifice the succinctness of Hayden's solution, one could also do something like this:

In [22]: orders_df['C'] = orders_df.Action.apply(
               lambda x: (1 if x == 'Sell' else -1))

In [23]: orders_df   # New column C represents the sign of the transaction
Out[23]:
   Prices  Amount Action  C
0       3      57   Sell  1
1      89      42   Sell  1
2      45      70    Buy -1
3       6      43   Sell  1
4      60      47   Sell  1
5      19      16    Buy -1
6      56      89   Sell  1
7       3      28    Buy -1
8      56      69   Sell  1
9      90      49    Buy -1

Now we have eliminated the need for the if statement. Using DataFrame.apply(), we also do away with the for loop. As Hayden noted, vectorized operations are always faster.

In [24]: orders_df['Value'] = orders_df.Prices * orders_df.Amount * orders_df.C

In [25]: orders_df   # The resulting dataframe
Out[25]:
   Prices  Amount Action  C  Value
0       3      57   Sell  1    171
1      89      42   Sell  1   3738
2      45      70    Buy -1  -3150
3       6      43   Sell  1    258
4      60      47   Sell  1   2820
5      19      16    Buy -1   -304
6      56      89   Sell  1   4984
7       3      28    Buy -1    -84
8      56      69   Sell  1   3864
9      90      49    Buy -1  -4410

This solution takes two lines of code instead of one, but is a bit easier to read. I suspect that the computational costs are similar as well.

Solution 4 - Python

Since this question came up again, I think a good clean approach is using assign.

The code is quite expressive and self-describing:

df = df.assign(Value = lambda x: x.Prices * x.Amount * x.Action.replace({'Buy' : 1, 'Sell' : -1}))

Solution 5 - Python

To make things neat, I take Hayden's solution but make a small function out of it.

def create_value(row):
    if row['Action'] == 'Sell':
        return row['Prices'] * row['Amount']
    else:
        return -row['Prices']*row['Amount']

so that when we want to apply the function to our dataframe, we can do..

df['Value'] = df.apply(lambda row: create_value(row), axis=1)

...and any modifications only need to occur in the small function itself.

Concise, Readable, and Neat!

Solution 6 - Python

For me, this is the clearest and most intuitive:

values = []
for action in ['Sell','Buy']:
    amounts = orders_df['Amounts'][orders_df['Action'==action]].values
    if action == 'Sell':
        prices = orders_df['Prices'][orders_df['Action'==action]].values
    else:
        prices = -1*orders_df['Prices'][orders_df['Action'==action]].values
    values += list(amounts*prices)  
orders_df['Values'] = values

The .values method returns a numpy array allowing you to easily multiply element-wise and then you can cumulatively generate a list by 'adding' to it.

Solution 7 - Python

Good solution from bmu. I think it's more readable to put the values inside the parentheses vs outside.

    df['Values'] = np.where(df.Action == 'Sell', 
                            df.Prices*df.Amount, 
                           -df.Prices*df.Amount)

Using some pandas built in functions.

    df['Values'] = np.where(df.Action.eq('Sell'), 
                            df.Prices.mul(df.Amount), 
                           -df.Prices.mul(df.Amount))

Solution 8 - Python

First, multiply the columns Prices and Amount. Afterwards use mask to negate the values if the condition is True:

df.assign(
    Values=(df["Prices"] * df["Amount"]).mask(df["Action"] == "Buy", lambda x: -x)
)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionOAKView Question on Stackoverflow
Solution 1 - PythonbmuView Answer on Stackoverflow
Solution 2 - PythonAndy HaydenView Answer on Stackoverflow
Solution 3 - PythonAmanView Answer on Stackoverflow
Solution 4 - PythonFLabView Answer on Stackoverflow
Solution 5 - PythonJenobiView Answer on Stackoverflow
Solution 6 - PythonMichael SilversteinView Answer on Stackoverflow
Solution 7 - PythonChristopher MatthewsView Answer on Stackoverflow
Solution 8 - PythonrachwaView Answer on Stackoverflow