# How to apply a function to every row of a dataframe?

## Python Problem Overview

I am new to Python and I am not sure how to solve the following problem.

I have a function:

```
import math

def EOQ(D,p,ck,ch):
    Q = math.sqrt((2*D*ck)/(ch*p))
    return Q
```

Say I have the dataframe

```
import pandas as pd

df = pd.DataFrame({"D": [10,20,30], "p": [20, 30, 10]})
#     D   p
# 0  10  20
# 1  20  30
# 2  30  10
ch=0.2
ck=5
```

Here `ch` and `ck` are scalar numbers. Now I want to apply the formula to every row of the dataframe and store the result as an extra column 'Q'. An example (that does not work) would be:

```
df['Q']= map(lambda p, D: EOQ(D,p,ck,ch),df['p'], df['D'])
```

(this only gives me a `map` object, not the computed values)

I will need this type of processing more in my project and I hope to find something that works.

## Python Solutions

## Solution 1 - Python

The following should work:

```
import math

def EOQ(D,p,ck,ch):
    Q = math.sqrt((2*D*ck)/(ch*p))
    return Q

ch=0.2
ck=5
# axis=1 passes each row to the lambda as a Series
df['Q'] = df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
df
```

If all you're doing is calculating the square root of some result, then use `np.sqrt` instead. It is vectorised and will be significantly faster:

```
In [80]:
df['Q'] = np.sqrt((2*df['D']*ck)/(ch*df['p']))
df

Out[80]:
    D   p          Q
0  10  20   5.000000
1  20  30   5.773503
2  30  10  12.247449
```
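Because `np.sqrt` accepts scalars as well as whole columns, the vectorised expression could also be kept as a reusable function. The following is only a sketch of that idea: the name `EOQ_vec` is hypothetical and does not appear in the original answer, and the sample data is re-declared so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def EOQ_vec(D, p, ck, ch):
    # np.sqrt works element-wise, so D and p may be scalars or whole columns
    return np.sqrt((2 * D * ck) / (ch * p))

df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
ch = 0.2
ck = 5

df["Q"] = EOQ_vec(df["D"], df["p"], ck, ch)  # one call for the whole column
single = EOQ_vec(10, 20, ck, ch)             # the same function on plain numbers
```

Written this way, the same formula serves both the per-value and the per-column case without going through `apply`.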

**Timings**

For a 30k row df:

```
In [92]:
import math
ch=0.2
ck=5
def EOQ(D,p,ck,ch):
    Q = math.sqrt((2*D*ck)/(ch*p))
    return Q
%timeit np.sqrt((2*df['D']*ck)/(ch*df['p']))
%timeit df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
1000 loops, best of 3: 622 µs per loop
1 loops, best of 3: 1.19 s per loop
```

You can see that the vectorised NumPy version is roughly 1,900 times faster (622 µs versus 1.19 s).
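The `%timeit` magic only exists in IPython and Jupyter. Outside of those, a comparable measurement could be sketched with the standard-library `timeit` module, as below. The frame size (3,000 rows) and the repeat count are arbitrary choices made here to keep the script quick, so the absolute numbers will differ from those quoted above.

```python
import math
import timeit

import numpy as np
import pandas as pd

def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))

ch = 0.2
ck = 5
# 3 sample rows repeated 1,000 times (assumed size, smaller than the 30k above)
base = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
df = pd.concat([base] * 1000, ignore_index=True)

runs = 5
t_vec = timeit.timeit(
    lambda: np.sqrt((2 * df["D"] * ck) / (ch * df["p"])), number=runs
)
t_apply = timeit.timeit(
    lambda: df.apply(lambda row: EOQ(row["D"], row["p"], ck, ch), axis=1),
    number=runs,
)

print(f"vectorised: {t_vec / runs:.6f} s per call")
print(f"apply:      {t_apply / runs:.6f} s per call")
```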

## Solution 2 - Python

There are a few more ways to apply a function to every row of a DataFrame.

(1) You could modify `EOQ` a bit by letting it accept a row (a Series object) as its argument and access the relevant elements by column name inside the function. Moreover, you can pass extra arguments such as `ch` or `ck` to `apply` as keyword arguments:

```
def EOQ1(row, ck, ch):
    Q = math.sqrt((2*row['D']*ck)/(ch*row['p']))
    return Q

df['Q1'] = df.apply(EOQ1, ck=ck, ch=ch, axis=1)
```
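The keyword form above could also be written with the `args` parameter of `apply`, which forwards extra positional arguments after the row. A minimal sketch, with the sample data re-declared so that it runs on its own:

```python
import math

import pandas as pd

def EOQ1(row, ck, ch):
    return math.sqrt((2 * row["D"] * ck) / (ch * row["p"]))

df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
ch = 0.2
ck = 5

# args supplies the positional arguments that follow the row, in order
df["Q1"] = df.apply(EOQ1, axis=1, args=(ck, ch))
```

Keyword arguments are arguably the safer choice of the two, since the order of `ck` and `ch` cannot be swapped by accident.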

(2) It turns out that `apply` is often slower than a list comprehension (in the benchmark below, it's about 20x slower). To use a list comprehension, you could modify `EOQ` still further so that it accesses elements by position. Then call the function in a loop over the rows of `df`, converted to lists:

```
def EOQ2(row, ck, ch):
    Q = math.sqrt((2*row[0]*ck)/(ch*row[1]))
    return Q

df['Q2a'] = [EOQ2(x, ck, ch) for x in df[['D','p']].to_numpy().tolist()]
```
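If rewriting `EOQ` is undesirable, a comprehension could instead walk the two columns in parallel with `zip` and call the original four-argument function directly. This is only a sketch of that variant: the column name `Q2c` is made up for illustration, and it was not part of the benchmark below.

```python
import math

import pandas as pd

def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))

df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
ch = 0.2
ck = 5

# zip pairs up the values of the two columns row by row
df["Q2c"] = [EOQ(D, p, ck, ch) for D, p in zip(df["D"], df["p"])]
```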

(3) As it happens, if the goal is to call a function iteratively, `map` is often a little faster than a list comprehension (26.9 ms versus 31.3 ms in the benchmark below). So you could convert `df` into a list, `map` the function over it, then unpack the result into a list:

```
df['Q2b'] = [*map(EOQ2, df[['D','p']].to_numpy().tolist(), [ck]*len(df), [ch]*len(df))]
```
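Rather than building the repeated `[ck]*len(df)` lists, the constants could be bound once with `functools.partial` and the original `EOQ` mapped over the two columns. The sketch below assumes that approach; `eoq_fixed` and `Q2d` are hypothetical names, and this variant was not timed in the benchmark below. In Python 3 `map` returns a lazy iterator, so it is materialised with `list` before assignment.

```python
import math
from functools import partial

import pandas as pd

def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))

df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
ch = 0.2
ck = 5

# Bind the constants once; map then only has to supply D and p
eoq_fixed = partial(EOQ, ck=ck, ch=ch)
df["Q2d"] = list(map(eoq_fixed, df["D"], df["p"]))
```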

(4) As @EdChum notes, it is better to use vectorized methods whenever possible instead of applying a function row by row. Pandas offers vectorized methods that rival numpy's. In the case of `EOQ`, for example, instead of `math.sqrt` you could use pandas' `pow` method (in the benchmark below, the pandas vectorized methods are ~20% faster than numpy):

```
df['Q_pd'] = df['D'].mul(2*ck).div(ch*df['p']).pow(0.5)
```

**Output:**

```
    D   p          Q       Q_np         Q1        Q2a        Q2b       Q_pd
0  10  20   5.000000   5.000000   5.000000   5.000000   5.000000   5.000000
1  20  30   5.773503   5.773503   5.773503   5.773503   5.773503   5.773503
2  30  10  12.247449  12.247449  12.247449  12.247449  12.247449  12.247449
```

**Timings:**

```
df = pd.DataFrame({"D": [10,20,30], "p": [20, 30, 10]})
df = pd.concat([df]*10000)
>>> %timeit df['Q'] = df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
623 ms ± 22.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df['Q1'] = df.apply(EOQ1, ck=ck, ch=ch, axis=1)
615 ms ± 39.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df['Q2a'] = [EOQ2(x, ck, ch) for x in df[['D','p']].to_numpy().tolist()]
31.3 ms ± 479 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df['Q2b'] = [*map(EOQ2, df[['D','p']].to_numpy().tolist(), [ck]*len(df), [ch]*len(df))]
26.9 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

```
>>> %timeit df['Q_np'] = np.sqrt((2*df['D']*ck)/(ch*df['p']))
1.19 ms ± 53.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df['Q_pd'] = df['D'].mul(2*ck).div(ch*df['p']).pow(0.5)
966 µs ± 27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```