What's the inverse of the quantile function on a pandas Series?

Python Pandas Quantile

Python Problem Overview

The quantile function gives us the quantile of a given pandas Series s.

E.g.

> s.quantile(0.9) is 4.2

Is there an inverse function (i.e. the cumulative distribution function) that finds the value x such that

> s.quantile(x)=4

Thanks

Python Solutions

Solution 1 - Python

I had the same question as you did! I found an easy way of getting the inverse of quantile using scipy.

#libs required
from scipy import stats
import pandas as pd
import numpy as np

#generate random data with a fixed seed (to be reproducible)
np.random.seed(seed=1)
df = pd.DataFrame(np.random.uniform(0,1,(10)), columns=['a'])

#quantile function (select the column by name rather than by position)
x = df['a'].quantile(0.5)

#inverse of quantile (returned as a percentage between 0 and 100)
stats.percentileofscore(df['a'],x)
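
Note that stats.percentileofscore returns a percentage between 0 and 100 rather than a fraction, and its kind argument controls how values tied with the score are counted. A minimal sketch on a small hand-made list (the values are illustrative and not taken from the answer above):

```python
from scipy import stats

values = [1, 2, 3, 3, 4]  # illustrative data, chosen to contain a tie

# kind='weak' counts values <= score; kind='strict' counts values < score
weak = stats.percentileofscore(values, 3, kind='weak')
strict = stats.percentileofscore(values, 3, kind='strict')

# divide by 100 to get a fraction comparable to the argument of quantile()
weak_fraction = weak / 100
strict_fraction = strict / 100
print(weak_fraction, strict_fraction)
```

With this data, four of the five values are at or below 3 and two are strictly below it, so the two variants differ noticeably when ties are present.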

Solution 2 - Python

Sorting can be expensive. If you only need the result for a single value, I'd guess you'd be better off computing it with:

s = pd.Series(np.random.uniform(size=1000))
( s < 0.7 ).astype(int).mean() # =0.7ish

The astype(int) cast is not strictly needed: the mean of a boolean Series is already the fraction of True values, so (s < 0.7).mean() gives the same result.

Solution 3 - Python

Mathematically speaking, you're looking for the CDF, i.e. the probability that a value of s is less than or equal to a given value q:

F(q) = Pr[s <= q]

With numpy this is a one-liner:

np.mean(s.to_numpy() <= q)
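
The empirical CDF is a step function, so it only approximately inverts the interpolated values that quantile returns by default. A small sketch with made-up data, where empirical_cdf is a hypothetical helper name and not part of numpy or pandas:

```python
import numpy as np
import pandas as pd

def empirical_cdf(s, q):
    """Fraction of values in s that are <= q (hypothetical helper)."""
    return float(np.mean(s.to_numpy() <= q))

s = pd.Series(range(1, 11))        # illustrative data: 1, 2, ..., 10

median = s.quantile(0.5)           # 5.5 with the default linear interpolation
p_median = empirical_cdf(s, median)

first_quartile = s.quantile(0.25)  # 3.25 with the default linear interpolation
p_quartile = empirical_cdf(s, first_quartile)
print(p_median, p_quartile)
```

Here the round trip recovers 0.5 for the median but gives 0.3 for the 0.25 quantile, because three of the ten values lie at or below 3.25.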

Solution 4 - Python

There's no 1-liner that I know of, but you can achieve this with scipy:

import pandas as pd
import numpy as np
from scipy.interpolate import interp1d

# set up a sample dataframe
df = pd.DataFrame(np.random.uniform(0,1,(11)), columns=['a'])
# sort it by the desired series and calculate the percentile
# (DataFrame.sort was removed from pandas; sort_values replaces it)
sdf = df.sort_values('a').reset_index()
sdf['b'] = sdf.index / float(len(sdf) - 1)
# setup the interpolator using the value as the index
interp = interp1d(sdf['a'], sdf['b'])

# a is the value, b is the percentile
>>> sdf
    index         a    b
0      10  0.030469  0.0
1       3  0.144445  0.1
2       4  0.304763  0.2
3       1  0.359589  0.3
4       7  0.385524  0.4
5       5  0.538959  0.5
6       8  0.642845  0.6
7       6  0.667710  0.7
8       9  0.733504  0.8
9       2  0.905646  0.9
10      0  0.961936  1.0

Now we can see that the two functions are inverses of each other.

>>> df['a'].quantile(0.57)
0.61167933268395969
>>> interp(0.61167933268395969)
array(0.57)
>>> interp(df['a'].quantile(0.43))
array(0.43)

interp also accepts a list, a numpy array, or a pandas Series (any array-like input, really).
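
The same linear interpolation can be sketched with np.interp alone, which avoids the scipy dependency. This assumes the values are distinct and that quantile uses its default linear interpolation; quantile_inverse is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def quantile_inverse(s, value):
    """Interpolated inverse of s.quantile (hypothetical helper).

    Assumes distinct values and pandas' default linear interpolation.
    """
    sorted_vals = np.sort(s.to_numpy())
    probs = np.linspace(0, 1, len(sorted_vals))  # 0, 1/(n-1), ..., 1
    return np.interp(value, sorted_vals, probs)

rng = np.random.default_rng(0)     # seeded so the example is reproducible
s = pd.Series(rng.uniform(0, 1, 11))

p = quantile_inverse(s, s.quantile(0.57))
print(p)
```

One difference worth knowing: np.interp clamps values outside the observed range to 0 or 1, whereas interp1d raises an error for out-of-range input unless configured otherwise.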

Solution 5 - Python

Just came across the same problem. Here's my two cents.

def inverse_percentile(arr, num):
    arr = sorted(arr)
    i_arr = [i for i, x in enumerate(arr) if x > num]

    return i_arr[0] / len(arr) if len(i_arr) > 0 else 1

Solution 6 - Python

The fraction of records in s that are less than x:

# Find the percentile of `x` in `s`
(s<x).mean()  # i.e., (s<x).sum()/len(s)

Alternatively, when s is sorted:

s.searchsorted(x)/len(s)
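
Whether values equal to x are counted depends on the side argument of searchsorted. A sketch with numpy on an explicitly sorted array (illustrative values):

```python
import numpy as np

values = np.sort(np.array([3, 1, 4, 3, 2]))  # searchsorted requires sorted input
x = 3

# side='left' counts values strictly below x, matching (s < x).mean()
strict = np.searchsorted(values, x, side='left') / len(values)

# side='right' also counts values equal to x, matching (s <= x).mean()
weak = np.searchsorted(values, x, side='right') / len(values)
print(strict, weak)
```

The default is side='left', so the expression above agrees with the strict comparison (s<x).mean().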

Solution 7 - Python

I use the np.searchsorted function to "find indices where elements should be inserted to maintain order":

import random
import numpy as np
import pandas as pd

np.random.seed(seed=1)

#we want to find the 5th 10-tile of a series of 20 elements
S = 20
N = 10
n = 5

df = pd.DataFrame(np.random.uniform(0,1,S), columns=['a'])

#quantile N function
q = df['a'].quantile(np.arange(0,N+1)/(N))

print(q)

#retrieve the ntile
x = q.iloc[n]

print('-'*30)
print(f"the {n}th {N}-tile of the series is: {x}")

#inverse
print('-'*30)
print(f"{x} is in the {np.searchsorted(q,x)}th {N}-tile of the series")

#and it works also with a value not present in the series
x=x+random.uniform(-.2,.2)
print('-'*30)
print(f"{x} is in the {np.searchsorted(q,x)}th {N}-tile of the series")

output:

0.0    0.000114
0.1    0.085843
0.2    0.145482
0.3    0.194549
0.4    0.263180
0.5    0.371164
0.6    0.417135
0.7    0.455081
0.8    0.581045
0.9    0.688730
1.0    0.878117
Name: a, dtype: float64
------------------------------
the 5th 10-tile of the series is: 0.37116410063685884
------------------------------
0.37116410063685884 is in the 5th 10-tile of the series
------------------------------
0.27693796519907005 is in the 5th 10-tile of the series

Solution 8 - Python

You can use the ECDF function from statsmodels. ECDF stands for empirical cumulative distribution function, "empirical" referring to the fact that the function it creates is based on what is observed in your data.

Suppose you have a series s:

import numpy as np
import pandas as pd
s = pd.Series(np.random.uniform(size=1000))

You can evaluate the CDF at 0.282:

(s <= 0.282).mean()

Or you can create the ECDF using the statsmodels function:

from statsmodels.distributions.empirical_distribution import ECDF

ecdf_s = ECDF(s)

ecdf_s

[ecdf_s(k) for k in [0.282, 0.544, 0.775]]

And check that it is (approximately) the inverse of the quantile function:

s.quantile([0.25, 0.50, 0.75])

Content Type	Original Author	Original Content on Stackoverflow
Question	Mannaggia	View Question on Stackoverflow
Solution 1 - Python	fernandosjp	View Answer on Stackoverflow
Solution 2 - Python	ILoveCoding	View Answer on Stackoverflow
Solution 3 - Python	Anastasiya-Romanova 秀	View Answer on Stackoverflow
Solution 4 - Python	Mike	View Answer on Stackoverflow
Solution 5 - Python	Calvin Ku	View Answer on Stackoverflow
Solution 6 - Python	tozCSS	View Answer on Stackoverflow
Solution 7 - Python	Olmo	View Answer on Stackoverflow
Solution 8 - Python	JoAnn Alvarez	View Answer on Stackoverflow
