Multivariate normal density in Python?

PythonNumpyScipyProbability

Python Problem Overview


Is there any python package that allows the efficient computation of the PDF (probability density function) of a multivariate normal distribution?

It doesn't seem to be included in Numpy/Scipy, and surprisingly a Google search didn't turn up any useful thing.

Python Solutions


Solution 1 - Python

The multivariate normal is now available on SciPy 0.14.0.dev-16fc0af:

from scipy.stats import multivariate_normal
var = multivariate_normal(mean=[0,0], cov=[[1,0],[0,1]])
var.pdf([1,0])

Solution 2 - Python

I just made one for my purposes so I though I'd share. It's built using "the powers" of numpy, on the formula of the non degenerate case from http://en.wikipedia.org/wiki/Multivariate_normal_distribution and it aso validates the input.

Here is the code along with a sample run

from numpy import *
import math
# covariance matrix
sigma = matrix([[2.3, 0, 0, 0],
           [0, 1.5, 0, 0],
           [0, 0, 1.7, 0],
           [0, 0,   0, 2]
          ])
# mean vector
mu = array([2,3,8,10])

# input
x = array([2.1,3.5,8, 9.5])

def norm_pdf_multivariate(x, mu, sigma):
    size = len(x)
    if size == len(mu) and (size, size) == sigma.shape:
        det = linalg.det(sigma)
        if det == 0:
            raise NameError("The covariance matrix can't be singular")
        
        norm_const = 1.0/ ( math.pow((2*pi),float(size)/2) * math.pow(det,1.0/2) )
        x_mu = matrix(x - mu)
        inv = sigma.I        
        result = math.pow(math.e, -0.5 * (x_mu * inv * x_mu.T))
        return norm_const * result
    else:
        raise NameError("The dimensions of the input don't match")

print norm_pdf_multivariate(x, mu, sigma)

Solution 3 - Python

If still needed, my implementation would be

import numpy as np

def pdf_multivariate_gauss(x, mu, cov):
    '''
    Caculate the multivariate normal density (pdf)
    
    Keyword arguments:
        x = numpy array of a "d x 1" sample vector
        mu = numpy array of a "d x 1" mean vector
        cov = "numpy array of a d x d" covariance matrix
    '''
    assert(mu.shape[0] > mu.shape[1]), 'mu must be a row vector'
    assert(x.shape[0] > x.shape[1]), 'x must be a row vector'
    assert(cov.shape[0] == cov.shape[1]), 'covariance matrix must be square'
    assert(mu.shape[0] == cov.shape[0]), 'cov_mat and mu_vec must have the same dimensions'
    assert(mu.shape[0] == x.shape[0]), 'mu and x must have the same dimensions'
    part1 = 1 / ( ((2* np.pi)**(len(mu)/2)) * (np.linalg.det(cov)**(1/2)) )
    part2 = (-1/2) * ((x-mu).T.dot(np.linalg.inv(cov))).dot((x-mu))
    return float(part1 * np.exp(part2))

def test_gauss_pdf():
    x = np.array([[0],[0]])
    mu  = np.array([[0],[0]])
    cov = np.eye(2) 

    print(pdf_multivariate_gauss(x, mu, cov))

	# prints 0.15915494309189535

if __name__ == '__main__':
    test_gauss_pdf()

In case I make future changes, the code is here on GitHub

Solution 4 - Python

In the common case of a diagonal covariance matrix, the multivariate PDF can be obtained by simply multiplying the univariate PDF values returned by a scipy.stats.norm instance. If you need the general case, you will probably have to code this yourself (which shouldn't be hard).

Solution 5 - Python

You can easily compute using numpy. I have implemented as below for the purpose of machine learning course and would like to share, hope it helps to someone.

import numpy as np
X = np.array([[13.04681517, 14.74115241],[13.40852019, 13.7632696 ],[14.19591481, 15.85318113],[14.91470077, 16.17425987]])

def est_gaus_par(X):
    mu = np.mean(X,axis=0)
    sig = np.std(X,axis=0)
    return mu,sig

mu,sigma = est_gaus_par(X)

def est_mult_gaus(X,mu,sigma):
    m = len(mu)
    sigma2 = np.diag(sigma)
    X = X-mu.T
    p = 1/((2*np.pi)**(m/2)*np.linalg.det(sigma2)**(0.5))*np.exp(-0.5*np.sum(X.dot(np.linalg.pinv(sigma2))*X,axis=1))

    return p

p = est_mult_gaus(X, mu, sigma)

Solution 6 - Python

I know of several python packages that use it internally, with different generality and for different uses, but I don't know if any of them are intended for users.

statsmodels, for example, has the following hidden function and class, but it's not used by statsmodels:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/miscmodels/try_mlecov.py#L36

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/mv_normal.py#L777

Essentially, if you need fast evaluation, rewrite it for your use case.

Solution 7 - Python

I use the following code which calculates the logpdf value, which is preferable for larger dimensions. It also works for scipy.sparse matrices.

import numpy as np
import math
import scipy.sparse as sp
import scipy.sparse.linalg as spln

def lognormpdf(x,mu,S):
    """ Calculate gaussian probability density of x, when x ~ N(mu,sigma) """
    nx = len(S)
    norm_coeff = nx*math.log(2*math.pi)+np.linalg.slogdet(S)[1]

    err = x-mu
    if (sp.issparse(S)):
        numerator = spln.spsolve(S, err).T.dot(err)
    else:
        numerator = np.linalg.solve(S, err).T.dot(err)

    return -0.5*(norm_coeff+numerator)

Code is from pyParticleEst, if you want the pdf value instead of the logpdf just take math.exp() on the returned value

Solution 8 - Python

The density can be computed in a pretty straightforward way using numpy functions and the formula on this page: http://en.wikipedia.org/wiki/Multivariate_normal_distribution. You may also want to use the likelihood function (log probability), which is less likely to underflow for large dimensions and is a little more straightforward to compute. Both just involve being able to compute the determinant and inverse of a matrix.

The CDF, on the other hand, is an entirely different animal...

Solution 9 - Python

The following code helped me to solve,when given a vector what is the likelihood that vector is in a multivariate normal distribution.

import numpy as np
from scipy.stats import multivariate_normal

data with all vectors

d= np.array([[1,2,1],[2,1,3],[4,5,4],[2,2,1]])

mean of the data in vector form, which will have same length as input vector(here its 3)

mean = sum(d,axis=0)/len(d)

OR
mean=np.average(d , axis=0)
mean.shape

finding covarianve of vectors which will have shape of [input vector shape X input vector shape] here it is 3x3

cov = 0
for e in d:
  cov += np.dot((e-mean).reshape(len(e), 1), (e-mean).reshape(1, len(e)))
cov /= len(d)
cov.shape

preparing a multivariate Gaussian distribution from mean and co variance

dist = multivariate_normal(mean,cov)

finding probability distribution function.

print(dist.pdf([1,2,3]))

3.050863384798471e-05

The above value gives the likelihood.

Solution 10 - Python

Here I elaborate a bit more on how exactly to use the multivariate_normal() from the scipy package:

# Import packages
import numpy as np
from scipy.stats import multivariate_normal

# Prepare your data
x = np.linspace(-10, 10, 500)
y = np.linspace(-10, 10, 500)
X, Y = np.meshgrid(x,y)

# Get the multivariate normal distribution
mu_x = np.mean(x)
sigma_x = np.std(x)
mu_y = np.mean(y)
sigma_y = np.std(y)
rv = multivariate_normal([mu_x, mu_y], [[sigma_x, 0], [0, sigma_y]])

# Get the probability density
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X
pos[:, :, 1] = Y
pd = rv.pdf(pos)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionBennoView Question on Stackoverflow
Solution 1 - PythonjuliohmView Answer on Stackoverflow
Solution 2 - Pythonuser692734View Answer on Stackoverflow
Solution 3 - Pythonuser2489252View Answer on Stackoverflow
Solution 4 - PythonSven MarnachView Answer on Stackoverflow
Solution 5 - PythonFreqView Answer on Stackoverflow
Solution 6 - PythonJosefView Answer on Stackoverflow
Solution 7 - PythonajnView Answer on Stackoverflow
Solution 8 - PythonAndrew MaoView Answer on Stackoverflow
Solution 9 - PythonShahadView Answer on Stackoverflow
Solution 10 - Pythontsveti_ikoView Answer on Stackoverflow