Calculating the area under a curve given a set of coordinates, without knowing the function

Tags: Python, NumPy, SciPy, Area

Python Problem Overview


I have a list of 100 numbers giving the heights on the Y axis, and the X axis runs from 1 to 100 with a constant step of 5. I need to calculate the area enclosed between the curve through the (x, y) points and the X axis, using rectangles and SciPy. Do I have to find the function of this curve first, or not? Almost all the examples I have read are about a specific equation for the Y axis. In my case there is no equation, just data from a list. The classic solution is to add up all the Y points and multiply by the X step distance. Any idea how to do this with SciPy?

Also, can anyone recommend a book that focuses on numerical (finite element) methods using SciPy and NumPy?

Python Solutions


Solution 1 - Python

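The numpy and scipy libraries include the composite trapezoidal rule (numpy.trapz) and the composite Simpson's rule (scipy.integrate.simps). In recent releases these functions are named numpy.trapezoid and scipy.integrate.simpson, and the older names are deprecated or removed.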

Here's a simple example. In both trapz and simps, the argument dx=5 indicates that the spacing of the data along the x axis is 5 units.

import numpy as np
from scipy.integrate import simps
from numpy import trapz


# The y values.  A numpy array is used here,
# but a python list could also be used.
y = np.array([5, 20, 4, 18, 19, 18, 7, 4])

# Compute the area using the composite trapezoidal rule.
area = trapz(y, dx=5)
print("area =", area)

# Compute the area using the composite Simpson's rule.
area = simps(y, dx=5)
print("area =", area)

Output:

area = 452.5
area = 460.0
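
The question asks about rectangles specifically. As a minimal sketch (not part of the original answer), the left and right rectangle sums for the same y values can be computed in plain Python. The names left_sum, right_sum and average are illustrative only. For evenly spaced data, the average of the two sums is the same as the trapezoidal result.

```python
# Rectangle (Riemann) sums for evenly spaced samples.
# Assumption: the same y values and spacing as in the example above.
y = [5, 20, 4, 18, 19, 18, 7, 4]
dx = 5

# Left rule: each rectangle takes its height from the left sample.
left_sum = dx * sum(y[:-1])

# Right rule: each rectangle takes its height from the right sample.
right_sum = dx * sum(y[1:])

# Averaging the two gives the composite trapezoidal rule.
average = (left_sum + right_sum) / 2

print("left =", left_sum)
print("right =", right_sum)
print("average =", average)
```

So no fitted function is needed: the samples themselves are enough.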

Solution 2 - Python

You can use Simpson's rule or the trapezium rule to calculate the area under a graph given a table of y-values at a regular interval.

A Python function that implements Simpson's rule:

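def integrate(y_vals, h):
    # Composite Simpson's rule: the two end points get weight 1 and
    # the interior points alternate between weights 4 and 2.
    # Assumes an odd number of evenly spaced y values.
    i = 1
    total = y_vals[0] + y_vals[-1]
    for y in y_vals[1:-1]:
        if i % 2 == 0:
            total += 2 * y
        else:
            total += 4 * y
        i += 1
    return total * (h / 3.0)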

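h is the spacing (gap) between consecutive x positions, and y_vals is a sequence of the y values. Simpson's rule works on pairs of intervals, so y_vals should contain an odd number of values (an even number of intervals).

Example (in the same file as the function above):

# Five values, i.e. four intervals of width 1.2.
y_values = [13, 45.3, 12, 1, 476]
interval = 1.2
area = integrate(y_values, interval)
print("The area is", area)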
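
Because the 4, 2, 4, ... weighting only closes correctly when the number of samples is odd, it can help to check the input. The following is a hedged sketch of a slice-based variant with such a check. The name simpson_checked is hypothetical and not from the original answer.

```python
def simpson_checked(y_vals, h):
    """Composite Simpson's rule for evenly spaced samples (illustrative helper)."""
    n = len(y_vals)
    if n < 3 or n % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of samples (at least 3)")
    odd_sum = sum(y_vals[1:-1:2])   # interior indices 1, 3, 5, ... get weight 4
    even_sum = sum(y_vals[2:-1:2])  # interior indices 2, 4, 6, ... get weight 2
    return (h / 3.0) * (y_vals[0] + y_vals[-1] + 4 * odd_sum + 2 * even_sum)


# y = x**2 sampled at x = 0, 1, 2, 3, 4; the exact integral is 64/3.
samples = [x ** 2 for x in range(5)]
checked_area = simpson_checked(samples, 1.0)
print("area =", checked_area)
```

Simpson's rule is exact for polynomials up to degree three, so this test case should reproduce 64/3 up to floating-point rounding.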

Solution 3 - Python

If you have sklearn installed, a simple alternative is to use sklearn.metrics.auc.

This computes the area under the curve using the trapezoidal rule, given arbitrary x and y arrays:

import numpy as np
from sklearn.metrics import auc

dx = 5
xx = np.arange(1,100,dx)
yy = np.arange(1,100,dx)

print('computed AUC using sklearn.metrics.auc: {}'.format(auc(xx,yy)))
print('computed AUC using np.trapz: {}'.format(np.trapz(yy, dx = dx)))

Both output the same area: 4607.5

The advantage of sklearn.metrics.auc is that it accepts an arbitrarily spaced x array. Just make sure x is sorted in ascending order, otherwise the result will be incorrect.
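
NumPy's trapezoidal function also accepts an explicit x argument, so unevenly spaced data can be handled without sklearn as well. The sketch below uses made-up sample data (y = 2x at uneven x positions) and compares a hand-written trapezoidal sum with the library call. The getattr fallback is an assumption made to cover both the newer numpy.trapezoid name and the older numpy.trapz.

```python
import numpy as np

# Unevenly spaced x values (ascending) and samples of y = 2 * x.
# These values are illustrative, not from the question.
x = np.array([0.0, 1.0, 3.0, 6.0])
y = 2.0 * x

# Trapezoidal rule by hand: each strip is width * average height.
widths = np.diff(x)
heights = (y[:-1] + y[1:]) / 2.0
manual_area = float(np.sum(widths * heights))

# Library call with explicit x positions. Use numpy.trapezoid when it
# exists, otherwise fall back to the older numpy.trapz.
trapezoid = getattr(np, "trapezoid", None) or np.trapz
library_area = float(trapezoid(y, x=x))

print("manual =", manual_area)
print("library =", library_area)
```

Since the trapezoidal rule is exact for straight lines, both values should match the exact integral of 2x from 0 to 6, which is 36.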

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user1640255 | View Question on Stackoverflow
Solution 1 - Python | Warren Weckesser | View Answer on Stackoverflow
Solution 2 - Python | Will Richardson | View Answer on Stackoverflow
Solution 3 - Python | khuang834 | View Answer on Stackoverflow