Fitting polynomials to data

Math, Statistics

Math Problem Overview


Is there a way, given a set of values (x,f(x)), to find the polynomial of a given degree that best fits the data?

I know about polynomial interpolation (http://en.wikipedia.org/wiki/Polynomial_interpolation), which is for finding a polynomial of degree n given n+1 data points, but here there are a large number of values and we want to find a low-degree polynomial (find the best linear fit, best quadratic, best cubic, etc.). It might be related to least squares (http://en.wikipedia.org/wiki/Least_squares)...

More generally, I would like to know the answer when we have a multivariate function -- points like (x,y,f(x,y)), say -- and want to find the best polynomial (p(x,y)) of a given degree in the variables. (Specifically a polynomial, not splines or Fourier series.)

Both theory and code/libraries (preferably in Python, but any language is okay) would be useful.

Math Solutions


Solution 1 - Math

Thanks for everyone's replies. Here is another attempt at summarizing them. Pardon if I say too many "obvious" things: I knew nothing about least squares before, so everything was new to me.

NOT polynomial interpolation

Polynomial interpolation (http://en.wikipedia.org/wiki/Polynomial_interpolation) is fitting a polynomial of degree n given n+1 data points, e.g. finding a cubic that passes exactly through four given points. As said in the question, this was not what I wanted—I had a lot of points and wanted a small-degree polynomial (which will only approximately fit, unless we've been lucky)—but since some of the answers insisted on talking about it, I should mention them :) Lagrange polynomial (http://en.wikipedia.org/wiki/Lagrange_polynomial), Vandermonde matrix (http://en.wikipedia.org/wiki/Vandermonde_matrix), etc.

What is least-squares?

"Least squares" is a particular definition/criterion/"metric" of "how well" a polynomial fits. (There are others, but this is simplest.) Say you are trying to fit a polynomial

p(x,y) = a + bx + cy + dx² + ey² + fxy
to some given data points (xᵢ, yᵢ, Zᵢ) (where "Zᵢ" was "f(xᵢ,yᵢ)" in the question). With least squares the problem is to find the "best" coefficients (a,b,c,d,e,f), such that what is minimized (kept "least") is the "sum of squared residuals", namely

S = ∑ᵢ (a + bxᵢ + cyᵢ + dxᵢ² + eyᵢ² + fxᵢyᵢ - Zᵢ)²

Theory

The important idea is that if you look at S as a function of (a,b,c,d,e,f), then S is minimized (http://en.wikipedia.org/wiki/Maxima_and_minima) at a point at which its gradient (http://en.wikipedia.org/wiki/Partial_derivative) is 0 (http://en.wikipedia.org/wiki/Critical_point_(mathematics)). This means that for example ∂S/∂f = 0, i.e. that

∑ᵢ 2(a + … + fxᵢyᵢ - Zᵢ)xᵢyᵢ = 0

and similar equations for a, b, c, d, e. Note that these are just linear equations in a…f. So we can solve them with Gaussian elimination (http://en.wikipedia.org/wiki/Gaussian_elimination) or any of the usual methods (http://en.wikipedia.org/wiki/System_of_linear_equations#Other_methods).
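
Collecting the analogous equation for each of the six coefficients gives the "normal equations". If A is the matrix whose i-th row is (1, xᵢ, yᵢ, xᵢ², yᵢ², xᵢyᵢ), w is the column vector of coefficients (a,b,c,d,e,f), and Z is the column vector of the Zᵢ, the whole system can be written compactly as

AᵀA w = AᵀZ

which is a square linear system in the six unknown coefficients.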

This is still called "linear least squares", because although the function we wanted was a quadratic polynomial, it is still linear in the parameters (a,b,c,d,e,f). Note that the same thing works when we want p(x,y) to be any "linear combination" of arbitrary functions fⱼ, instead of just a polynomial (= "linear combination of monomials").
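
As a quick sketch of that generality (assuming NumPy; the basis 1, x, sin x and the data below are purely illustrative), the same machinery fits any linear combination of fixed functions:

import numpy as np

# illustrative basis functions: f_0(x) = 1, f_1(x) = x, f_2(x) = sin(x)
basis = [lambda x: np.ones_like(x), lambda x: x, np.sin]

x = np.linspace(0, 10, 50)
z = 2.0 + 0.5 * x + 3.0 * np.sin(x) + 0.01 * np.random.randn(50)  # noisy samples

# design matrix A with A[i, j] = f_j(x_i)
A = np.column_stack([f(x) for f in basis])

# linear least squares: coefficients minimizing ||A c - z||^2
c, residuals, rank, sv = np.linalg.lstsq(A, z, rcond=None)
print(c)  # close to [2.0, 0.5, 3.0]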

Code

For the univariate case (when there is only one variable x — the fⱼ are the monomials xʲ), there is NumPy's `polyfit` (http://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html):

>>> import numpy
>>> xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> ys = [1.1, 3.9, 11.2, 21.5, 34.8, 51, 70.2, 92.3, 117.4, 145.5]
>>> p = numpy.poly1d(numpy.polyfit(xs, ys, deg=2))
>>> print(p)
       2
1.517 x + 2.483 x + 0.4927

For the multivariate case, or linear least squares in general, there is SciPy. As explained in its documentation (http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html#solving-linear-least-squares-problems-and-pseudo-inverses), it takes a matrix A of the values fⱼ(xᵢ). (The theory is that it finds the Moore–Penrose pseudoinverse of A: http://en.wikipedia.org/wiki/Moore-Penrose_pseudoinverse#Applications.) With our above example involving (xᵢ, yᵢ, Zᵢ), fitting a polynomial means the fⱼ are the monomials xʲyᵏ. The following finds the best quadratic (or the best polynomial of any other degree, if you change the "degree = 2" line):

from scipy import linalg
import random

n = 20
x = [100*random.random() for i in range(n)]
y = [100*random.random() for i in range(n)]
Z = [(x[i]+y[i])**2 + 0.01*random.random() for i in range(n)]

degree = 2
A = []
for i in range(n):
    A.append([])
    for xd in range(degree+1):
        for yd in range(degree+1-xd):
            A[i].append((x[i]**xd)*(y[i]**yd)) #f_j(x_i)

c,_,_,_ = linalg.lstsq(A,Z)
j = 0
for xd in range(0,degree+1):
    for yd in range(0,degree+1-xd):
        print " + (%.2f)x^%dy^%d" % (c[j], xd, yd),
        j += 1

prints

 + (0.01)x^0y^0  + (-0.00)x^0y^1  + (1.00)x^0y^2  + (-0.00)x^1y^0  + (2.00)x^1y^1  + (1.00)x^2y^0

so it has discovered that the polynomial is x² + 2xy + y² + 0.01. [The last term is sometimes -0.01 and sometimes 0, which is to be expected because of the random noise we added.]

Alternatives to Python + NumPy/SciPy are R (http://www.r-project.org/) and computer algebra systems: Sage (http://sagemath.org/index.html), Mathematica, Matlab, Maple. Even Excel might be able to do it. Numerical Recipes (http://numerical.recipes/oldverswitcher.html) discusses methods to implement it ourselves (in C, Fortran).

Concerns

  • It is strongly influenced by how the points are chosen. When I had x=y=range(20) instead of the random points, it always produced 1.33x² + 1.33xy + 1.33y², which was puzzling... until I realised that because I always had x[i]=y[i], the polynomials were the same: x² + 2xy + y² = 4x² = (4/3)(x² + xy + y²). So the moral is that it is important to choose the points carefully to get the "right" polynomial. (If you can choose, you should choose Chebyshev nodes (http://en.wikipedia.org/wiki/Chebyshev_nodes) for polynomial interpolation; not sure if the same is true for least squares as well.)
  • Overfitting: higher-degree polynomials can always fit the data better. If you change the degree to 3 or 4 or 5, it still mostly recognizes the same quadratic polynomial (coefficients are ~0 for the higher-degree terms), but for larger degrees it starts fitting higher-degree polynomials. Even with degree 6, though, taking a larger n (more data points, say 200 instead of 20) still fits the quadratic polynomial. So the moral is to avoid overfitting, for which it might help to take as many data points as possible. (See the sketch after this list.)
  • There might be issues of numerical stability (http://en.wikipedia.org/wiki/Numerical_stability) I don't fully understand.
  • If you don't need a polynomial, you can obtain better fits with other kinds of functions, e.g. splines (http://en.wikipedia.org/wiki/Spline_(mathematics)), i.e. piecewise polynomials.
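
Here is a quick sketch of the overfitting point (assuming NumPy; the data, noise level and train/test split are arbitrary): fit increasing degrees to noisy quadratic data and look at the error on held-out points.

import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 30))
y = 1.5 * x**2 + 2.5 * x + 0.5 + rng.normal(0, 2.0, x.size)  # noisy quadratic data

# hold out every third point to estimate how well each fit generalizes
train = np.ones(x.size, dtype=bool)
train[::3] = False

for deg in range(1, 8):
    coeffs = np.polyfit(x[train], y[train], deg)
    test_err = np.mean((np.polyval(coeffs, x[~train]) - y[~train])**2)
    print("degree %d: held-out MSE %.3f" % (deg, test_err))

# the held-out error typically stops improving (or worsens) beyond degree 2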

Solution 2 - Math

Yes, the way this is typically done is by using least squares. There are other ways of specifying how well a polynomial fits, but the theory is simplest for least squares. The general theory is called linear regression.

Your best bet is probably to start with Numerical Recipes.

R is free and will do everything you want and more, but it has a big learning curve.

If you have access to Mathematica, you can use the Fit function to do a least squares fit. I imagine Matlab and its open source counterpart Octave have a similar function.

Solution 3 - Math

For (x, f(x)) case:

import numpy

x = numpy.arange(10)
y = x**2

coeffs = numpy.polyfit(x, y, deg=2)
poly = numpy.poly1d(coeffs)
print(poly)
yp = numpy.polyval(poly, x)
print(yp - y)  # residuals; should be all zeros since the data is exactly quadratic

Solution 4 - Math

Bear in mind that a polynomial of higher degree ALWAYS fits the data better. Polynomials of higher degree typically lead to highly improbable functions (see Occam's razor), though (overfitting). You want to find a balance between simplicity (degree of the polynomial) and fit (e.g. least-squares error). Quantitatively, there are tests for this: the Akaike information criterion or the Bayesian information criterion. These tests give a score indicating which model is to be preferred.
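
A rough sketch of how such a comparison might look (assuming NumPy; this uses the Gaussian-error form of AIC/BIC with k = degree + 1 fitted parameters, and the data is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 40)
y = 2.0 * x**2 - 3.0 * x + 1.0 + rng.normal(0, 1.0, x.size)
n = x.size

for deg in range(1, 6):
    coeffs = np.polyfit(x, y, deg)
    rss = np.sum((np.polyval(coeffs, x) - y)**2)  # residual sum of squares
    k = deg + 1                                   # number of fitted parameters
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    print("degree %d: AIC %.1f, BIC %.1f" % (deg, aic, bic))

# the degree with the lowest AIC/BIC (here typically 2) is the preferred model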

Solution 5 - Math

If you want to fit the (xi, f(xi)) to a polynomial of degree n, then you would set up a linear least squares problem with the data (1, xi, xi^2, ..., xi^n, f(xi)). This will return a set of coefficients (c0, c1, ..., cn) so that the best-fitting polynomial is y = c0 + c1*x + c2*x^2 + ... + cn*x^n.

You can generalize this to more than one independent variable by including powers of y and combinations of x and y in the problem.
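
A minimal sketch of that setup with NumPy (the data is illustrative; this is essentially what numpy.polyfit automates):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
fx = 1.0 - 2.0 * x + 0.5 * x**2           # sample values f(x_i)
n = 2                                     # degree of the fit

# design matrix with columns 1, x, x^2, ..., x^n
A = np.vander(x, n + 1, increasing=True)

# least-squares coefficients c so that y = c[0] + c[1]*x + ... + c[n]*x^n
c, *_ = np.linalg.lstsq(A, fx, rcond=None)
print(c)  # close to [1.0, -2.0, 0.5]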

Solution 6 - Math

Lagrange polynomials (as @j w posted) give you an exact fit at the points you specify, but with polynomials of degree more than say 5 or 6 you can run into numerical instability.

Least squares gives you the "best fit" polynomial, with error defined as the sum of squares of the individual errors. (Take the distance along the y-axis between the points you have and the function that results, square them, and sum them up.) The MATLAB polyfit function does this, and with multiple return arguments, you can have it automatically take care of scaling/offset issues (e.g. if you have 100 points all between x=312.1 and 312.3, and you want a 6th degree polynomial, you're going to want to calculate u = (x-312.2)/0.1 so the u-values are distributed between -1 and +1).
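
In NumPy the same centering/scaling trick can be done by hand before calling polyfit (a sketch only; the numbers mirror the example above):

import numpy as np

x = np.linspace(312.1, 312.3, 100)
y = (x - 312.2)**3 + 0.5 * (x - 312.2)       # sample data on a narrow interval

# rescale x to u roughly in [-1, 1] so the fit is numerically well conditioned
mu = x.mean()
half = (x.max() - x.min()) / 2.0
u = (x - mu) / half

coeffs = np.polyfit(u, y, 6)                 # fit in the scaled variable u
y_fit = np.polyval(coeffs, (x - mu) / half)  # apply the same scaling when evaluating
print(np.max(np.abs(y_fit - y)))             # essentially zero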

NOTE that the results of least-squares fits are strongly influenced by the distribution of x-axis values. If the x-values are equally spaced, then you'll get larger errors at the ends. If you have a case where you can choose the x values and you care about the maximum deviation between your known function and an interpolating polynomial, then the use of Chebyshev polynomials will give you something that is close to the perfect minimax polynomial (which is very hard to calculate). This is discussed at some length in Numerical Recipes.
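
For reference, the Chebyshev nodes on an interval [a, b] come from a simple cosine formula (a sketch, assuming NumPy):

import numpy as np

def chebyshev_nodes(a, b, n):
    # n Chebyshev nodes on [a, b]: (a+b)/2 + (b-a)/2 * cos((2k+1)*pi/(2n)), k = 0..n-1
    k = np.arange(n)
    return (a + b) / 2.0 + (b - a) / 2.0 * np.cos((2 * k + 1) * np.pi / (2 * n))

print(chebyshev_nodes(-1.0, 1.0, 5))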

Edit: From what I gather, this all works well for functions of one variable. For multivariate functions it is likely to be much more difficult if the degree is more than, say, 2. I did find a reference on Google Books.

Solution 7 - Math

At college we had this book, which I still find extremely useful: Conte, de Boor; Elementary Numerical Analysis; McGraw-Hill. The relevant section is 6.2: Data Fitting.
The example code comes in FORTRAN, and the listings are not very readable either, but the explanations are deep and clear at the same time. You end up understanding what you are doing, not just doing it (as is my experience with Numerical Recipes).
I usually start with Numerical Recipes, but for things like this I quickly have to grab Conte–de Boor.

Maybe it's better to post some code... it's a bit stripped down, but the most relevant parts are there. It relies on numpy, obviously!

import numpy


class Memoize:
  """Cache the results of a function, keyed by its positional arguments."""

  def __init__(self, f):
    self.f = f
    self.cache = {}

  def __call__(self, *args):
    if args not in self.cache:
      self.cache[args] = self.f(*args)
    return self.cache[args]


def Tn(n, x):
  """Chebyshev polynomial of the first kind, T_n(x), via the three-term recurrence."""
  if n == 0:
    return 1.0
  elif n == 1:
    return float(x)
  else:
    return (2.0 * x * Tn(n - 1, x)) - Tn(n - 2, x)


class ChebyshevFit:

  def __init__(self):
    self.Tn = Memoize(Tn)

  def fit(self, data, degree=None):
    """Fit the data by a least-squares linear combination of Chebyshev polynomials.

    cf. Conte, de Boor; Elementary Numerical Analysis; McGraw-Hill (6.2: Data Fitting)
    """

    if degree is None:
      degree = 5

    data = sorted(data)
    self.range = start, end = (min(data)[0], max(data)[0])
    self.halfwidth = (end - start) / 2.0
    # map x from [start, end] onto [-1, 1], where the Chebyshev polynomials live
    vec_x = [(x - start - self.halfwidth) / self.halfwidth for (x, y) in data]
    vec_f = [y for (x, y) in data]

    # normal equations: mat_A[i][j] = <T_i, T_j>, vec_b[i] = <f, T_i>
    mat_phi = [numpy.array([self.Tn(i, x) for x in vec_x]) for i in range(degree + 1)]
    mat_A = numpy.inner(mat_phi, mat_phi)
    vec_b = numpy.inner(vec_f, mat_phi)

    self.coefficients = numpy.linalg.solve(mat_A, vec_b)
    self.degree = degree

  def evaluate(self, x):
    """Evaluate the fitted series with the Clenshaw algorithm.

    http://en.wikipedia.org/wiki/Clenshaw_algorithm
    """

    x = (x - self.range[0] - self.halfwidth) / self.halfwidth

    b_2 = float(self.coefficients[self.degree])
    b_1 = 2 * x * b_2 + float(self.coefficients[self.degree - 1])

    for i in range(2, self.degree):
      b_1, b_2 = 2.0 * x * b_1 + self.coefficients[self.degree - i] - b_2, b_1

    b_0 = x * b_1 + self.coefficients[0] - b_2
    return b_0
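
A quick sketch of how the class might be used (the data here is just illustrative):

import math

# fit samples of sin(x) on [0, 6.2] and evaluate the fit at a few points
data = [(0.1 * k, math.sin(0.1 * k)) for k in range(63)]

cf = ChebyshevFit()
cf.fit(data, degree=7)

for x in (0.5, 1.0, 2.0, 5.0):
  print(x, cf.evaluate(x), math.sin(x))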

Solution 8 - Math

Remember, there's a big difference between approximating the polynomial and finding an exact one.

For example, if I give you 4 points, you could

  1. Approximate a line with a method like least squares
  2. Approximate a parabola with a method like least squares
  3. Find an exact cubic function through these four points.

Be sure to select the method that's right for you!
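
With NumPy, the difference looks roughly like this (the four points are arbitrary):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 5.0])   # four arbitrary data points

for deg in (1, 2, 3):
    p = np.poly1d(np.polyfit(x, y, deg))
    print("degree %d: max residual %.3f" % (deg, np.max(np.abs(p(x) - y))))

# degrees 1 and 2 only approximate the points; the cubic passes through all four exactly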

Solution 9 - Math

It's rather easy to scare up a quick fit using Excel's matrix functions if you know how to represent the least squares problem as a linear algebra problem. (That depends on how reliable you think Excel is as a linear algebra solver.)

Solution 10 - Math

The Lagrange polynomial is in some sense the "simplest" interpolating polynomial that fits a given set of data points.

It is sometimes problematic because it can vary wildly between data points.
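
A small sketch of that wild variation (using scipy.interpolate.lagrange on equally spaced samples of Runge's function 1/(1+25x²); the setup is illustrative):

import numpy as np
from scipy.interpolate import lagrange

# interpolate Runge's function at 11 equally spaced points on [-1, 1]
x = np.linspace(-1, 1, 11)
y = 1.0 / (1.0 + 25.0 * x**2)
poly = lagrange(x, y)

# between the nodes, the interpolant overshoots badly near the ends of the interval
xx = np.linspace(-1, 1, 201)
print(np.max(np.abs(poly(xx) - 1.0 / (1.0 + 25.0 * xx**2))))  # large (around 2), not small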

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author   Original Content on Stackoverflow
Question              ShreevatsaR       View Question on Stackoverflow
Solution 1 - Math     ShreevatsaR       View Answer on Stackoverflow
Solution 2 - Math     John D. Cook      View Answer on Stackoverflow
Solution 3 - Math     jfs               View Answer on Stackoverflow
Solution 4 - Math     Fredriku73        View Answer on Stackoverflow
Solution 5 - Math     David Norman      View Answer on Stackoverflow
Solution 6 - Math     Jason S           View Answer on Stackoverflow
Solution 7 - Math     mariotomo         View Answer on Stackoverflow
Solution 8 - Math     stalepretzel      View Answer on Stackoverflow
Solution 9 - Math     duffymo           View Answer on Stackoverflow
Solution 10 - Math    j w               View Answer on Stackoverflow