Standard Deviation in LINQ

LinqStandard Deviation

Linq Problem Overview


Does LINQ model the aggregate SQL function STDDEV() (standard deviation)?

If not, what is the simplest / best-practices way to calculate it?

Example:

  SELECT test_id, AVERAGE(result) avg, STDDEV(result) std 
    FROM tests
GROUP BY test_id

Linq Solutions


Solution 1 - Linq

You can make your own extension calculating it

public static class Extensions
{
    public static double StdDev(this IEnumerable<double> values)
    {
       double ret = 0;
       int count = values.Count();
       if (count  > 1)
       {
          //Compute the Average
          double avg = values.Average();
    
          //Perform the Sum of (value-avg)^2
          double sum = values.Sum(d => (d - avg) * (d - avg));
    
          //Put it all together
          ret = Math.Sqrt(sum / count);
       }
       return ret;
    }
}

If you have a sample of the population rather than the whole population, then you should use ret = Math.Sqrt(sum / (count - 1));.

Transformed into extension from Adding Standard Deviation to LINQ by Chris Bennett.

Solution 2 - Linq

Dynami's answer works but makes multiple passes through the data to get a result. This is a single pass method that calculates the sample standard deviation:

public static double StdDev(this IEnumerable<double> values)
{
    // ref: http://warrenseen.com/blog/2006/03/13/how-to-calculate-standard-deviation/
    double mean = 0.0;
    double sum = 0.0;
    double stdDev = 0.0;
    int n = 0;
    foreach (double val in values)
    {
        n++;
        double delta = val - mean;
        mean += delta / n;
        sum += delta * (val - mean);
    }
    if (1 < n)
        stdDev = Math.Sqrt(sum / (n - 1));

    return stdDev;
}

This is the sample standard deviation since it divides by n - 1. For the normal standard deviation you need to divide by n instead.

This uses Welford's method which has higher numerical accuracy compared to the Average(x^2)-Average(x)^2 method.

Solution 3 - Linq

This converts David Clarke's answer into an extension that follows the same form as the other aggregate LINQ functions like Average.

Usage would be: var stdev = data.StdDev(o => o.number)

public static class Extensions
{
    public static double StdDev<T>(this IEnumerable<T> list, Func<T, double> values)
    {
        // ref: https://stackoverflow.com/questions/2253874/linq-equivalent-for-standard-deviation
        // ref: http://warrenseen.com/blog/2006/03/13/how-to-calculate-standard-deviation/ 
        var mean = 0.0;
        var sum = 0.0;
        var stdDev = 0.0;
        var n = 0;
        foreach (var value in list.Select(values))
        {
            n++;
            var delta = value - mean;
            mean += delta / n;
            sum += delta * (value - mean);
        }
        if (1 < n)
            stdDev = Math.Sqrt(sum / (n - 1));

        return stdDev; 

    }
} 

Solution 4 - Linq

var stddev = Math.Sqrt(data.Average(z=>z*z)-Math.Pow(data.Average(),2));

Solution 5 - Linq

Straight to the point (and C# > 6.0), Dynamis answer becomes this:

    public static double StdDev(this IEnumerable<double> values)
    {
        var count = values?.Count() ?? 0;
        if (count <= 1) return 0;

        var avg = values.Average();
        var sum = values.Sum(d => Math.Pow(d - avg, 2));

        return Math.Sqrt(sum / count);
    }

Edit 2020-08-27:

I took @David Clarke comments to make some performance tests and this are the results:

    public static (double stdDev, double avg) StdDevFast(this List<double> values)
    {
        var count = values?.Count ?? 0;
        if (count <= 1) return (0, 0);

        var avg = GetAverage(values);
        var sum = GetSumOfSquareDiff(values, avg);

        return (Math.Sqrt(sum / count), avg);
    }

    private static double GetAverage(List<double> values)
    {
        double sum = 0.0;
        for (int i = 0; i < values.Count; i++) 
            sum += values[i];
        
        return sum / values.Count;
    }
    private static double GetSumOfSquareDiff(List<double> values, double avg)
    {
        double sum = 0.0;
        for (int i = 0; i < values.Count; i++)
        {
            var diff = values[i] - avg;
            sum += diff * diff;
        }
        return sum;
    }

I tested this with a list of one million random doubles
the original implementation had an runtime of ~48ms
the performance optimized implementation 2-3ms
so this is an significant improvement.

Some interesting details:
getting rid of Math.Pow brings a boost of 33ms!
List instead of IEnumerable 6ms
manually Average calculation 4ms
For-loops instead of ForEach-loops 2ms
Array instead of List brings just an improvement of ~2% so i skipped this
using single instead of double brings nothing

Further lowering the code and using goto (yes GOTO... haven't used this since the 90s assembler...) instead of for-loops does not pay, Thank goodness!

I have tested also parallel calculation, this makes sense on list > 200.000 items It seems that Hardware and Software needs to initialize a lot and this is for small lists contra-productive.

All tests were executed two times in a row to get rid of the warmup-time.

Solution 6 - Linq

Simple 4 lines, I used a List of doubles but one could use IEnumerable<int> values

public static double GetStandardDeviation(List<double> values)
{
	double avg = values.Average();
	double sum = values.Sum(v => (v - avg) * (v - avg));
	double denominator = values.Count - 1;
	return denominator > 0.0 ? Math.Sqrt(sum / denominator) : -1;
}

Solution 7 - Linq

public static double StdDev(this IEnumerable<int> values, bool as_sample = false)
{
    var count = values.Count();
    if (count > 0) // check for divide by zero
    // Get the mean.
    double mean = values.Sum() / count;

    // Get the sum of the squares of the differences
    // between the values and the mean.
    var squares_query =
        from int value in values
        select (value - mean) * (value - mean);
    double sum_of_squares = squares_query.Sum();
    return Math.Sqrt(sum_of_squares / (count - (as_sample ? 1 : 0)))
}

Solution 8 - Linq

In general case we want to compute StdDev in one pass: what if values is file or RDBMS cursor which can be changed between computing average and sum? We are going to have inconsistent result. The code below uses just one pass:

// Population StdDev
public static double StdDev(this IEnumerable<double> values) {
  if (null == values)
    throw new ArgumentNullException(nameof(values));

  double N = 0;
  double Sx = 0.0;
  double Sxx = 0.0;

  foreach (double x in values) {
    N += 1;
    Sx += x;
    Sxx += x * x;
  }

  return N == 0
    ? double.NaN // or throw exception
    : Math.Sqrt((Sxx - Sx * Sx / N) / N);
}

The very same idea for sample StdDev:

// Sample StdDev
public static double StdDev(this IEnumerable<double> values) {
  if (null == values)
    throw new ArgumentNullException(nameof(values));

  double N = 0;
  double Sx = 0.0;
  double Sxx = 0.0;

  foreach (double x in values) {
    N += 1;
    Sx += x;
    Sxx += x * x;
  }

  return N <= 1
    ? double.NaN // or throw exception
    : Math.Sqrt((Sxx - Sx * Sx / N) / (N - 1));
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionStevenView Question on Stackoverflow
Solution 1 - LinqDynami Le SavardView Answer on Stackoverflow
Solution 2 - LinqDavid ClarkeView Answer on Stackoverflow
Solution 3 - LinqWill MathiesView Answer on Stackoverflow
Solution 4 - LinqVitasView Answer on Stackoverflow
Solution 5 - LinqErnst GreinerView Answer on Stackoverflow
Solution 6 - LinqBaddackView Answer on Stackoverflow
Solution 7 - Linqduc14sView Answer on Stackoverflow
Solution 8 - LinqDmitry BychenkoView Answer on Stackoverflow