How can I do standard deviation in Ruby?

RubyStandard Deviation

Ruby Problem Overview


I have several records with a given attribute, and I want to find the standard deviation.

How do I do that?

Ruby Solutions


Solution 1 - Ruby

module Enumerable

    def sum
      self.inject(0){|accum, i| accum + i }
    end

    def mean
      self.sum/self.length.to_f
    end

    def sample_variance
      m = self.mean
      sum = self.inject(0){|accum, i| accum +(i-m)**2 }
      sum/(self.length - 1).to_f
    end

    def standard_deviation
      Math.sqrt(self.sample_variance)
    end

end 

Testing it:

a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
a.standard_deviation  
# => 4.594682917363407

01/17/2012:

fixing "sample_variance" thanks to Dave Sag

Solution 2 - Ruby

It appears that Angela may have been wanting an existing library. After playing with statsample, array-statisics, and a few others, I'd recommend the descriptive_statistics gem if you're trying to avoid reinventing the wheel.

gem install descriptive_statistics

$ irb
1.9.2 :001 > require 'descriptive_statistics'
 => true 
1.9.2 :002 > samples = [1, 2, 2.2, 2.3, 4, 5]
 => [1, 2, 2.2, 2.3, 4, 5] 
1.9.2p290 :003 > samples.sum
 => 16.5 
1.9.2 :004 > samples.mean
 => 2.75 
1.9.2 :005 > samples.variance
 => 1.7924999999999998 
1.9.2 :006 > samples.standard_deviation
 => 1.3388427838995882 

I can't speak to its statistical correctness, or your comfort with monkey-patching Enumerable; but it's easy to use and easy to contribute to.

Solution 3 - Ruby

The answer given above is elegant but has a slight error in it. Not being a stats head myself I sat up and read in detail a number of websites and found this one gave the most comprehensible explanation of how to derive a standard deviation. http://sonia.hubpages.com/hub/stddev

The error in the answer above is in the sample_variance method.

Here is my corrected version, along with a simple unit test that shows it works.

in ./lib/enumerable/standard_deviation.rb

#!usr/bin/ruby

module Enumerable

  def sum
    return self.inject(0){|accum, i| accum + i }
  end

  def mean
    return self.sum / self.length.to_f
  end

  def sample_variance
    m = self.mean
    sum = self.inject(0){|accum, i| accum + (i - m) ** 2 }
    return sum / (self.length - 1).to_f
  end

  def standard_deviation
    return Math.sqrt(self.sample_variance)
  end

end

in ./test using numbers derived from a simple spreadsheet.

Screen Snapshot of a Numbers spreadsheet with example data

#!usr/bin/ruby

require 'enumerable/standard_deviation'

class StandardDeviationTest < Test::Unit::TestCase

  THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]

  def test_sum
    expected = 16.5
    result = THE_NUMBERS.sum
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_mean
    expected = 2.75
    result = THE_NUMBERS.mean
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_sample_variance
    expected = 2.151
    result = THE_NUMBERS.sample_variance
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_standard_deviation
    expected = 1.4666287874
    result = THE_NUMBERS.standard_deviation
    assert result.round(10) == expected, "expected #{expected} but got #{result}"
  end

end

Solution 4 - Ruby

I'm not a big fan of adding methods to Enumerable since there could be unwanted side effects. It also gives methods really specific to an array of numbers to any class inheriting from Enumerable, which doesn't make sense in most cases.

While this is fine for tests, scripts or small apps, it's risky for larger applications, so here's an alternative based on @tolitius' answer which was already perfect. This is more for reference than anything else:

module MyApp::Maths
  def self.sum(a)
    a.inject(0){ |accum, i| accum + i }
  end

  def self.mean(a)
    sum(a) / a.length.to_f
  end

  def self.sample_variance(a)
    m = mean(a)
    sum = a.inject(0){ |accum, i| accum + (i - m) ** 2 }
    sum / (a.length - 1).to_f
  end

  def self.standard_deviation(a)
    Math.sqrt(sample_variance(a))
  end
end

And then you use it as such:

2.0.0p353 > MyApp::Maths.standard_deviation([1,2,3,4,5])
=> 1.5811388300841898

2.0.0p353 :007 > a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
 => [20, 23, 23, 24, 25, 22, 12, 21, 29]

2.0.0p353 :008 > MyApp::Maths.standard_deviation(a)
 => 4.594682917363407

2.0.0p353 :043 > MyApp::Maths.standard_deviation([1,2,2.2,2.3,4,5])
 => 1.466628787389638

The behavior is the same, but it avoids the overheads and risks of adding methods to Enumerable.

Solution 5 - Ruby

The presented computation are not very efficient because they require several (at least two, but often three because you usually want to present average in addition to std-dev) passes through the array.

I know Ruby is not the place to look for efficiency, but here is my implementation that computes average and standard deviation with a single pass over the list values:

module Enumerable
  
  def avg_stddev
    return nil unless count > 0
    return [ first, 0 ] if count == 1
    sx = sx2 = 0
    each do |x|
      sx2 += x**2
      sx += x
    end
    [ 
      sx.to_f  / count,
      Math.sqrt( # http://wijmo.com/docs/spreadjs/STDEV.html
        (sx2 - sx**2.0/count)
        / 
        (count - 1)
      )
    ]
  end
  
end

Solution 6 - Ruby

As a simple function, given a list of numbers:

def standard_deviation(list)
  mean = list.inject(:+) / list.length.to_f
  var_sum = list.map{|n| (n-mean)**2}.inject(:+).to_f
  sample_variance = var_sum / (list.length - 1)
  Math.sqrt(sample_variance)
end

Solution 7 - Ruby

If the records at hand are of type Integer or Rational, you may want to compute the variance using Rational instead of Float to avoid errors introduced by rounding.

For example:

def variance(list)
  mean = list.reduce(:+)/list.length.to_r
  sum_of_squared_differences = list.map { |i| (i - mean)**2 }.reduce(:+)
  sum_of_squared_differences/list.length
end

(It would be prudent to add special-case handling for empty lists and other edge cases.)

Then the square root can be defined as:

def std_dev(list)
  Math.sqrt(variance(list))
end

Solution 8 - Ruby

In case people are using postgres ... it provides aggregate functions for stddev_pop and stddev_samp - postgresql aggregate functions

stddev (equiv of stddev_samp) available since at least postgres 7.1, since 8.2 both samp and pop are provided.

Solution 9 - Ruby

Or how about:

class Stats
    def initialize( a )
	    @avg = a.count > 0 ? a.sum / a.count.to_f : 0.0
	    @stdev = a.count > 0 ? ( a.reduce(0){ |sum, v| sum + (@avg - v) ** 2 } / a.count ) ** 0.5 : 0.0
    end
end

Solution 10 - Ruby

You can place this as helper method and assess it everywhere.

def calc_standard_deviation(arr)
    mean = arr.sum(0.0) / arr.size
    sum = arr.sum(0.0) { |element| (element - mean) ** 2 }
    variance = sum / (arr.size - 1)
    standard_deviation = Math.sqrt(variance)
end

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSatchelView Question on Stackoverflow
Solution 1 - RubytolitiusView Answer on Stackoverflow
Solution 2 - RubyeprothroView Answer on Stackoverflow
Solution 3 - RubyDave SagView Answer on Stackoverflow
Solution 4 - RubymarcggView Answer on Stackoverflow
Solution 5 - RubyGussView Answer on Stackoverflow
Solution 6 - RubytothemarioView Answer on Stackoverflow
Solution 7 - RubyPeter KageyView Answer on Stackoverflow
Solution 8 - RubyStraffView Answer on Stackoverflow
Solution 9 - RubyMads Boyd-MadsenView Answer on Stackoverflow
Solution 10 - RubyRoshanView Answer on Stackoverflow