How to find the standard error of the mean?

RStatisticsStandard Error

R Problem Overview


Is there any command to find the standard error of the mean in R?

R Solutions


Solution 1 - R

The standard error is just the standard deviation divided by the square root of the sample size. So you can easily make your own function:

> std <- function(x) sd(x)/sqrt(length(x))
> std(c(1,2,3,4))
[1] 0.6454972

Solution 2 - R

The standard error (SE) is just the standard deviation of the sampling distribution. The variance of the sampling distribution is the variance of the data divided by N and the SE is the square root of that. Going from that understanding one can see that it is more efficient to use variance in the SE calculation. The sd function in R already does one square root (code for sd is in R and revealed by just typing "sd"). Therefore, the following is most efficient.

se <- function(x) sqrt(var(x)/length(x))

in order to make the function only a bit more complex and handle all of the options that you could pass to var, you could make this modification.

se <- function(x, ...) sqrt(var(x, ...)/length(x))

Using this syntax one can take advantage of things like how var deals with missing values. Anything that can be passed to var as a named argument can be used in this se call.

Solution 3 - R

A version of John's answer above that removes the pesky NA's:

stderr <- function(x, na.rm=FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(var(x)/length(x))
}

Solution 4 - R

There's the plotrix package with has a built-in function for this: std.error

Solution 5 - R

The sciplot package has the built-in function se(x)

Solution 6 - R

As I'm going back to this question every now and then and because this question is old, I'm posting a benchmark for the most voted answers.

Note, that for @Ian's and @John's answers I created another version. Instead of using length(x), I used sum(!is.na(x)) (to avoid NAs). I used a vector of 10^6, with 1,000 repetitions.

library(microbenchmark)

set.seed(123)
myVec <- rnorm(10^6)

IanStd <- function(x) sd(x)/sqrt(length(x))

JohnSe <- function(x) sqrt(var(x)/length(x))

IanStdisNA <- function(x) sd(x)/sqrt(sum(!is.na(x)))

JohnSeisNA <- function(x) sqrt(var(x)/sum(!is.na(x)))

AranStderr <- function(x, na.rm=FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(var(x)/length(x))
}

mbm <- microbenchmark(
  "plotrix" = {plotrix::std.error(myVec)},
  "IanStd" = {IanStd(myVec)},
  "JohnSe" = {JohnSe(myVec)},
  "IanStdisNA" = {IanStdisNA(myVec)},
  "JohnSeisNA" = {JohnSeisNA(myVec)},
  "AranStderr" = {AranStderr(myVec)}, 
  times = 1000)

mbm

Results:

Unit: milliseconds
       expr     min       lq      mean   median       uq      max neval cld
    plotrix 10.3033 10.89360 13.869947 11.36050 15.89165 125.8733  1000   c
     IanStd  4.3132  4.41730  4.618690  4.47425  4.63185   8.4388  1000 a  
     JohnSe  4.3324  4.41875  4.640725  4.48330  4.64935   9.4435  1000 a  
 IanStdisNA  8.4976  8.99980 11.278352  9.34315 12.62075 120.8937  1000  b 
 JohnSeisNA  8.5138  8.96600 11.127796  9.35725 12.63630 118.4796  1000  b 
 AranStderr  4.3324  4.41995  4.634949  4.47440  4.62620  14.3511  1000 a  


library(ggplot2)
autoplot(mbm)

enter image description here

Solution 7 - R

You can use the function stat.desc from pastec package.

library(pastec)
stat.desc(x, BASIC =TRUE, NORMAL =TRUE)

you can find more about it from here: https://www.rdocumentation.org/packages/pastecs/versions/1.3.21/topics/stat.desc

Solution 8 - R

Remembering that the mean can also by obtained using a linear model, regressing the variable against a single intercept, you can use also the lm(x~1) function for this!

Advantages are:

  • You obtain immediately confidence intervals with confint()
  • You can use tests for various hypothesis about the mean, using for example car::linear.hypothesis()
  • You can use more sophisticated estimates of the standard deviation, in case you have some heteroskedasticity, clustered-data, spatial-data etc, see package sandwich
## generate data
x <- rnorm(1000)

## estimate reg
reg <- lm(x~1)
coef(summary(reg))[,"Std. Error"]
#> [1] 0.03237811

## conpare with simple formula
all.equal(sd(x)/sqrt(length(x)),
          coef(summary(reg))[,"Std. Error"])
#> [1] TRUE

## extract confidence interval
confint(reg)
#>                   2.5 %    97.5 %
#> (Intercept) -0.06457031 0.0625035

Created on 2020-10-06 by the reprex package (v0.3.0)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionalexView Question on Stackoverflow
Solution 1 - RIan FellowsView Answer on Stackoverflow
Solution 2 - RJohnView Answer on Stackoverflow
Solution 3 - RarandomlypickednameView Answer on Stackoverflow
Solution 4 - RMatt BallView Answer on Stackoverflow
Solution 5 - Ruser2045554View Answer on Stackoverflow
Solution 6 - RDJVView Answer on Stackoverflow
Solution 7 - RAKSHAY KAPOORView Answer on Stackoverflow
Solution 8 - RMatifouView Answer on Stackoverflow