Reasons for using the set.seed function

RRandom

R Problem Overview


Many times I have seen the set.seed function in R, before starting the program. I know it's basically used for the random number generation. Is there any specific need to set this?

R Solutions


Solution 1 - R

The need is the possible desire for reproducible results, which may for example come from trying to debug your program, or of course from trying to redo what it does:

These two results we will "never" reproduce as I just asked for something "random":

R> sample(LETTERS, 5)
[1] "K" "N" "R" "Z" "G"
R> sample(LETTERS, 5)
[1] "L" "P" "J" "E" "D"

These two, however, are identical because I set the seed:

R> set.seed(42); sample(LETTERS, 5)
[1] "X" "Z" "G" "T" "O"
R> set.seed(42); sample(LETTERS, 5)
[1] "X" "Z" "G" "T" "O"
R> 

There is vast literature on all that; Wikipedia is a good start. In essence, these RNGs are called Pseudo Random Number Generators because they are in fact fully algorithmic: given the same seed, you get the same sequence. And that is a feature and not a bug.

Solution 2 - R

You have to set seed every time you want to get a reproducible random result.

set.seed(1)
rnorm(4)
set.seed(1)
rnorm(4)

Solution 3 - R

Just adding some addition aspects. Need for setting seed: In the academic world, if one claims that his algorithm achieves, say 98.05% performance in one simulation, others need to be able to reproduce it.

?set.seed

Going through the help file of this function, these are some interesting facts:

> (1) set.seed() returns NULL, invisible > > (2) "Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored.", this is why you would want to call set.seed() with same integer values the next time you want a same sequence of random sequence.

Solution 4 - R

Fixing the seed is essential when we try to optimize a function that involves randomly generated numbers (e.g. in simulation based estimation). Loosely speaking, if we do not fix the seed, the variation due to drawing different random numbers will likely cause the optimization algorithm to fail.

Suppose that, for some reason, you want to estimate the standard deviation (sd) of a mean-zero normal distribution by simulation, given a sample. This can be achieved by running a numerical optimization around steps

  1. (Setting the seed)
  2. Given a value for sd, generate normally distributed data
  3. Evaluate the likelihood of your data given the simulated distributions

The following functions do this, once without step 1., once including it:

# without fixing the seed
simllh <- function(sd, y, Ns){
  simdist <- density(rnorm(Ns, mean = 0, sd = sd))
  llh <- sapply(y, function(x){ simdist$y[which.min((x - simdist$x)^2)] })
  return(-sum(log(llh)))
}
# same function with fixed seed
simllh.fix.seed <- function(sd,y,Ns){
  set.seed(48)
  simdist <- density(rnorm(Ns,mean=0,sd=sd))
  llh <- sapply(y,function(x){simdist$y[which.min((x-simdist$x)^2)]})
  return(-sum(log(llh)))
}

We can check the relative performance of the two functions in discovering the true parameter value with a short Monte Carlo study:

N <- 20; sd <- 2 # features of simulated data
est1 <- rep(NA,1000); est2 <- rep(NA,1000) # initialize the estimate stores
for (i in 1:1000) {
  as.numeric(Sys.time())-> t; set.seed((t - floor(t)) * 1e8 -> seed) # set the seed to random seed
  y <- rnorm(N, sd = sd) # generate the data
  est1[i] <- optim(1, simllh, y = y, Ns = 1000, lower = 0.01)$par
  est2[i] <- optim(1, simllh.fix.seed, y = y, Ns = 1000, lower = 0.01)$par
}
hist(est1)
hist(est2)

The resulting distributions of the parameter estimates are:

Histogram of parameter estimates without fixing the seed Histogram of parameter estimates fixing the seed

When we fix the seed, the numerical search ends up close to the true parameter value of 2 far more often.

Solution 5 - R

basically set.seed() function will help to reuse the same set of random variables , which we may need in future to again evaluate particular task again with same random varibales

we just need to declare it before using any random numbers generating function.

Solution 6 - R

set.seed is a base function that it is able to generate (every time you want) together other functions (rnorm, runif, sample) the same random value.

Below an example without set.seed

> set.seed(NULL)
> rnorm(5)
[1]  1.5982677 -2.2572974  2.3057461  0.5935456  0.1143519
> rnorm(5)
[1]  0.15135371  0.20266228  0.95084266  0.09319339 -1.11049182
> set.seed(NULL)
> runif(5)
[1] 0.05697712 0.31892399 0.92547023 0.88360393 0.90015169
> runif(5)
[1] 0.09374559 0.64406494 0.65817582 0.30179009 0.19760375
> set.seed(NULL)
> sample(5)
[1] 5 4 3 1 2
> sample(5)
[1] 2 1 5 4 3

Below an example with set.seed

> set.seed(123)
> rnorm(5)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774
> set.seed(123)
> rnorm(5)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774
> set.seed(123)
> runif(5)
[1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
> set.seed(123)
> runif(5)
[1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
> set.seed(123)
> sample(5)
[1] 3 2 5 4 1
> set.seed(123)
> sample(5)
[1] 3 2 5 4 1

Solution 7 - R

Just to add further... You need to set the seed every time you do some random stuff if you want consistency. The seed doesn't remain set.

set.seed(0)
rnorm(3)
set.seed(0)
rnorm(3)

[1]  1.2629543 -0.3262334  1.3297993
[1]  1.2629543 -0.3262334  1.3297993
set.seed(0)
rnorm(3)
rnorm(3)

[1]  1.2629543 -0.3262334  1.3297993
[1]  1.2724293  0.4146414 -1.5399500

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionVigneshView Question on Stackoverflow
Solution 1 - RDirk EddelbuettelView Answer on Stackoverflow
Solution 2 - RChia-hungView Answer on Stackoverflow
Solution 3 - RRidingstarView Answer on Stackoverflow
Solution 4 - RMatthias SchmidtblaicherView Answer on Stackoverflow
Solution 5 - Ruser4388407View Answer on Stackoverflow
Solution 6 - REarl MascettiView Answer on Stackoverflow
Solution 7 - RHarleyView Answer on Stackoverflow