Test for equality among all elements of a single numeric vector

RVectorEquality

R Problem Overview


I'm trying to test whether all elements of a vector are equal to one another. The solutions I have come up with seem somewhat roundabout, both involving checking length().

x <- c(1, 2, 3, 4, 5, 6, 1)  # FALSE
y <- rep(2, times = 7)       # TRUE

With unique():

length(unique(x)) == 1
length(unique(y)) == 1

With rle():

length(rle(x)$values) == 1
length(rle(y)$values) == 1

A solution that would let me include a tolerance value for assessing 'equality' among elements would be ideal to avoid FAQ 7.31 issues.

Is there a built-in function for type of test that I have completely overlooked? identical() and all.equal() compare two R objects, so they won't work here.

Edit 1

Here are some benchmarking results. Using the code:

library(rbenchmark)

John <- function() all( abs(x - mean(x)) < .Machine$double.eps ^ 0.5 )
DWin <- function() {diff(range(x)) < .Machine$double.eps ^ 0.5}
zero_range <- function() {
  if (length(x) == 1) return(TRUE)
  x <- range(x) / mean(x)
  isTRUE(all.equal(x[1], x[2], tolerance = .Machine$double.eps ^ 0.5))
}

x <- runif(500000);

benchmark(John(), DWin(), zero_range(),
  columns=c("test", "replications", "elapsed", "relative"),
  order="relative", replications = 10000)

With the results:

          test replications elapsed relative
2       DWin()        10000 109.415 1.000000
3 zero_range()        10000 126.912 1.159914
1       John()        10000 208.463 1.905251

So it looks like diff(range(x)) < .Machine$double.eps ^ 0.5 is fastest.

R Solutions


Solution 1 - R

Why not simply using the variance:

var(x) == 0

If all the elements of x are equal, you will get a variance of 0. This works only for double and integers though.

Edit based on the comments below:
A more generic option would be to check for the length of unique elements in the vector which must be 1 in this case. This has the advantage that it works with all classes beyond just double and integer from which variance can be calculated from.

length(unique(x)) == 1

Solution 2 - R

If they're all numeric values then if tol is your tolerance then...

all( abs(y - mean(y)) < tol ) 

is the solution to your problem.

EDIT:

After looking at this, and other answers, and benchmarking a few things the following comes out over twice as fast as the DWin answer.

abs(max(x) - min(x)) < tol

This is a bit surprisingly faster than diff(range(x)) since diff shouldn't be much different than - and abs with two numbers. Requesting the range should optimize getting the minimum and maximum. Both diff and range are primitive functions. But the timing doesn't lie.

And, in addition, as @Waldi pointed out, abs is superfluous here.

Solution 3 - R

I use this method, which compares the min and the max, after dividing by the mean:

# Determine if range of vector is FP 0.
zero_range <- function(x, tol = .Machine$double.eps ^ 0.5) {
  if (length(x) == 1) return(TRUE)
  x <- range(x) / mean(x)
  isTRUE(all.equal(x[1], x[2], tolerance = tol))
}

If you were using this more seriously, you'd probably want to remove missing values before computing the range and mean.

Solution 4 - R

You can just check all(v==v[1])

Solution 5 - R

> isTRUE(all.equal( max(y) ,min(y)) )
[1] TRUE
> isTRUE(all.equal( max(x) ,min(x)) )
[1] FALSE

Another along the same lines:

> diff(range(x)) < .Machine$double.eps ^ 0.5
[1] FALSE
> diff(range(y)) < .Machine$double.eps ^ 0.5
[1] TRUE

Solution 6 - R

You can use identical() and all.equal() by comparing the first element to all others, effectively sweeping the comparison across:

R> compare <- function(v) all(sapply( as.list(v[-1]), 
+                         FUN=function(z) {identical(z, v[1])}))
R> compare(x)
[1] FALSE
R> compare(y)
[1] TRUE
R> 

That way you can add any epsilon to identical() as needed.

Solution 7 - R

Since I keep coming back to this question over and over, here's an Rcpp solution that will generally be much much faster than any of the R solutions if the answer is actually FALSE (because it will stop the moment it encounters a mismatch) and will have the same speed as the fastest R solution if the answer is TRUE. For example for the OP benchmark, system.time clocks in at exactly 0 using this function.

library(inline)
library(Rcpp)

fast_equal = cxxfunction(signature(x = 'numeric', y = 'numeric'), '
  NumericVector var(x);
  double precision = as<double>(y);

  for (int i = 0, size = var.size(); i < size; ++i) {
    if (var[i] - var[0] > precision || var[0] - var[i] > precision)
      return Rcpp::wrap(false);
  }

  return Rcpp::wrap(true);
', plugin = 'Rcpp')

fast_equal(c(1,2,3), 0.1)
#[1] FALSE
fast_equal(c(1,2,3), 2)
#[2] TRUE

Solution 8 - R

I wrote a function specifically for this, which can check not only elements in a vector, but also capable of checking if all elements in a list are identical. Of course it as well handle character vectors and all other types of vector well. It also has appropriate error handling.

all_identical <- function(x) {
  if (length(x) == 1L) {
    warning("'x' has a length of only 1")
    return(TRUE)
  } else if (length(x) == 0L) {
    warning("'x' has a length of 0")
    return(logical(0))
  } else {
    TF <- vapply(1:(length(x)-1),
                 function(n) identical(x[[n]], x[[n+1]]),
                 logical(1))
    if (all(TF)) TRUE else FALSE
  }
}

Now try some examples.

x <- c(1, 1, 1, NA, 1, 1, 1)
all_identical(x)       ## Return FALSE
all_identical(x[-4])   ## Return TRUE
y <- list(fac1 = factor(c("A", "B")),
          fac2 = factor(c("A", "B"), levels = c("B", "A"))
          )
all_identical(y)     ## Return FALSE as fac1 and fac2 have different level order

Solution 9 - R

You do not actually need to use min, mean, or max. Based on John's answer:

all(abs(x - x[[1]]) < tolerance)

Solution 10 - R

Here an alternative using the min, max trick but for a data frame. In the example I am comparing columns but the margin parameter from apply can be changed to 1 for rows.

valid = sum(!apply(your_dataframe, 2, function(x) diff(c(min(x), max(x)))) == 0)

If valid == 0 then all the elements are the same

Solution 11 - R

Another solution which uses the data.table package, compatible with strings and NA is uniqueN(x) == 1

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionkmmView Question on Stackoverflow
Solution 1 - RYohan ObadiaView Answer on Stackoverflow
Solution 2 - RJohnView Answer on Stackoverflow
Solution 3 - RhadleyView Answer on Stackoverflow
Solution 4 - RMaya LevyView Answer on Stackoverflow
Solution 5 - RIRTFMView Answer on Stackoverflow
Solution 6 - RDirk EddelbuettelView Answer on Stackoverflow
Solution 7 - ReddiView Answer on Stackoverflow
Solution 8 - RLawrence LeeView Answer on Stackoverflow
Solution 9 - Ruser2443147View Answer on Stackoverflow
Solution 10 - RpedrosaurioView Answer on Stackoverflow
Solution 11 - RDaniel VView Answer on Stackoverflow