Unique combination of all elements from two (or more) vectors
Problem Overview
I am trying to create a unique combination of all elements from two vectors of different sizes in R.
For example, the first vector is:
a <- c("ABC", "DEF", "GHI")
and the second one contains dates, currently stored as strings:
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
I need to create a data frame with two columns, like this:
> data
a b
1 ABC 2012-05-01
2 ABC 2012-05-02
3 ABC 2012-05-03
4 ABC 2012-05-04
5 ABC 2012-05-05
6 DEF 2012-05-01
7 DEF 2012-05-02
8 DEF 2012-05-03
9 DEF 2012-05-04
10 DEF 2012-05-05
11 GHI 2012-05-01
12 GHI 2012-05-02
13 GHI 2012-05-03
14 GHI 2012-05-04
15 GHI 2012-05-05
In other words, I want every element of the first vector (a) paired with every element of the second vector (b), with each pair appearing exactly once.
An ideal solution would generalize to more input vectors.
> See also:
> How to generate a matrix of combinations
R Solutions
Solution 1 - R
This may be what you are after:
> expand.grid(a,b)
Var1 Var2
1 ABC 2012-05-01
2 DEF 2012-05-01
3 GHI 2012-05-01
4 ABC 2012-05-02
5 DEF 2012-05-02
6 GHI 2012-05-02
7 ABC 2012-05-03
8 DEF 2012-05-03
9 GHI 2012-05-03
10 ABC 2012-05-04
11 DEF 2012-05-04
12 GHI 2012-05-04
13 ABC 2012-05-05
14 DEF 2012-05-05
15 GHI 2012-05-05
If the resulting order isn't what you want, you can sort afterwards. If you name the arguments to expand.grid, they will become column names:
df = expand.grid(a = a, b = b)
df[order(df$a), ]
And expand.grid generalizes to any number of input vectors.
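As a sketch of that generalization, here is expand.grid() with three inputs; the third vector `d` is a hypothetical addition for illustration, not part of the question. Passing stringsAsFactors = FALSE keeps the columns as character vectors instead of factors:

```r
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
d <- c("x", "y")  # hypothetical third input, for illustration only

# Named arguments become column names; stringsAsFactors = FALSE keeps
# the columns as character vectors rather than factors
grid <- expand.grid(a = a, b = b, d = d, stringsAsFactors = FALSE)

nrow(grid)           # 3 * 5 * 2 = 30 combinations
sapply(grid, class)  # "character" for all three columns
head(grid, 3)        # the first argument, a, varies fastest
```

The row count is always the product of the input lengths, so it grows quickly as inputs are added.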
Solution 2 - R
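The tidyr package provides a nice alternative, crossing, which works better than the classic expand.grid function because (1) strings are not converted into factors and (2) the result is sorted with the first vector varying slowest, which is usually the more intuitive order. Note that crossing also removes duplicate values from each input: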
library(tidyr)
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
crossing(a, b)
# A tibble: 15 x 2
a b
<chr> <chr>
1 ABC 2012-05-01
2 ABC 2012-05-02
3 ABC 2012-05-03
4 ABC 2012-05-04
5 ABC 2012-05-05
6 DEF 2012-05-01
7 DEF 2012-05-02
8 DEF 2012-05-03
9 DEF 2012-05-04
10 DEF 2012-05-05
11 GHI 2012-05-01
12 GHI 2012-05-02
13 GHI 2012-05-03
14 GHI 2012-05-04
15 GHI 2012-05-05
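The tidyr documentation describes crossing() as de-duplicating and sorting its inputs before expanding them. A rough base-R approximation of that behaviour, for readers who cannot install tidyr, is sketched below; `crossing_base()` is a hypothetical helper, not a tidyr function, and it returns a plain data frame rather than a tibble:

```r
# Hypothetical helper approximating tidyr::crossing() for named atomic vectors
crossing_base <- function(...) {
  args <- list(...)
  # De-duplicate and sort each input
  args <- lapply(args, function(x) sort(unique(x)))
  # expand.grid() varies its first argument fastest, so reverse the inputs
  # going in and the columns coming out to make the first input vary slowest;
  # KEEP.OUT.ATTRS = FALSE drops the extra "out.attrs" attribute
  rev(expand.grid(rev(args), stringsAsFactors = FALSE, KEEP.OUT.ATTRS = FALSE))
}

# Unsorted input with a duplicated value
res <- crossing_base(a = c("GHI", "ABC", "DEF", "ABC"),
                     b = c("2012-05-02", "2012-05-01"))
res
```

The duplicated "ABC" contributes only one group of rows, and both columns come out in sorted order.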
Solution 3 - R
Missing from this r-faq overview is the CJ function from the data.table package. Using:
library(data.table)
CJ(a, b, unique = TRUE)
gives:
     a          b
 1: ABC 2012-05-01
 2: ABC 2012-05-02
 3: ABC 2012-05-03
 4: ABC 2012-05-04
 5: ABC 2012-05-05
 6: DEF 2012-05-01
 7: DEF 2012-05-02
 8: DEF 2012-05-03
 9: DEF 2012-05-04
10: DEF 2012-05-05
11: GHI 2012-05-01
12: GHI 2012-05-02
13: GHI 2012-05-03
14: GHI 2012-05-04
15: GHI 2012-05-05
NOTE: since version 1.12.2, CJ automatically names the resulting columns after its inputs.
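The pattern behind any such cross join is simple: with the first input varying slowest, each value of input k is repeated once for every combination of the later inputs, and that block is repeated once for every combination of the earlier inputs. A base-R sketch of that idea follows; `cross_join_rep()` is a hypothetical helper (not how data.table implements CJ()), and unlike CJ() it neither sorts nor de-duplicates its inputs:

```r
# Hypothetical helper: builds a cross join column by column with rep().
# Assumes every input is a named, non-empty vector.
cross_join_rep <- function(...) {
  args <- list(...)
  lens <- lengths(args)
  n <- prod(lens)  # total number of combinations
  cols <- lapply(seq_along(args), function(k) {
    each  <- prod(lens[-seq_len(k)])  # product of the lengths of later inputs
    times <- n / (lens[k] * each)     # product of the lengths of earlier inputs
    rep(args[[k]], times = times, each = each)
  })
  names(cols) <- names(args)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
res <- cross_join_rep(a = a, b = b)
res
```

For the two vectors in the question this reproduces the desired table: column a is rep(a, each = 5) and column b is rep(b, times = 3).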
Solution 4 - R
Since version 1.0.0, tidyr offers its own version of expand.grid(), called expand_grid(). It completes the existing family of expand(), nesting(), and crossing() with a low-level function that works with vectors.
Compared to base::expand.grid(), expand_grid():
> Varies the first element slowest, rather than the fastest.
> Never converts strings to factors.
> Does not add any additional attributes.
> Returns a tibble, not a data frame.
> Can expand any generalised vector, including data frames.
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
tidyr::expand_grid(a, b)
a b
<chr> <chr>
1 ABC 2012-05-01
2 ABC 2012-05-02
3 ABC 2012-05-03
4 ABC 2012-05-04
5 ABC 2012-05-05
6 DEF 2012-05-01
7 DEF 2012-05-02
8 DEF 2012-05-03
9 DEF 2012-05-04
10 DEF 2012-05-05
11 GHI 2012-05-01
12 GHI 2012-05-02
13 GHI 2012-05-03
14 GHI 2012-05-04
15 GHI 2012-05-05
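expand_grid() also accepts data frames as inputs, keeping each data frame's rows together. In base R, a comparable sketch is merge() with by = NULL, which the merge() documentation describes as returning the Cartesian product of the two data frames. The `info` data frame and its `region` column are hypothetical, for illustration only:

```r
# Hypothetical data frame whose rows should stay together
info <- data.frame(a = c("ABC", "DEF", "GHI"),
                   region = c("north", "south", "east"),
                   stringsAsFactors = FALSE)
dates <- data.frame(b = c("2012-05-01", "2012-05-02", "2012-05-03",
                          "2012-05-04", "2012-05-05"),
                    stringsAsFactors = FALSE)

# by = NULL means no key columns, so every row of info is paired
# with every row of dates
combos <- merge(info, dates, by = NULL)

# Sort so that a varies slowest, as in the expand_grid() output above
combos <- combos[order(combos$a, combos$b), ]
nrow(combos)  # 15
```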
Solution 5 - R
You can use the order function to sort by any number of columns. For your example:
df <- expand.grid(a,b)
> df
Var1 Var2
1 ABC 2012-05-01
2 DEF 2012-05-01
3 GHI 2012-05-01
4 ABC 2012-05-02
5 DEF 2012-05-02
6 GHI 2012-05-02
7 ABC 2012-05-03
8 DEF 2012-05-03
9 GHI 2012-05-03
10 ABC 2012-05-04
11 DEF 2012-05-04
12 GHI 2012-05-04
13 ABC 2012-05-05
14 DEF 2012-05-05
15 GHI 2012-05-05
> df[order( df[,1], df[,2] ),]
Var1 Var2
1 ABC 2012-05-01
4 ABC 2012-05-02
7 ABC 2012-05-03
10 ABC 2012-05-04
13 ABC 2012-05-05
2 DEF 2012-05-01
5 DEF 2012-05-02
8 DEF 2012-05-03
11 DEF 2012-05-04
14 DEF 2012-05-05
3 GHI 2012-05-01
6 GHI 2012-05-02
9 GHI 2012-05-03
12 GHI 2012-05-04
15 GHI 2012-05-05
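When there are many columns, listing each one inside order() gets tedious. A sketch that sorts by every column, left to right, passes the columns to order() with do.call(); unname() guards against a column name accidentally matching one of order()'s own arguments, such as decreasing:

```r
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
df <- expand.grid(a = a, b = b, stringsAsFactors = FALSE)

# Sort by column 1, then column 2, and so on, without naming the columns.
# ISO-formatted date strings (YYYY-MM-DD) sort correctly as plain text.
sorted <- df[do.call(order, unname(as.list(df))), ]

# Optionally renumber the rows 1..n
rownames(sorted) <- NULL
sorted
```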