Is it possible to use spread on multiple columns in tidyr similar to dcast?

RReshape2Tidyr

R Problem Overview


I have the following dummy data:

library(dplyr)
library(tidyr)
library(reshape2)
dt <- expand.grid(Year = 1990:2014, Product=LETTERS[1:8], Country = paste0(LETTERS, "I")) %>%   select(Product, Country, Year)
dt$value <- rnorm(nrow(dt))

I pick two product-country combinations

sdt <- dt %>% filter((Product == "A" & Country == "AI") | (Product == "B" & Country =="EI"))

and I want to see the values side by side for each combination. I can do this with dcast:

sdt %>% dcast(Year ~ Product + Country)

Is it possible to do this with spread from the package tidyr?

R Solutions


Solution 1 - R

One option would be to create a new 'Prod_Count' by joining the 'Product' and 'Country' columns by paste, remove those columns with the select and reshape from 'long' to 'wide' using spread from tidyr.

 library(dplyr)
 library(tidyr)
 sdt %>%
 mutate(Prod_Count=paste(Product, Country, sep="_")) %>%
 select(-Product, -Country)%>% 
 spread(Prod_Count, value)%>%
 head(2)
 #  Year      A_AI       B_EI
 #1 1990 0.7878674  0.2486044
 #2 1991 0.2343285 -1.1694878

Or we can avoid a couple of steps by using unite from tidyr (from @beetroot's comment) and reshape as before.

 sdt%>% 
 unite(Prod_Count, Product,Country) %>%
 spread(Prod_Count, value)%>% 
 head(2)
 #   Year      A_AI       B_EI
 # 1 1990 0.7878674  0.2486044
 # 2 1991 0.2343285 -1.1694878


Solution 2 - R

With the new function pivot_wider() introduced in tidyr version 1.0.0, this can be accomplished with one function call.

pivot_wider() (counterpart: pivot_longer()) works similar to spread(). However, it offers additional functionality such as using multiple key/name columns (and/or multiple value columns). To this end, the argument names_from—that indicates from which column(s) the names of the new variables are taken—may take more than one column name (here Product and Country).

library("tidyr")

sdt %>% 
    pivot_wider(id_cols = Year,
                names_from = c(Product, Country)) %>% 
    head(2)
#> # A tibble: 2 x 3
#>     Year   A_AI    B_EI
#>    <int>  <dbl>   <dbl>
#>  1  1990 -2.08  -0.113 
#>  2  1991 -1.02  -0.0546

See also: https://tidyr.tidyverse.org/articles/pivot.html

Solution 3 - R

Base R solution:

 # Concatenate grouping vector: 

dt$PC <- paste0(dt$Product, "_", dt$Country)

# Spread the vectors by year: 

dt2 <- reshape(dt[,c(!(names(dt) %in% c("Product", "Country")))],
               
               idvar = "Year",
               
               ids = unique(dt$Year),
               
               direction = "wide",
               
               timevar = "PC")

# Remove "value.", from the vector names:

names(dt2) <- gsub("value[.]", "", names(dt2))

Data:

dt <- expand.grid(Year = 1990:2014,
                  
                  Product = LETTERS[1:8],
                  
                  Country = paste0(LETTERS, "I"))

dt$value <- rnorm(nrow(dt))

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionmpiktasView Question on Stackoverflow
Solution 1 - RakrunView Answer on Stackoverflow
Solution 2 - RhplieningerView Answer on Stackoverflow
Solution 3 - Rhello_friendView Answer on Stackoverflow