Reshape three column data frame to matrix ("long" to "wide" format)

R Problem Overview

I have a data.frame that looks like this.

x a 1 
x b 2 
x c 3 
y a 3 
y b 3 
y c 2

I want this in matrix form so I can feed it to heatmap to make a plot. The result should look something like:

    a    b    c
x   1    2    3
y   3    3    2

I have tried cast from the reshape package and I have tried writing a manual function to do this but I do not seem to be able to get it right.

R Solutions

Solution 1 - R

There are many ways to do this. This answer starts with what is quickly becoming the standard method, but also includes older methods and various other methods from answers to similar questions scattered around this site.

tmp <- data.frame(x=gl(2,3, labels=letters[24:25]),
                  y=gl(3,1,6, labels=letters[1:3]), 
                  z=c(1,2,3,3,3,2))

Using the tidyverse:

The new cool new way to do this is with pivot_wider from tidyr 1.0.0. It returns a data frame, which is probably what most readers of this answer will want. For a heatmap, though, you would need to convert this to a true matrix.

library(tidyr)
pivot_wider(tmp, names_from = y, values_from = z)
## # A tibble: 2 x 4
## x         a     b     c
## <fct> <dbl> <dbl> <dbl>
## 1 x       1     2     3
## 2 y       3     3     2

The old cool new way to do this is with spread from tidyr. It similarly returns a data frame.

library(tidyr)
spread(tmp, y, z)
##   x a b c
## 1 x 1 2 3
## 2 y 3 3 2

Using reshape2:

One of the first steps toward the tidyverse was the reshape2 package.

To get a matrix use acast:

library(reshape2)
acast(tmp, x~y, value.var="z")
##   a b c
## x 1 2 3
## y 3 3 2

Or to get a data frame, use dcast, as here: https://stackoverflow.com/q/8093839/210673.

dcast(tmp, x~y, value.var="z")
##   x a b c
## 1 x 1 2 3
## 2 y 3 3 2

Using plyr:

In between reshape2 and the tidyverse came plyr, with the daply function, as shown here: https://stackoverflow.com/a/7020101/210673

library(plyr)
daply(tmp, .(x, y), function(x) x$z)
##    y
## x   a b c
##   x 1 2 3
##   y 3 3 2

Using matrix indexing:

This is kinda old school but is a nice demonstration of matrix indexing, which can be really useful in certain situations.

with(tmp, {
  out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
                dimnames=list(levels(x), levels(y)))
  out[cbind(x, y)] <- z
  out
})

Using xtabs:

xtabs(z~x+y, data=tmp)

Using a sparse matrix:

There's also sparseMatrix within the Matrix package, as seen here: https://stackoverflow.com/q/9833052/210673

with(tmp, sparseMatrix(i = as.numeric(x), j=as.numeric(y), x=z,
                       dimnames=list(levels(x), levels(y))))
## 2 x 3 sparse Matrix of class "dgCMatrix"
##   a b c
## x 1 2 3
## y 3 3 2

Using reshape:

You can also use the base R function reshape, as suggested here: https://stackoverflow.com/q/2126108/210673, though you have to do a little manipulation afterwards to remove an extra columns and get the names right (not shown).

reshape(tmp, idvar="x", timevar="y", direction="wide")
##   x z.a z.b z.c
## 1 x   1   2   3
## 4 y   3   3   2

Solution 2 - R

The question is some years old but maybe some people are still interested in alternative answers.

If you don't want to load any packages, you might use this function:

#' Converts three columns of a data.frame into a matrix -- e.g. to plot 
#' the data via image() later on. Two of the columns form the row and
#' col dimensions of the matrix. The third column provides values for
#' the matrix.
#' 
#' @param data data.frame: input data
#' @param rowtitle string: row-dimension; name of the column in data, which distinct values should be used as row names in the output matrix
#' @param coltitle string: col-dimension; name of the column in data, which distinct values should be used as column names in the output matrix
#' @param datatitle string: name of the column in data, which values should be filled into the output matrix
#' @param rowdecreasing logical: should the row names be in ascending (FALSE) or in descending (TRUE) order?
#' @param coldecreasing logical: should the col names be in ascending (FALSE) or in descending (TRUE) order?
#' @param default_value numeric: default value of matrix entries if no value exists in data.frame for the entries
#' @return matrix: matrix containing values of data[[datatitle]] with rownames data[[rowtitle]] and colnames data[coltitle]
#' @author Daniel Neumann
#' @date 2017-08-29
data.frame2matrix = function(data, rowtitle, coltitle, datatitle, 
                             rowdecreasing = FALSE, coldecreasing = FALSE,
                             default_value = NA) {
  
  # check, whether titles exist as columns names in the data.frame data
  if ( (!(rowtitle%in%names(data))) 
       || (!(coltitle%in%names(data))) 
       || (!(datatitle%in%names(data))) ) {
    stop('data.frame2matrix: bad row-, col-, or datatitle.')
  }
  
  # get number of rows in data
  ndata = dim(data)[1]
  
  # extract rownames and colnames for the matrix from the data.frame
  rownames = sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
  nrows = length(rownames)
  colnames = sort(unique(data[[coltitle]]), decreasing = coldecreasing)
  ncols = length(colnames)
  
  # initialize the matrix
  out_matrix = matrix(NA, 
                      nrow = nrows, ncol = ncols,
                      dimnames=list(rownames, colnames))

  # iterate rows of data
  for (i1 in 1:ndata) {
    # get matrix-row and matrix-column indices for the current data-row
    iR = which(rownames==data[[rowtitle]][i1])
    iC = which(colnames==data[[coltitle]][i1])
    
    # throw an error if the matrix entry (iR,iC) is already filled.
    if (!is.na(out_matrix[iR, iC])) stop('data.frame2matrix: double entry in data.frame')
    out_matrix[iR, iC] = data[[datatitle]][i1]
  }

  # set empty matrix entries to the default value
  out_matrix[is.na(out_matrix)] = default_value

  # return matrix
  return(out_matrix)

}

How it works:

myData = as.data.frame(list('dim1'=c('x', 'x', 'x', 'y','y','y'),
                            'dim2'=c('a','b','c','a','b','c'),
                            'values'=c(1,2,3,3,3,2))) 

myMatrix = data.frame2matrix(myData, 'dim1', 'dim2', 'values')

myMatrix
>   a b c
> x 1 2 3
> y 3 3 2

Solution 3 - R

base R, `unstack`

unstack(df, V3 ~ V2)
#   a b c
# 1 1 2 3
# 2 3 3 2

This may not be a general solution but works well in this case.

data

df<-structure(list(V1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("x", 
"y"), class = "factor"), V2 = structure(c(1L, 2L, 3L, 1L, 2L, 
3L), .Label = c("a", "b", "c"), class = "factor"), V3 = c(1L, 
2L, 3L, 3L, 3L, 2L)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
-6L))

Solution 4 - R

For sake of completeness, there's a tapply() solution around.

with(d, tapply(z, list(x, y), sum))
#   a b c
# x 1 2 3
# y 3 3 2

Data

d <- structure(list(x = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("x", 
"y"), class = "factor"), y = structure(c(1L, 2L, 3L, 1L, 2L, 
3L), .Label = c("a", "b", "c"), class = "factor"), z = c(1, 2, 
3, 3, 3, 2)), class = "data.frame", row.names = c(NA, -6L))

Solution 5 - R

From tidyr 0.8.3.9000, a new function called pivot_wider() is introduced. It is basically an upgraded version of the previous spread() function (which is, moreover, no longer under active development). From pivoting vignette:

> This vignette describes the use of the new pivot_longer() and > pivot_wider() functions. Their goal is to improve the usability of > gather() and spread(), and incorporate state-of-the-art features found > in other packages. > > For some time, it’s been obvious that there is something fundamentally > wrong with the design of spread() and gather(). Many people don’t find > the names intuitive and find it hard to remember which direction > corresponds to spreading and which to gathering. It also seems > surprisingly hard to remember the arguments to these functions, > meaning that many people (including me!) have to consult the > documentation every time.

How to use it (using the data from @Aaron):

pivot_wider(data = tmp, names_from = y, values_from = z)

  x         a     b     c
  <fct> <dbl> <dbl> <dbl>
1 x         1     2     3
2 y         3     3     2

Or in a "full" tidyverse fashion:

tmp %>% 
 pivot_wider(names_from = y, values_from = z)

Solution 6 - R

The tidyr package from the tidyverse has an excellent function that does this.

Assuming your variables are named v1, v2 and v3, left to right, and you data frame is named dat:

dat %>% 
spread(key = v2,
       value = v3)

Ta da!

Content Type	Original Author	Original Content on Stackoverflow
Question	MalteseUnderdog	View Question on Stackoverflow
Solution 1 - R	Aaron left Stack Overflow	View Answer on Stackoverflow
Solution 2 - R	daniel.heydebreck	View Answer on Stackoverflow
Solution 3 - R	lebatsnok	View Answer on Stackoverflow
Solution 4 - R	jay.sf	View Answer on Stackoverflow
Solution 5 - R	tmfmnk	View Answer on Stackoverflow
Solution 6 - R	Ahsen Majid	View Answer on Stackoverflow

Reshape three column data frame to matrix ("long" to "wide" format)

R Problem Overview

R Solutions

Solution 1 - R

Solution 2 - R

Solution 3 - R

base R, `unstack`

data

Solution 4 - R

Solution 5 - R

Solution 6 - R

SVN command to delete all locally missing files

How do I get rid of this unwanted bar from Eclipse?

Attributions

R Problem Overview

R Solutions

Solution 1 - R

Solution 2 - R

Solution 3 - R

base R, unstack

data

Solution 4 - R

Solution 5 - R

Solution 6 - R

SVN command to delete all locally missing files

How do I get rid of this unwanted bar from Eclipse?

Attributions

base R, `unstack`