How to add a row to a data frame in R?
R Problem Overview
In R, how do you add a new row to a data frame once the data frame has already been initialized?
So far I have this:
df <- data.frame("hi", "bye")
names(df) <- c("hello", "goodbye")
#I am trying to add "hola" and "ciao" as a new row
de <- data.frame("hola", "ciao")
merge(df, de) # Adds to the same row as new columns
# Unfortunately, I couldn't find an rbind() solution that wouldn't give me an error
Any help would be appreciated.
R Solutions
Solution 1 - R
Let's make it simple:
df[nrow(df) + 1,] = c("v1","v2")
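A minimal runnable sketch of this, applied to the question's data frame (the column names hello and goodbye and the values come from the question). The stringsAsFactors = FALSE argument is an addition so that it also behaves the same way on R versions before 4.0.0. Because c() builds a single character vector, this form is safest when every column is character:

```r
# Rebuild the data frame from the question, keeping strings as character
df <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = FALSE)

# Assigning to row nrow(df) + 1 extends the data frame by one row;
# the elements of c() are matched to the columns by position
df[nrow(df) + 1, ] <- c("hola", "ciao")

df
```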
Solution 2 - R
As @Khashaa and @Richard Scriven point out in the comments, you have to set consistent column names for all the data frames you want to append. You only set column names for the first data frame, df, so you also need to explicitly declare the column names for the second data frame, de, and then use rbind():
df<-data.frame("hi","bye")
names(df)<-c("hello","goodbye")
de<-data.frame("hola","ciao")
names(de)<-c("hello","goodbye")
newdf <- rbind(df, de)
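To see why the names matter, here is a small sketch: rbind() on data frames matches columns by name, so the unnamed de from the question (whose columns get auto-generated names such as X.hola.) is rejected, while a copy renamed with setNames() binds fine. The tryCatch() wrapper is only there to capture the error message for illustration:

```r
df <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = FALSE)
de <- data.frame("hola", "ciao", stringsAsFactors = FALSE)

# names(de) are generated from the values, so they do not match names(df)
mismatch <- tryCatch(rbind(df, de), error = function(e) conditionMessage(e))
print(mismatch)

# Giving de the same column names as df lets rbind() line the columns up
newdf <- rbind(df, setNames(de, names(df)))
newdf
```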
Solution 3 - R
There's now add_row() from the tibble package (which library(tidyverse) also attaches).
library(tidyverse)
df %>% add_row(hello = "hola", goodbye = "ciao")
Unspecified columns get an NA.
Solution 4 - R
Or, as inspired by @MatheusAraujo:
df[nrow(df) + 1,] = list("v1","v2")
This allows mixed data types, because each element of a list keeps its own type, whereas c() coerces all values to a single common type.
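A small sketch contrasting the two forms on a data frame with one character column and one numeric column. The helper make_df() and the columns name and value are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: a fresh data frame with a character and a numeric column
make_df <- function() data.frame(name = "a", value = 1, stringsAsFactors = FALSE)

# With c(), both values become character first, and writing "2" into the
# numeric column converts that whole column to character
with_c <- make_df()
with_c[nrow(with_c) + 1, ] <- c("b", 2)
str(with_c)

# With list(), each element keeps its own type, so value stays numeric
with_list <- make_df()
with_list[nrow(with_list) + 1, ] <- list("b", 2)
str(with_list)
```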
Solution 5 - R
I like list instead of c because it handles mixed data types better. Adding an additional column to the original poster's question:
#Create an empty data frame
df <- data.frame(hello=character(), goodbye=character(), volume=double())
de <- list(hello="hi", goodbye="bye", volume=3.0)
df = rbind(df,de, stringsAsFactors=FALSE)
de <- list(hello="hola", goodbye="ciao", volume=13.1)
df = rbind(df,de, stringsAsFactors=FALSE)
Note that some additional control is required if the string/factor conversion is important.
Or using the original variables with the solution from MatheusAraujo/Ytsen de Boer:
df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen", volume=20.2)
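Note that this solution doesn't work well with strings unless there is already data in the data frame; on an empty data frame whose string columns are factors, the new values become NA (see the stringsAsFactors answers below).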
Solution 6 - R
Not terribly elegant, but:
data.frame(rbind(as.matrix(df), as.matrix(de)))
From the documentation of the rbind function:
> For rbind column names are taken from the first argument with appropriate names: colnames for a matrix...
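One thing to watch with this approach: as.matrix() on a data frame that has any character column returns a character matrix, so numeric columns come back as text. A sketch with a hypothetical numeric column volume (stringsAsFactors = FALSE is added so the result is the same on older R versions):

```r
df <- data.frame(hello = "hi", volume = 3, stringsAsFactors = FALSE)
de <- data.frame(hello = "hola", volume = 13.1, stringsAsFactors = FALSE)

# The matrix route turns every column into character, including volume
combined <- data.frame(rbind(as.matrix(df), as.matrix(de)), stringsAsFactors = FALSE)
str(combined)

# Convert numeric columns back explicitly if you need numbers again
combined$volume <- as.numeric(combined$volume)
str(combined)
```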
Solution 7 - R
If you want to make an empty data frame and add contents in a loop, the following may help:
# Number of students in class
student.count <- 36
# Gather data about the students
student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c('male', 'female'), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)
# Create empty data frame
student.data <- data.frame()
# Populate the data frame using a for loop
for (i in 1 : student.count) {
# Get the row data
age <- student.age[i]
gender <- student.gender[i]
marks <- student.marks[i]
# Populate the row
new.row <- data.frame(age = age, gender = gender, marks = marks)
# Add the row
student.data <- rbind(student.data, new.row)
}
# Print the data frame
student.data
Hope it helps :)
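Because the column vectors in this example already exist before the loop, the same data frame can also be built in a single data.frame() call, with no loop or repeated rbind(). A sketch; the set.seed() call is an addition only to make the random sample reproducible:

```r
set.seed(42)  # only so the random data is reproducible

student.count <- 36
student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c('male', 'female'), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)

# One call builds one row per student directly from the vectors
student.data <- data.frame(age = student.age,
                           gender = student.gender,
                           marks = student.marks,
                           stringsAsFactors = FALSE)
head(student.data)
```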
Solution 8 - R
To build a data.frame in a loop:
df <- data.frame()
for(i in 1:10){
df <- rbind(df, data.frame(str="hello", x=i, y=i*10))
}
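Calling rbind() on every pass rebuilds the whole data frame each time, which tends to get slow as the number of rows grows. A common alternative, sketched here with the same columns as above, is to collect one small data frame per iteration in a list and bind them once at the end:

```r
# Collect one single-row data frame per iteration in a pre-sized list
rows <- vector("list", 10)
for (i in 1:10) {
  rows[[i]] <- data.frame(str = "hello", x = i, y = i * 10, stringsAsFactors = FALSE)
}

# Bind all pieces in a single call instead of growing df inside the loop
df <- do.call(rbind, rows)
df
```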
Solution 9 - R
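I needed to add stringsAsFactors=FALSE when creating the data frame. Without it (TRUE was the default before R 4.0.0), the string columns are factors and the new values become NA: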
> df <- data.frame("hello"= character(0), "goodbye"=character(0))
> df
[1] hello goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
Warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = "hi") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, iseq, value = "bye") :
invalid factor level, NA generated
> df
hello goodbye
1 <NA> <NA>
With stringsAsFactors=FALSE, the same steps work:
> df <- data.frame("hello"= character(0), "goodbye"=character(0), stringsAsFactors=FALSE)
> df
[1] hello goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
> df[nrow(df) + 1,] = list("hola","ciao")
> df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen")
> df
hello goodbye
1 hi bye
2 hola ciao
3 hallo auf wiedersehen
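If a data frame already has factor columns (for example, one created with stringsAsFactors = TRUE), one option is to convert those columns to character before appending. A sketch; the lapply() conversion is one common idiom, not the only way:

```r
# Simulate a data frame whose string columns are factors
df <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = TRUE)

# Turn every factor column into character, leaving other columns untouched;
# df[] <- keeps the result a data frame
df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)

# New strings can now be added without "invalid factor level" warnings
df[nrow(df) + 1, ] <- list("hola", "ciao")
df
```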
Solution 10 - R
Make certain to specify stringsAsFactors=FALSE when creating the data frame:
> rm(list=ls())
> trigonometry <- data.frame(character(0), numeric(0), stringsAsFactors=FALSE)
> colnames(trigonometry) <- c("theta", "sin.theta")
> trigonometry
[1] theta sin.theta
<0 rows> (or 0-length row.names)
> trigonometry[nrow(trigonometry) + 1, ] <- c("0", sin(0))
> trigonometry[nrow(trigonometry) + 1, ] <- c("pi/2", sin(pi/2))
> trigonometry
theta sin.theta
1 0 0
2 pi/2 1
> typeof(trigonometry)
[1] "list"
> class(trigonometry)
[1] "data.frame"
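Note that c() coerces sin(0) to the string "0", so sin.theta becomes a character column even though it was created as numeric(0); passing list("0", sin(0)) instead keeps it numeric.
Failing to use stringsAsFactors=FALSE when creating the data frame (TRUE was the default before R 4.0.0) produces the following warning when adding the new row, and the new theta value becomes NA: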
> trigonometry[nrow(trigonometry) + 1, ] <- c("0", sin(0))
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "0") :
invalid factor level, NA generated
Solution 11 - R
There is a simpler way to append a record from one data frame to another if you know that the two data frames share the same columns and types. To append the i'th row of xx to yy, just do the following:
yy[nrow(yy)+1,] <- xx[i,]
Simple as that. No messy binds. If you need to append all of xx to yy, then either use a loop or take advantage of R's sequence abilities and do this:
yy[(nrow(yy)+1):(nrow(yy)+nrow(xx)),] <- xx
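A self-contained sketch of both forms, using small made-up data frames xx and yy with the same columns and types (stringsAsFactors = FALSE is added so string columns are not factors on older R versions):

```r
xx <- data.frame(a = 3:4, b = c("z", "w"), stringsAsFactors = FALSE)
yy <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)

# Append a single row (here the first row of xx) to the end of yy
yy[nrow(yy) + 1, ] <- xx[1, ]

# Append all of xx; the target rows are computed from the current size of yy
yy[(nrow(yy) + 1):(nrow(yy) + nrow(xx)), ] <- xx
yy
```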
Solution 12 - R
To formalize what someone else used setNames for:
add_row <- function(original_data, new_vals_list){
# appends row to dataset while assuming new vals are ordered and classed appropriately.
# new_vals_list must be a list, not a single vector.
rbind(
original_data,
setNames(data.frame(new_vals_list), colnames(original_data))
)
}
It preserves column classes when the new values are compatible, and otherwise passes along R's usual warnings and errors.
m <- mtcars[ ,1:3]
m$cyl <- as.factor(m$cyl)
str(m)
#'data.frame': 32 obs. of 3 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num 160 160 108 258 360 ...
Factor preserved when adding 4, even though it was passed as a numeric.
str(add_row(m, list(20,4,160)))
#'data.frame': 33 obs. of 3 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num 160 160 108 258 360 ...
Passing a value other than 4, 6, or 8 produces an "invalid factor level" warning, and the new cyl value becomes NA.
str(add_row(m, list(20,3,160)))
# 'data.frame': 33 obs. of 3 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num 160 160 108 258 360 ...
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 3) :
invalid factor level, NA generated
Solution 13 - R
To add to the other suggestions, I use base R code to create a data frame:
data_set_name <- data.frame(data_set)
I always suggest making a duplicate of the original data frame, just in case you need to go back or test something out:
data_set_name_copy <- data_set_name
Now, if you wanted to add a new column, the code would look like the following:
data_set_name_copy$Name_of_New_Column <- Data_for_New_Column
The $ operator refers to a column by name; assigning to a name that does not exist yet adds a new column (not a row) with that name.
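A minimal sketch of this column-adding idiom, with made-up values; volume and source are illustrative column names, not part of the original answer:

```r
df <- data.frame(hello = c("hi", "hola"), goodbye = c("bye", "ciao"),
                 stringsAsFactors = FALSE)

df_copy <- df  # work on a copy, leaving df unchanged

# A new name after $ creates a column; supply one value per row
df_copy$volume <- c(3.0, 13.1)

# A single value is recycled to every row
df_copy$source <- "manual"
df_copy
```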
Solution 14 - R
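I think
rbind.data.frame(df, de)
should do the trick, as long as de has the same column names as df (rbind() on data frames dispatches to this same method, which matches columns by name).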
Solution 15 - R
In dplyr >= 1.0.0 you could use rows_insert():
df1 <- data.frame(hello = "hi", goodbye = "bye")
df2 <- data.frame(hello = "hola", goodbye = "ciao")
library(dplyr)
df1 %>%
rows_insert(df2)
Matching, by = "hello"
hello goodbye
1 hi bye
2 hola ciao
Note: all columns in df2 must exist in df1, but not all columns in df1 have to be in df2.
For additional behavior, there are other rows_* options. For example, you could use rows_upsert(), which overwrites the values if the key already exists and inserts the row otherwise:
df2 <- data.frame(hello = c("hi", "hola"), goodbye = c("goodbye", "ciao"))
library(dplyr)
df1 %>%
rows_upsert(df2)
Matching, by = "hello"
hello goodbye
1 hi goodbye # bye updated to goodbye since "hi" was already in data frame
2 hola ciao # inserted because "hola" was not in the data frame
These functions work by matching key columns. If the by argument is not specified, the default behavior is to match on the first column of the second data frame (df2 in this example) against the first data frame (df1 in this example).