R define dimensions of empty data frame
R Problem Overview
I am trying to collect some data from multiple subsets of a data set and need to create a data frame to collect the results. My problem is that I don't know how to create an empty data frame with a defined number of columns without actually having data to put into it.
collect1 <- c() ## i'd like to create empty df w/ 3 columns: `id`, `max1` and `min1`
for(i in 1:10){
collect1$id <- i
ss1 <- subset(df1, df1$id == i)
collect1$max1 <- max(ss1$value)
collect1$min1 <- min(ss1$value)
}
I feel very dumb asking this question (I almost feel like I've asked it on SO before but can't find it) but would greatly appreciate any help.
R Solutions
Solution 1 - R
Would a data frame of NAs work? Something like:
data.frame(matrix(NA, nrow = 2, ncol = 3))
If you need to be more specific about the data type, you may prefer NA_integer_, NA_real_, NA_complex_, or NA_character_ instead of just NA, which is logical.
Something else that may be more specific than the NAs is:
data.frame(matrix(vector(mode = 'numeric',length = 6), nrow = 2, ncol = 3))
where the mode can be of any type. See ?vector
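As a small sketch of the typed-NA idea, the columns can also be built one by one with rep() instead of going through a matrix. The column names are borrowed from the question, and the row count of 2 only mirrors the example above.

```r
# Each column gets its own NA type, so the data frame already has the
# intended column classes before any real values are filled in.
collect1 <- data.frame(id   = rep(NA_integer_, 2),
                       max1 = rep(NA_real_, 2),
                       min1 = rep(NA_real_, 2))

str(collect1)            # shows one integer column and two numeric columns
sapply(collect1, class)  # class of each column
```

Building the columns separately avoids the matrix step, which forces every column to share a single type.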
Solution 2 - R
Just create a data frame of empty vectors:
collect1 <- data.frame(id = character(0), max1 = numeric(0), min1 = numeric(0))
But if you know how many rows you're going to have in advance, you should just create the data frame with that many rows to start with.
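To check what the zero-row version gives you, and to see one way a row could be appended later, here is a minimal sketch. The appended values are made up for illustration, and stringsAsFactors = FALSE is set explicitly so the id column stays character on older R versions too.

```r
# Zero-row data frame with three typed columns
collect1 <- data.frame(id = character(0), max1 = numeric(0), min1 = numeric(0),
                       stringsAsFactors = FALSE)
dim(collect1)  # number of rows, then number of columns

# Append one (hypothetical) row; the column names must match
new_row <- data.frame(id = "1", max1 = 5.2, min1 = 0.3,
                      stringsAsFactors = FALSE)
collect1 <- rbind(collect1, new_row)
str(collect1)
```

Growing a data frame with rbind() inside a loop copies it on every iteration, which is why preallocating is preferable when the row count is known.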
Solution 3 - R
You can do something like:
N <- 10
collect1 <- data.frame(id = integer(N),
max1 = numeric(N),
min1 = numeric(N))
Be careful, though: in the rest of your code, you forgot to use the row index when filling the data.frame row by row. It should be:
for(i in seq_len(N)){
collect1$id[i] <- i
ss1 <- subset(df1, df1$id == i)
collect1$max1[i] <- max(ss1$value)
collect1$min1[i] <- min(ss1$value)
}
Finally, I would say that there are many alternatives for doing what you are trying to accomplish; some would be much more efficient and use a lot less typing. You could, for example, look at the aggregate function, or ddply from the plyr package.
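As a rough sketch of the aggregate route, the whole loop can be replaced by two grouped summaries. The df1 below is a made-up stand-in for the data in the question (an id column with values 1 to 10 and a numeric value column), so adapt the names to your real data.

```r
# Hypothetical stand-in for df1 from the question
set.seed(1)
df1 <- data.frame(id = rep(1:10, each = 5), value = rnorm(50))

# One summary per id; the formula interface returns columns `id` and `value`,
# with one row per id, sorted by id
max_by_id <- aggregate(value ~ id, data = df1, FUN = max)
min_by_id <- aggregate(value ~ id, data = df1, FUN = min)

# Both results are in the same id order, so they can be combined directly
collect1 <- data.frame(id   = max_by_id$id,
                       max1 = max_by_id$value,
                       min1 = min_by_id$value)
head(collect1)
```

No empty data frame is needed here, because the result is created in one step once the summaries exist.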
Solution 4 - R
You may use NULL instead of NA. This creates a truly empty data frame.
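The answer does not show where the NULL goes. One reading, offered here as an assumption rather than the answerer's exact code, is to pass NULL straight to data.frame(), which gives a data frame with no rows and no columns:

```r
# Assumed interpretation: NULL passed directly to data.frame()
empty_df <- data.frame(NULL)

is.data.frame(empty_df)  # still a data frame
dim(empty_df)            # zero rows and zero columns
```

Note that this does not define any columns, so it only helps if the columns are added later.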
Solution 5 - R
df = data.frame(matrix("", ncol = 3, nrow = 10))
Solution 6 - R
Here is a solution if you want an empty data frame with a defined number of rows and NO columns:
df = data.frame(matrix(NA, ncol=1, nrow=10))[-1]
Solution 7 - R
The solution given in another forum might help. Basically, it is:
Cols <- paste("A", 1:5, sep="")
DF <- read.table(textConnection(""), col.names = Cols,colClasses = "character")
> str(DF)
'data.frame': 0 obs. of 5 variables:
$ A1: chr
$ A2: chr
$ A3: chr
$ A4: chr
$ A5: chr
You can change the colClasses to fit your needs.
Original link is <https://stat.ethz.ch/pipermail/r-help/2008-August/169966.html>
Solution 8 - R
A more general method to create an arbitrary-size data frame is to create a 1-by-n data frame from a matrix of the same dimensions. Then you can immediately drop the first row:
> v <- data.frame(matrix(NA, nrow=1, ncol=10))
> v <- v[-1, , drop=FALSE]
> v
[1] X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
<0 rows> (or 0-length row.names)
Solution 9 - R
If only the column names are available, like:
cnms <- c("Nam1","Nam2","Nam3")
To create an empty data frame with the above variable names, first create a data.frame object:
emptydf <- data.frame()
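Now assign a zero-length vector to every column name, thus creating an empty data frame with the given variable names:
for(i in seq_along(cnms)){
# a bare emptydf[0, cnms[i]] only reads a column; an assignment is needed to create it
emptydf[[cnms[i]]] <- logical(0)
}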
Solution 10 - R
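seq_along may help to find out how many rows are in your data file and create a data.frame with the desired number of rows. Note that if df is itself a data frame, seq_along(df) runs over its columns, so use seq_len(nrow(df)) to run over its rows instead.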
listdf <- data.frame(ID=seq_along(df),
var1=seq_along(df), var2=seq_along(df))
Solution 11 - R
I have come across the same problem and have a cleaner solution. Instead of creating an empty data.frame, you can save your data as a named list. Once you have added all results to this list, you convert it to a data.frame.
For the case of adding features one at a time this works best.
mylist = list()
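# index by a generated name; mylist$column would overwrite one element literally called "column"
for(column in 1:10) mylist[[paste0("V", column)]] = rnorm(10)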
mydf = data.frame(mylist)
For the case of adding rows one at a time this becomes tricky due to mixed types. If all types are the same it is easy.
mylist = list()
# index by position; mylist$row would overwrite one element literally called "row"
for(row in 1:10) mylist[[row]] = rnorm(10)
mydf = data.frame(do.call(rbind, mylist))
I haven't found a simple way to add rows of mixed types. In this case, if you must do it this way, the empty data.frame is probably the best solution.
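One approach that may cover the mixed-type case, offered as a sketch rather than as part of the answer above, is to store each row as a one-row data.frame in the list and bind them all at the end. The column names and values here are made up for illustration.

```r
rows <- list()
for (i in 1:3) {
  # Each element is a one-row data frame, so every column keeps its own type
  rows[[i]] <- data.frame(id = i, label = letters[i], value = i / 2,
                          stringsAsFactors = FALSE)
}

# Bind all one-row data frames together in a single call
mydf <- do.call(rbind, rows)
str(mydf)
```

Binding once at the end avoids re-copying a growing data frame on every iteration, though building many one-row data frames has its own overhead for large row counts.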