Use first row data as column names in R


R Problem Overview


I have a dirty dataset that I could not read with header = T. After reading and cleaning it, I would like to use what is now the first row as the column names. I tried multiple methods from Stack Overflow without success. What could be the problem?

The dataset t1 should look like this after clean up:

      V1	V2	V3	V4	V5
1	col1	col2	col3	col4
2	row1	2	4	5	56
3	row2	74	74	3	534
4	row3	865	768	8	7
5	row4	68	86	65	87
  • I tried: colnames(t1) <- t1[1,]. Nothing happens.

  • I tried: names(t1) <- t1[1,]. Nothing happens.

  • I tried: lapply(t1, function(x) {names(x) <- x[1, ]; x}). It returns an error message:

    Error in `[.default`(x, 1, ) : incorrect number of dimensions
    

Could anyone help?

R Solutions


Solution 1 - R

Sam Firke's ever useful package janitor has a function especially for this: row_to_names.

Example from his documentation:

library(janitor)

x <- data.frame(X_1 = c(NA, "Title", 1:3),
           X_2 = c(NA, "Title2", 4:6))
x %>%
  row_to_names(row_number = 2)

Solution 2 - R

header.true <- function(df) {
  names(df) <- as.character(unlist(df[1,]))
  df[-1,]
}

Test

df1 <- data.frame(c("a", 1,2,3), c("b", 4,5,6))
header.true(df1)
  a b
2 1 4
3 2 5
4 3 6
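
A follow-up that may be useful: because the header row shared its columns with the data, the remaining columns are still character after the names are promoted. The sketch below assumes the header.true function from this answer and uses base R's type.convert to re-guess each column's type. The example values are made up for illustration.

```r
# header.true as defined in this answer
header.true <- function(df) {
  names(df) <- as.character(unlist(df[1, ]))
  df[-1, ]
}

# Illustrative data: the header row forces every column to character
df1 <- data.frame(c("a", "1", "2", "3"), c("b", "4", "5", "6"),
                  stringsAsFactors = FALSE)
out <- header.true(df1)

# Re-guess each column's type now that the header row is gone
out[] <- lapply(out, type.convert, as.is = TRUE)
str(out)
```

Assigning into out[] keeps the result a data frame instead of turning it into a plain list.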

Solution 3 - R

The columns of your data frame are probably factors, which is why the code you tried didn't work. You can check this with str(df). There are two ways to handle it:

  • First option: use the argument stringsAsFactors = FALSE when you import your data:

    df <- read.table(text =  "V1    V2  V3  V4  V5
                            col1    col2    col3    col4 col5
                            row1    2   4   5   56
                            row2    74  74  3   534
                            row3    865 768 8   7
                            row4    68  86  65  87", header = TRUE, 
                            stringsAsFactors = FALSE )
    

    Then you can use your first attempt, then remove your first row if you'd like:

    colnames(df) <- df[1,]
    df <- df[-1, ] 
    

  • Second option: this works whether your columns are factors or characters:

    names(df) <- lapply(df[1, ], as.character)
    df <- df[-1,] 
    

    Output:

      col1 col2 col3 col4 col5
    2 row1    2    4    5   56
    3 row2   74   74    3  534
    4 row3  865  768    8    7
    5 row4   68   86   65   87
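
To illustrate the factor issue this answer describes, here is a minimal sketch. It sets stringsAsFactors = TRUE explicitly to mimic the default in R versions before 4.0.0, and the two-row data frame is made up for the example.

```r
# stringsAsFactors = TRUE is set explicitly to mimic the pre-4.0.0 default
df <- data.frame(V1 = c("col1", "row1"), V2 = c("col2", "2"),
                 stringsAsFactors = TRUE)

sapply(df, class)        # both columns are factors

# A factor is stored as integer codes plus a set of labels
as.integer(df$V1[1])     # the internal code, not the text
as.character(df$V1[1])   # the label you actually want as a name

# Converting each cell to character first gives usable names
new_names <- vapply(df[1, ], as.character, character(1), USE.NAMES = FALSE)
names(df) <- new_names
df <- df[-1, , drop = FALSE]
```

The vapply call is just a stricter form of the lapply used in the second option: it checks that every cell yields exactly one string.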
    

    Solution 4 - R

    While @sbha has already offered a tidyverse solution, I would like to leave a fully pipeable dplyr option. I agree that this could be an incredibly useful function.

    library(dplyr)
    data.frame(x = c("a", 1, 2, 3), y = c("b", 4, 5, 6)) %>%
      `colnames<-`(.[1, ]) %>%
      .[-1, ]
    

    Solution 5 - R

    How about:

    my.names <- t1[1,]
    
    colnames(t1) <- my.names
    

    i.e. specifically naming the row as a variable?

    With the following code:

    namex <-c("col1","col2","col3","col4")
    row1 <- c(2, 4, 5, 56)
    row2 <- c(74, 73, 3, 534)
    row3 <- c(865, 768, 8, 7)
    row4 <- c(68, 58, 65, 87)
    
    t1 <- data.frame(namex, row1, row2, row3, row4)
    t1 <- t(t1)
    
    my.names <- t1[1,]
    
    colnames(t1) <- my.names
    

    It seems to work, but maybe I'm missing something?

    Solution 6 - R

    Take a step back: when you read your data, use skip = 1 in read.table to skip the first line entirely. This should make life a bit easier when you're cleaning the data, particularly for data types. This is key, as your problem stems from your data being encoded as factors.

    You can then read in your column names separately with nrows=1 in read.table.
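
The two-step read described above can be sketched as follows. This assumes a whitespace-separated file whose first line holds the column names; the temporary file and its contents are made up so the example runs on its own.

```r
# Write a small illustrative file (first line = column names)
raw <- c("col1 col2 col3 col4 col5",
         "row1 2 4 5 56",
         "row2 74 74 3 534",
         "row3 865 768 8 7",
         "row4 68 86 65 87")
path <- tempfile(fileext = ".txt")
writeLines(raw, path)

# Step 1: read the data only, so numeric columns are detected as numeric
dat <- read.table(path, skip = 1, stringsAsFactors = FALSE)

# Step 2: read just the first line to get the names
hdr <- read.table(path, nrows = 1, stringsAsFactors = FALSE)

# Attach the names to the data
names(dat) <- as.character(unlist(hdr[1, ]))
str(dat)
```

Because the name line never enters the data, the numeric columns are read as integers and need no conversion afterwards.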

    Solution 7 - R

    Similar to some of the other answers, here is a dplyr/tidyverse option:

    library(tidyverse)
    
    names(df) <- df %>% slice(1) %>% unlist()
    df <- df %>% slice(-1)
    

    Solution 8 - R

    Using data.table:

    library(data.table)
    
    namex <-c("col1","col2","col3","col4")
    row1 <- c(2, 4, 5, 56)
    row2 <- c(74, 73, 3, 534)
    row3 <- c(865, 768, 8, 7)
    row4 <- c(68, 58, 65, 87)
    
    t1 <- data.table(namex, row1, row2, row3, row4)
    t1 <- data.table(t(t1))
    
    setnames(t1, as.character(t1[1,]))
    t1 <- t1[-1,]
    

    Solution 9 - R

    You almost had it; you only missed calling a vector with c():

    colnames(t1)=t1[c(1),]
    

    Then you can erase the first row, since it is now duplicated:

    t1=t1[-c(1),]
    

    Solution 10 - R

    Building on Pierre L's answer: sometimes the first row of a document ends up split across two or more rows when pulled into a data frame. This slight modification solved that for me.

    header.true <- function(df) {
      r1 <- as.character(unlist(df[1,]))
      r2 <- as.character(unlist(df[2,]))
      r1.2 <- paste(r1,r2, sep = ".")
      names(df) <- r1.2
      df[-c(1,2),]
    }
    

    Test

    df1 <- data.frame(c("a", "xx",1,2,3), c("b", "xx",4,5,6))
    header.true(df1)
      a.xx b.xx
    3    1    4
    4    2    5
    5    3    6
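
The function above handles a header split across exactly two rows. For a header spread over more rows, one possible generalisation is sketched below. The helper header_from_rows and its arguments n and sep are hypothetical names introduced for this example, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): build the header
# from the first n rows, joined with sep, then drop those rows.
header_from_rows <- function(df, n = 2, sep = ".") {
  # Character matrix holding just the n header rows
  head_rows <- as.matrix(df[seq_len(n), , drop = FALSE])
  # Paste each column's header pieces together, top to bottom
  names(df) <- unname(apply(head_rows, 2, paste, collapse = sep))
  df[-seq_len(n), , drop = FALSE]
}

# Illustrative data with a header split across three rows
df1 <- data.frame(c("a", "xx", "yy", 1, 2), c("b", "xx", "yy", 3, 4),
                  stringsAsFactors = FALSE)
out <- header_from_rows(df1, n = 3)
names(out)
```

With n = 2 and sep = "." this should behave like the two-row version above.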
    

    Attributions

    All content for this solution is sourced from the original question on Stackoverflow.

    The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

    Content Type      Original Author   Original Content on Stackoverflow
    Question          sstww             View Question on Stackoverflow
    Solution 1 - R    zek19             View Answer on Stackoverflow
    Solution 2 - R    Pierre L          View Answer on Stackoverflow
    Solution 3 - R    mpalanco          View Answer on Stackoverflow
    Solution 4 - R    Kim               View Answer on Stackoverflow
    Solution 5 - R    mattbawn          View Answer on Stackoverflow
    Solution 6 - R    MikeRSpencer      View Answer on Stackoverflow
    Solution 7 - R    sbha              View Answer on Stackoverflow
    Solution 8 - R    DMillan           View Answer on Stackoverflow
    Solution 9 - R    Marcus            View Answer on Stackoverflow
    Solution 10 - R   otteheng          View Answer on Stackoverflow