How to fix spaces in column names of a data.frame (remove spaces, inject dots)?

R Problem Overview

After importing a file, I always try to remove spaces from the column names so that the columns are easier to refer to.

Is there a better way to do this than using transform and then removing the extra column it creates?

This is what I use now:

names(ctm2)
# transform() fixes the names as a side effect, but it needs something to do
ctm2 <- transform(ctm2, dummyvar = 1)
# remove the dummy column
ctm2$dummyvar <- NULL
names(ctm2)

R Solutions

Solution 1 - R

There is a more elegant and general solution for this:

tidy.name.vector <- make.names(name.vector, unique=TRUE)

make.names() makes syntactically valid names out of character vectors. A syntactically valid name consists of letters, numbers, and the dot or underscore characters, and starts with a letter or with a dot that is not followed by a number. Invalid characters, such as spaces, are replaced with a dot.

Additionally, the argument unique=TRUE ensures that the new column names contain no duplicates.

Applied to an imported file:

library(readr)  # provides read_delim()
d <- read_delim(urltxt, delim = "\t")
names(d) <- make.names(names(d), unique = TRUE)
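To see what make.names() does with different kinds of awkward names, here is a minimal sketch. The vector raw_names is made up for illustration and does not come from the question.

```r
# Hypothetical column names: a space, a duplicate, a leading digit, a reserved word
raw_names <- c("Country Code", "Country Code", "2020 total", "if")

tidy_names <- make.names(raw_names, unique = TRUE)
print(tidy_names)
# [1] "Country.Code"   "Country.Code.1" "X2020.total"    "if."
```

Spaces become dots, the duplicate gets a numeric suffix, the name starting with a digit is prefixed with X, and the reserved word gets a trailing dot.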

Solution 2 - R

There is a very useful package for this called janitor, which makes cleaning up column names very simple. Its clean_names() function removes special characters, replaces spaces with _, and by default converts the names to lower case.

library(janitor)

#can be done by simply
ctm2 <- clean_names(ctm2)

#or piping through `dplyr`
ctm2 <- ctm2 %>%
        clean_names()

Solution 3 - R

To replace only the first space in each column name, you could also do:

names(ctm2) <- sub(" ", ".", names(ctm2))

or to replace all spaces (which seems like it would be a little more useful):

names(ctm2) <- gsub(" ", "_", names(ctm2))

or, as mentioned in the first answer (though not in a way that would fix all spaces):

spaceless <- function(x) {colnames(x) <- gsub(" ", "_", colnames(x));x}
newDF <- spaceless(ctm2)

where x is your data.frame. I prefer to use "_" to avoid issues with "." as part of an ID.

The point is that gsub, unlike sub, doesn't stop at the first match of the pattern.
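The difference between sub and gsub is easiest to see on a name with more than one space. This is a small sketch with made-up names (col_names is hypothetical):

```r
# Hypothetical column names; the second one contains two spaces
col_names <- c("first name", "date of birth")

first_only <- sub(" ", ".", col_names)   # replaces only the first space per name
all_spaces <- gsub(" ", "_", col_names)  # replaces every space

print(first_only)
# [1] "first.name"    "date.of birth"
print(all_spaces)
# [1] "first_name"    "date_of_birth"
```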

Solution 4 - R

Assign the names like this. It replaces every whitespace character in the names (spaces, tabs, newlines) with an underscore.

names(ctm2)<-gsub("\\s","_",names(ctm2))
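A short sketch of how "\\s" behaves on tabs, repeated spaces, and leading or trailing whitespace. The names in messy are invented for illustration; using "\\s+" together with trimws() is one possible refinement, not part of the answer above.

```r
# Hypothetical names: a tab, a double space, and surrounding spaces
messy <- c("unit\tprice", "total  sales", " id ")

# One underscore per whitespace character
single <- gsub("\\s", "_", messy)
print(single)
# [1] "unit_price"   "total__sales" "_id_"

# Trim the ends first, then collapse each run of whitespace into one underscore
collapsed <- gsub("\\s+", "_", trimws(messy))
print(collapsed)
# [1] "unit_price"  "total_sales" "id"
```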

Solution 5 - R

dplyr::select_all() can be used to reformat column names. This example replaces spaces and periods with an underscore and converts everything to lower case:

iris %>%  
  select_all(~gsub("\\s+|\\.", "_", .)) %>% 
  select_all(tolower) %>% 
  head(2)
  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
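In dplyr 1.0.0 and later the scoped verbs such as select_all() and rename_all() are marked as superseded, and rename_with() is the suggested way to rename columns with a function. A minimal sketch of the same transformation, assuming dplyr >= 1.0.0 is installed (renamed is just an illustrative variable name):

```r
library(dplyr)

# Replace whitespace runs and dots with "_", then lower-case every column name
renamed <- iris %>%
  rename_with(~ tolower(gsub("\\s+|\\.", "_", .x)))

names(renamed)
# [1] "sepal_length" "sepal_width"  "petal_length" "petal_width"  "species"
```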

Solution 6 - R

The best solution I have found so far (the %<>% assignment pipe comes from the magrittr package) is

names(ctm2) %<>% stringr::str_replace_all("\\s","_") %>% tolower

Credit goes to the commenters and other answers.

Solution 7 - R

It's often convenient to change the names of your columns within one chunk of dplyr code rather than renaming the columns after you've created the data frame. Piping in rename_all() is very useful in these situations:

ctm2 %>% rename_all(function(x) gsub(" ", "_", x))

The code above will replace all spaces in every column name with an underscore.

Solution 8 - R

Alternatively, you can achieve the same result with the stringr package:

names(ctm2) <- names(ctm2) %>% stringr::str_replace_all("\\s","_")

Solution 9 - R

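As of Jan 2021, a brief dplyr solution that needs no extra packages beyond magrittr (for the %<>% operator) is shown below. Note that make.names is called without unique=TRUE here, so duplicate names are not disambiguated.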

df %<>% dplyr::rename_all(make.names)

Credit goes to a commenter.

Solution 10 - R

There is an easy way to remove spaces from column names in data.table. You will first have to convert your data frame to a data.table.

setnames(x=DT, old=names(DT), new=gsub(" ","",names(DT)))

For example, Country Code will be converted to CountryCode.
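A self-contained sketch of the whole round trip, assuming the data.table package is installed. The data frame df and its contents are made up for illustration.

```r
library(data.table)

# Hypothetical data frame; check.names = FALSE keeps the spaces in the names
df <- data.frame(`Country Code` = c("DE", "FR"),
                 `Total Sales` = c(10, 20),
                 check.names = FALSE)

DT <- as.data.table(df)  # convert the data frame to a data.table

# setnames() renames the columns by reference, so no reassignment is needed
setnames(x = DT, old = names(DT), new = gsub(" ", "", names(DT)))

names(DT)
# [1] "CountryCode" "TotalSales"
```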

Solution 11 - R

Just assign to names(ctm2):

  names(ctm2) <- c("itsy", "bitsy", "eeny", "meeny")

or in a data-driven way:

  names(ctm2) <- paste("myColumn", 1:ncol(ctm2), sep="")

Another possibility is to edit your source file...

Solution 12 - R

You can also combine the make.names and gsub functions:

names(ctm2) <- gsub("\\.", "_", make.names(names(ctm2), unique = TRUE))

The code above does two things at once:

1. It makes all column names unique. For example, c("ab","ab") becomes c("ab","ab_1"): make.names yields "ab.1", and the dot is then replaced.
2. It replaces dots with underscores. A column name with underscores is easy to select (just double-click on it), whereas a name containing dots is much harder to select this way.
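The two steps can be traced separately on a small example. The vector dup_names is hypothetical and only serves to show the intermediate result.

```r
# Hypothetical names: a duplicate pair and one name with a space
dup_names <- c("ab", "ab", "unit price")

# Step 1: make the names valid and unique (spaces and suffixes use dots)
step1 <- make.names(dup_names, unique = TRUE)
print(step1)
# [1] "ab"         "ab.1"       "unit.price"

# Step 2: turn every dot into an underscore
step2 <- gsub("\\.", "_", step1)
print(step2)
# [1] "ab"         "ab_1"       "unit_price"
```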

Solution 13 - R

If you use read.csv() to import your data (which, with its default check.names = TRUE, replaces all spaces " " with "."), you can replace these dots with an underscore "_" using:

names(df) <- gsub("\\.", "_", names(df))
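The effect of check.names can be checked without a file by passing the CSV content through the text argument. This is a sketch with invented data (csv_text, mangled, and kept are illustrative names):

```r
# Hypothetical CSV content with spaces in the header
csv_text <- "Country Code,Total Sales\nDE,10\nFR,20"

# Default behaviour: check.names = TRUE turns spaces into dots
mangled <- read.csv(text = csv_text)
names(mangled)
# [1] "Country.Code" "Total.Sales"

# Replace the dots with underscores, as in the answer above
names(mangled) <- gsub("\\.", "_", names(mangled))
names(mangled)
# [1] "Country_Code" "Total_Sales"

# With check.names = FALSE the original names, including spaces, are kept
kept <- read.csv(text = csv_text, check.names = FALSE)
names(kept)
# [1] "Country Code" "Total Sales"
```

Keep in mind that the gsub call also replaces dots that were already part of the original column names.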

Content Type	Original Author	Original Content on Stackoverflow
Question	userJT	View Question on Stackoverflow
Solution 1 - R	Convex	View Answer on Stackoverflow
Solution 2 - R	camnesia	View Answer on Stackoverflow
Solution 3 - R	johannes	View Answer on Stackoverflow
Solution 4 - R	Gucci148	View Answer on Stackoverflow
Solution 5 - R	sbha	View Answer on Stackoverflow
Solution 6 - R	userJT	View Answer on Stackoverflow
Solution 7 - R	elamps	View Answer on Stackoverflow
Solution 8 - R	Gucci148	View Answer on Stackoverflow
Solution 9 - R	userJT	View Answer on Stackoverflow
Solution 10 - R	Main	View Answer on Stackoverflow
Solution 11 - R	Dirk Eddelbuettel	View Answer on Stackoverflow
Solution 12 - R	Harshal Gajare	View Answer on Stackoverflow
Solution 13 - R	user10358056	View Answer on Stackoverflow