Get filename without extension in R
R Problem Overview
I have a file:
ABCD.csv
The part before the .csv is not fixed and can be of any length.
How can I extract the portion before the .csv?
R Solutions
Solution 1 - R
There's a built-in function, file_path_sans_ext(), in the tools package (part of a standard R installation) that returns the file name without the extension.
tools::file_path_sans_ext("ABCD.csv")
## [1] "ABCD"
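As a small sketch of how this behaves on more than one input: the function is vectorized, keeps any leading directory, and has a compression argument that first drops a trailing .gz, .bz2 or .xz. The file names below are made up for illustration and are not from the question.

```r
# Assumed example file names (illustrative only)
files <- c("ABCD.csv", "d:/Some Dir/ABCD.csv", "archive.tar.gz")

# Default: only the last extension is removed; directories are kept
plain <- tools::file_path_sans_ext(files)

# compression = TRUE drops a trailing .gz/.bz2/.xz before removing the extension
compressed <- tools::file_path_sans_ext(files, compression = TRUE)

print(plain)
print(compressed)
```

Wrap the call in basename() if the directory part should be dropped as well.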
Solution 2 - R
basename() also removes the path leading to the file. With this regex, the last extension is removed, whatever it is.
filepath <- "d:/Some Dir/ABCD.csv"
sub(pattern = "(.*)\\..*$", replacement = "\\1", basename(filepath))
# [1] "ABCD"
Or, using file_path_sans_ext() from the tools package, as Tyler Rinker suggested:
tools::file_path_sans_ext(basename(filepath))
# [1] "ABCD"
Solution 3 - R
You can use sub or substr (str1 is defined under ###data at the end of this answer):
sub('\\.csv$', '', str1)
#[1] "ABCD"
or, if the extension is always a dot plus three characters:
substr(str1, 1, nchar(str1)-4)
#[1] "ABCD"
Using the 'filepath' from @JasonV's post (note that this pattern removes everything from the first dot onward):
sub('\\..*$', '', basename(filepath))
#[1] "ABCD"
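Or, with stringr (perl() is deprecated there; patterns are treated as regular expressions by default, so the pattern is passed directly):
library(stringr)
str_extract(filepath, '(?<=[/])([^/]+)(?=\\.[^.]+)')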
#[1] "ABCD"
###data
str1 <- 'ABCD.csv'
Solution 4 - R
You can try this also:
data <- "ABCD.csv"
gsub(pattern = "\\.csv$", "", data)
#[1] "ABCD"
This also works on a vector of file names. For example, with data <- list.files(pattern="\\.csv$"), the same call removes the extension from every file name in the vector.
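A minimal sketch of the list.files() case, run against a throwaway directory so it does not depend on the working directory. The directory and the file names a.csv, b.csv and notes.txt are hypothetical and exist only for this demonstration.

```r
# Hypothetical demo directory with made-up file names
demo_dir <- tempfile("ext_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("a.csv", "b.csv", "notes.txt"))))

# Keep only the .csv files, then strip the extension from each name
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
stems <- sub("\\.csv$", "", csv_files)
print(stems)

# Clean up the demo directory
unlink(demo_dir, recursive = TRUE)
```

Because the pattern is anchored with $, sub() and gsub() give the same result here.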
Solution 5 - R
If you have filenames with multiple (possible) extensions and you want to strip off only the last extension, you can try the following.
Consider the filename foo.bar.baz.txt. This call:
sub('\\..[^\\.]*$', '', "foo.bar.baz.txt")
will leave you with foo.bar.baz.
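To make the difference between "strip from the first dot" and "strip only the last extension" concrete, here is a short side-by-side sketch. The sample names are assumptions chosen for illustration, and the second pattern is a slightly simplified variant of the one above.

```r
# Assumed sample file names (illustrative only)
x <- c("ABCD.csv", "foo.bar.baz.txt", "archive.tar.gz")

# Greedy from the first dot: everything after the first "." is dropped
from_first_dot <- sub("\\..*$", "", x)

# Last extension only: the final "." and the non-dot characters after it
last_ext_only <- sub("\\.[^.]*$", "", x)

print(from_first_dot)
print(last_ext_only)
```

Which one is appropriate depends on whether dots are expected inside the file name itself.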
Solution 6 - R
fs::path_ext_remove() "removes the last extension and returns the rest of the path".
fs::path_ext_remove(c("ABCD.csv", "foo.bar.baz.txt", "d:/Some Dir/ABCD.csv"))
# Produces: [1] "ABCD" "foo.bar.baz" "D:/Some Dir/ABCD"
Solution 7 - R
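Here is an implementation that handles compressed files (for example .csv.gz) and vectors of multiple file names:
library(tools)  # provides file_ext() and file_path_sans_ext()

remove.file_ext <- function(path, basename = FALSE) {
  compressions <- c("gzip", "gz", "bgz", "zip")
  out <- character(0)
  for (p in path) {
    fext <- file_ext(p)  # extension of this element, not of the whole vector
    if (fext %in% compressions) {
      # Compressed file: also strip the extension before the compression suffix, if any
      ext <- file_ext(file_path_sans_ext(p, compression = FALSE))
      inner <- if (nzchar(ext)) paste0("\\.", ext) else ""
      regex <- paste0(inner, "\\.", fext, "$")
    } else {
      regex <- paste0("\\.", fext, "$")
    }
    out <- c(out, sub(pattern = regex, replacement = "", p))
  }
  # if/else instead of ifelse(): ifelse() with a length-one test returns only the first element
  if (basename) base::basename(out) else out
}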
Solution 8 - R
Loading the required library:
> library(stringr)
Extracting all the matches from the regex:
> str_match("ABCD.csv", "(.*)\\..*$")
[,1] [,2]
[1,] "ABCD.csv" "ABCD"
Returning only the second part of the result, which corresponds to the group matching the file name:
> str_match("ABCD.csv", "(.*)\\..*$")[,2]
[1] "ABCD"
EDIT for @U-10-Forward:
It is basically the same principle as the other answer; I just found this solution more robust.
Regex-wise it means:
- () = group
- .* = any character except the newline character, any number of times
- \\ is the escape notation, so \\. means a literal "."
- .* = any characters, any number of times, again
- $ = must be at the end of the input string
The logic is then that it will return the group preceding a "." followed by a group of characters at the end of the string (which equals the file extension in this case).
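The same capture-group idea can be sketched in base R with regexec() and regmatches(), which avoids the stringr dependency. This is an illustrative equivalent rather than the answer's own code; the sample names, including the extension-less "README", are assumptions.

```r
# Assumed sample file names; "README" has no extension on purpose
x <- c("ABCD.csv", "foo.bar.baz.txt", "README")

# regexec() returns the full match plus each capture group per element
m <- regmatches(x, regexec("(.*)\\..*$", x))

# Take the first capture group, or NA when the pattern did not match at all
stems <- vapply(
  m,
  function(g) if (length(g) >= 2) g[2] else NA_character_,
  character(1)
)
print(stems)
```

Like str_match(), this yields NA for a name with no dot, whereas sub() with the same pattern would return such a name unchanged.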