Decompress gz file using R
Problem Overview
I have used ?unzip in the past to get at the contents of a zipped file using R. This time around, I am having a hard time extracting the files from a .gz file, which can be found here.
I have tried ?gzfile and ?gzcon but have not been able to get either to work. Any help you can provide will be greatly appreciated.
R Solutions
Solution 1 - R
Here is a worked example that may help illustrate what gzfile() and gzcon() are for:
foo <- data.frame(a=LETTERS[1:3], b=rnorm(3))
foo
# a b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
write.table(foo, file="/tmp/foo.csv")
system("gzip /tmp/foo.csv") # being very explicit
Now that the file is written and compressed, instead of the implicit use of file(), use gzfile():
read.table(gzfile("/tmp/foo.csv.gz"))
# a b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
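The example above only exercises gzfile(). gzcon() works differently: it wraps a connection that is already open in binary mode (for instance one returned by file() or url()) and decompresses whatever is read through it. A minimal base-R sketch, assuming a throwaway file created with tempfile(); the three lines of text are made up for illustration:

```r
# Hypothetical throwaway file; any gzip-compressed text file would do.
path <- tempfile(fileext = ".txt.gz")

# Write a few lines through a gzfile() connection so the file is gzip-compressed.
out <- gzfile(path, open = "w")
writeLines(c("alpha", "beta", "gamma"), out)
close(out)

# gzcon() wraps an existing connection opened in binary mode ("rb")
# and decompresses everything read through it.
con <- gzcon(file(path, open = "rb"))
txt <- readLines(con)
close(con)  # closing the wrapper also closes the underlying file() connection

print(txt)
```

This is mainly useful when the compressed data does not come from a local file, e.g. wrapping url(...) so a remote .gz can be read without saving it first.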
The file you point to is a compressed tar archive, and as far as I know, R itself has no interface to tar archives. These are commonly used to distribute source code, for example R packages and the R sources.
Solution 2 - R
To un-gz a file in R, you can use gunzip() from the R.utils package:
library(R.utils)
gunzip("file.gz", remove=FALSE)
or
gunzip("file.gz")
But then you get the default (remove=TRUE) behavior, in which the input file is removed after the output file has been fully created and closed.
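If installing R.utils is not an option, a similar effect to gunzip(..., remove=FALSE) can be approximated in base R by streaming bytes from a gzfile() connection into a plain file() connection. This is a sketch, not the R.utils implementation; the helper name gunzip_keep() and its chunk size are hypothetical choices:

```r
# Hypothetical helper: decompress `src` to `dest`, leaving `src` in place.
gunzip_keep <- function(src, dest = sub("\\.gz$", "", src), chunk = 65536L) {
  inp <- gzfile(src, open = "rb")   # gzfile() decompresses transparently on read
  on.exit(close(inp))
  out <- file(dest, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    bytes <- readBin(inp, what = "raw", n = chunk)  # raw(0) signals end of data
    if (length(bytes) == 0L) break
    writeBin(bytes, out)
  }
  invisible(dest)
}

# Demo on a throwaway compressed file
src <- tempfile(fileext = ".csv.gz")
gz <- gzfile(src, open = "w")
writeLines(c("a,b", "1,2"), gz)
close(gz)

dest <- gunzip_keep(src)
print(readLines(dest))
print(file.exists(src))  # the compressed input is still there
```

Reading in fixed-size chunks keeps memory use bounded even for large files.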
Solution 3 - R
If you really want to uncompress the file, just use the untar function, which does support gzip-compressed tar archives.
E.g.:
untar('chadwick-0.5.3.tar.gz')
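untar() can also list an archive's contents before extracting anything, and its exdir argument controls where the files go. The sketch below first builds a tiny .tar.gz with utils::tar() so it runs anywhere; the pkg/README layout is hypothetical, and tar = "internal" is passed so that R's built-in tar code is used instead of an external tar program:

```r
# Build a small .tar.gz so the example is self-contained (hypothetical layout).
work <- tempfile("tar-demo-")
dir.create(file.path(work, "pkg"), recursive = TRUE)
writeLines("hello", file.path(work, "pkg", "README"))

archive <- tempfile(fileext = ".tar.gz")
old <- setwd(work)  # archive paths are stored relative to the working directory
tar(archive, files = "pkg", compression = "gzip", tar = "internal")
setwd(old)

# List the contents without extracting.
print(untar(archive, list = TRUE, tar = "internal"))

# Extract into a separate directory; exdir is created if necessary.
outdir <- tempfile("untar-out-")
untar(archive, exdir = outdir, tar = "internal")
print(readLines(file.path(outdir, "pkg", "README")))
```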
Solution 4 - R
http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html
R 2.10 added transparent decompression for certain kinds of compressed files. If your files are compressed with bzip2, xz, or gzip, they can be read into R as if they were plain text files. Make sure the files have the proper filename extensions.
The command
myData <- read.table('myFile.gz')
# gzip-compressed files have a ".gz" extension
will work just as if 'myFile.gz' were the raw text file.
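A minimal round trip illustrating this, assuming a small made-up data frame and a tempfile() path: the data is written through gzfile(), then read back by passing the plain path to read.table(), which opens it with file() and decompresses it transparently:

```r
# Hypothetical data written to a throwaway gzip-compressed file.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

path <- tempfile(fileext = ".txt.gz")
con <- gzfile(path, open = "w")   # compress on write
write.table(df, con, row.names = FALSE)
close(con)

# No gzfile() needed on the way back in: read.table() opens the path itself
# and the gzip compression is handled transparently.
back <- read.table(path, header = TRUE)
print(back)
```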
Solution 5 - R
library(vroom)
columns3 <- c('A', 'B', ...)  ## define column names
Data1 <- vroom(".../XXX.tsv", col_names = columns3)
This also works with .tsv.gz files; vroom decompresses them automatically.
Solution 6 - R
If it's a comma- or tab-separated file, you can use data.table's fread(). It can handle zipped (.zip, .gz) files; reading .gz files requires the R.utils package to be installed:
fread('myFile.csv.gz')