R Reading in a zip data file without unzipping it


R Problem Overview


I have a very large zip file and I am trying to read it into R without unzipping it, like so:

temp <- tempfile("Sales", fileext=c("zip"))
data <- read.table(unz(temp, "Sales.dat"), nrows=10, header=T, quote="\"", sep=",")

Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
  cannot open zip file 'C:\Users\xxx\AppData\Local\Temp\RtmpyAM9jH\Sales13041760345azip'

R Solutions


Solution 1 - R

If your zip file is called Sales.zip and contains only a file called Sales.dat, I think you can simply do the following (assuming the file is in your working directory):

data <- read.table(unz("Sales.zip", "Sales.dat"), nrows=10, header=T, quote="\"", sep=",")
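When the name of the file inside the archive is uncertain, a small wrapper can check it before reading. The following is a minimal sketch, not part of the original answer: read_zip_member is a hypothetical helper name, and it assumes the archive is a regular zip file that R's built-in unzip can list.

```r
# Hypothetical helper: read one member of a zip archive without extracting it to disk.
read_zip_member <- function(zip_path, member, ...) {
  stopifnot(file.exists(zip_path))
  # unzip(list = TRUE) only lists the archive contents; nothing is extracted
  contents <- unzip(zip_path, list = TRUE)
  if (!member %in% contents$Name) {
    stop("'", member, "' not found in ", zip_path,
         "; available: ", paste(contents$Name, collapse = ", "))
  }
  # unz() gives a read-only connection to a single file inside the archive;
  # read.table opens it and closes it again when done
  read.table(unz(zip_path, member), ...)
}

# Usage, assuming Sales.zip sits in the working directory:
# sales <- read_zip_member("Sales.zip", "Sales.dat",
#                          nrows = 10, header = TRUE, quote = "\"", sep = ",")
```

Checking the listing first turns a generic "cannot open the connection" failure into a message that names the files actually present in the archive.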

Solution 2 - R

There is no need to use unz, since read.table can now handle the zipped file directly:

data <- read.table("Sales.zip", nrows=10, header=T, quote="\"", sep=",")


Solution 3 - R

The reading functions of the readr package also support compressed files when the file suffix indicates the compression format: files ending in .gz, .bz2, .xz, or .zip are automatically uncompressed.

library(readr)
myData <- read_csv("foo.txt.gz")

Solution 4 - R

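This should work if the archive contains a file named Sales.csv. Note that unzip extracts that file into the working directory and returns its path, which read_csv then reads: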

data <- readr::read_csv(unzip("Sales.zip", "Sales.csv"))

To check the file names inside the archive without extracting anything, use:

unzip("Sales.zip", list = TRUE)
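Because unzip writes the extracted file to disk, it can help to send it to a temporary directory and clean up afterwards. A minimal sketch under that assumption follows; read_csv_from_zip is a hypothetical helper name, and base R's read.csv stands in for readr::read_csv so that the example needs no extra packages.

```r
# Hypothetical helper: extract one member to a throwaway directory, read it, clean up.
read_csv_from_zip <- function(zip_path, member, ...) {
  stopifnot(file.exists(zip_path))
  exdir <- tempfile("unzipped_")
  dir.create(exdir)
  # Remove the extracted copy when the function exits, even on error
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  # unzip() returns the paths of the files it extracted
  extracted <- unzip(zip_path, files = member, exdir = exdir)
  if (length(extracted) != 1) {
    stop("expected exactly one extracted file, got ", length(extracted))
  }
  read.csv(extracted, ...)
}

# Usage, assuming Sales.zip contains Sales.csv:
# sales <- read_csv_from_zip("Sales.zip", "Sales.csv")
```

The working directory stays untouched, at the cost of one temporary copy of the extracted file.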

Solution 5 - R

If zcat is installed on your system (as is typically the case on Linux, macOS, and Cygwin), you could also use:

zipfile <- "test.zip"
# shQuote protects paths that contain spaces or shell metacharacters
myData <- read.delim(pipe(paste("zcat", shQuote(zipfile))))

This solution also has the advantage that no temporary files are created.
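zcat is primarily a gzip tool, so its behaviour on .zip archives may vary between systems. As an alternative sketch (a swapped-in technique, not the answer's own), Info-ZIP's unzip -p writes a single archive member to standard output, which pipe can then read. read_zip_via_pipe is a hypothetical helper name, and the sketch assumes the unzip command-line tool is on the PATH.

```r
# Hypothetical helper: stream one archive member through a pipe, no temporary files.
read_zip_via_pipe <- function(zip_path, member, ...) {
  stopifnot(file.exists(zip_path), nzchar(Sys.which("unzip")))
  # "unzip -p" prints the named member to stdout instead of writing it to disk
  cmd <- paste("unzip -p", shQuote(zip_path), shQuote(member))
  read.delim(pipe(cmd), ...)
}

# Usage, assuming test.zip contains a tab-delimited file data.txt:
# myData <- read_zip_via_pipe("test.zip", "data.txt")
```

Extra arguments such as sep = "," are passed on to read.delim, so the same helper can read comma-separated members.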

Solution 6 - R

This expression is missing the dot in the file extension:

temp <- tempfile("Sales", fileext=c("zip"))

It should be:

temp <- tempfile("Sales", fileext=c(".zip"))
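The effect of the missing dot can be checked directly. The short sketch below also illustrates a second point: tempfile only generates a path and does not create a file, so, as far as the question's snippet shows, an archive would still have to be written or downloaded to that path before unz could open it. The variable names are illustrative.

```r
bad  <- tempfile("Sales", fileext = "zip")   # extension glued onto the random suffix
good <- tempfile("Sales", fileext = ".zip")  # proper ".zip" extension

bad_has_ext  <- endsWith(bad, ".zip")   # FALSE
good_has_ext <- endsWith(good, ".zip")  # TRUE

# tempfile() returns a name that is not in use; no file exists at that path yet
good_exists <- file.exists(good)        # FALSE
```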

Solution 7 - R

The gzfile function, combined with read_csv or read.table, can read gzip-compressed files.

library(readr)
df <- read_csv(gzfile("file.csv.gz"))

# read.table is part of base R (utils), so no extra package is needed
df <- read.table(gzfile("file.csv.gz"), header = TRUE, sep = ",")
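To try gzfile without an existing data set, the following sketch writes a small gzip-compressed CSV and reads it back using base R only. The data frame contents and file name are made up for illustration, and read.csv stands in for read_csv so that no package is required.

```r
# Made-up example data
sales <- data.frame(id = 1:3, amount = c(10.5, 20, 7.25))

# Write a gzip-compressed CSV through a gzfile connection
path <- tempfile("sales", fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(sales, con, row.names = FALSE)
close(con)

# Read it back; read.csv opens and closes the connection itself
round_trip <- read.csv(gzfile(path))
```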

read_csv from the readr package can also read compressed files without the gzfile function.

library(readr)
df <- read_csv("file.csv.gz")

read_csv is recommended because it is generally faster than read.table on large files.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author    Original Content on Stack Overflow
Question          laiboonh           View Question on Stack Overflow
Solution 1 - R    plannapus          View Answer on Stack Overflow
Solution 2 - R    user5496072        View Answer on Stack Overflow
Solution 3 - R    Holger Brandl      View Answer on Stack Overflow
Solution 4 - R    Smart D            View Answer on Stack Overflow
Solution 5 - R    Holger Brandl      View Answer on Stack Overflow
Solution 6 - R    Jorge Moreno       View Answer on Stack Overflow
Solution 7 - R    Natheer Alabsi     View Answer on Stack Overflow