Reading csv zipped files in python

Python 2.7, Csv, Zip

Python 2.7 Problem Overview


I'm trying to get data from a zipped CSV file. Is there a way to do this without unzipping the whole file? If not, how can I unzip the files and read them efficiently?

Python 2.7 Solutions


Solution 1 - Python 2.7

I used the zipfile module to read the CSV from the ZIP directly into a pandas DataFrame. Let's say the file is named "intfile.csv" and it's inside a .zip named "THEZIPFILE.zip":

import pandas as pd
import zipfile

zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip') 
df = pd.read_csv(zf.open('intfile.csv'))

Solution 2 - Python 2.7

If you aren't using pandas, it can be done entirely with the standard library. Here is Python 3.7 code:

import csv
from io import TextIOWrapper
from zipfile import ZipFile

with ZipFile('yourfile.zip') as zf:
    with zf.open('your_csv_inside_zip.csv', 'r') as infile:
        reader = csv.reader(TextIOWrapper(infile, 'utf-8'))
        for row in reader:
            # process the CSV here
            print(row)

Solution 3 - Python 2.7

A quick solution is the code below:

import pandas as pd

# pandas can read a zipped CSV directly (the archive must contain a single file)
df = pd.read_csv("/path/to/file.csv.zip")

Solution 4 - Python 2.7

zipfile also supports the with statement.

So, adding onto Yaron's answer of using pandas:

import pandas as pd
import zipfile

with zipfile.ZipFile('file.zip') as zf:
    with zf.open('file.csv') as csv_file:
        df = pd.read_csv(csv_file)

Solution 5 - Python 2.7

I thought Yaron had the best answer, but I wanted to add code that iterates through multiple files inside a zip archive and then concatenates the results:

import os
import pandas as pd
import zipfile

curDir = os.getcwd()
zf = zipfile.ZipFile(curDir + '/targetfolder.zip')
text_files = zf.infolist()
list_ = []

print("Uncompressing and reading data...")

for text_file in text_files:
    print(text_file.filename)
    df = pd.read_csv(zf.open(text_file.filename))
    # do df manipulations
    list_.append(df)

df = pd.concat(list_)

Solution 6 - Python 2.7

Yes. You want the module 'zipfile'

You open the zip file itself with zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])

You can then use ZipFile.infolist() to enumerate each file within the zip, and read it without extracting to disk with ZipFile.open(name[, mode[, pwd]])
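The calls described above can be sketched roughly as follows. This is a minimal illustration for Python 3, not the answerer's own code: the archive name example.zip, the member names, and the helper read_csv_members are all made up for the example, and the archive is created on the fly so the snippet runs on its own.

```python
import csv
import io
import os
import tempfile
import zipfile


def read_csv_members(zip_path):
    """Return {member name: list of rows} for every .csv member of the archive.

    Hypothetical helper for illustration; nothing is extracted to disk.
    """
    rows_by_name = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not a CSV file.
            if info.filename.endswith('/') or not info.filename.lower().endswith('.csv'):
                continue
            with zf.open(info.filename) as member:
                # ZipFile.open() yields bytes, so wrap it for the csv module.
                text = io.TextIOWrapper(member, encoding='utf-8', newline='')
                rows_by_name[info.filename] = list(csv.reader(text))
    return rows_by_name


# Build a small sample archive (assumed names and contents) to read back.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, 'example.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('a.csv', 'id,name\n1,foo\n2,bar\n')
    zf.writestr('notes.txt', 'not a csv')

result = read_csv_members(zip_path)
print(result)
```

Filtering on the file extension is a design choice for this sketch; infolist() also returns directory entries and non-CSV members, which would otherwise be handed to the CSV reader.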

Solution 7 - Python 2.7

Pandas has natively supported compressed CSV files since version 0.18.1: its read_csv function has a compression parameter that accepts {'infer', 'gzip', 'bz2', 'zip', 'xz', None} and defaults to 'infer'.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
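A short sketch of how the compression parameter might be used, assuming pandas is installed. The file names sample.zip and sample.csv and the data are invented for the example; the archive is written first so the snippet is self-contained.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Create a one-file archive so the example is self-contained (assumed names).
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, 'sample.zip')
with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('sample.csv', 'id,value\n1,10\n2,20\n')

# Explicit: tell pandas the file is a zip archive.
df_explicit = pd.read_csv(zip_path, compression='zip')

# Default 'infer': the compression is guessed from the '.zip' extension.
df_inferred = pd.read_csv(zip_path)

print(df_explicit)
```

Note that for 'zip' the archive has to contain exactly one data file; archives with several members need the zipfile module instead.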

Solution 8 - Python 2.7

This is the simplest approach, and the one I always use:

import pandas as pd
df = pd.read_csv("Train.zip", compression='zip')

Solution 9 - Python 2.7

Suppose you are downloading a zip file that contains a CSV and you don't want to use temporary storage. Here is a sample implementation:

#!/usr/bin/env python3

from csv import DictReader
from io import TextIOWrapper, BytesIO
from zipfile import ZipFile

import requests

def all_tickers():
    url = "https://simfin.com/api/bulk/bulk.php?dataset=industries&variant=null"
    r = requests.get(url)
    zip_ref = ZipFile(BytesIO(r.content))
    for name in zip_ref.namelist():
        print(name)
        with zip_ref.open(name) as file_contents:
            reader = DictReader(TextIOWrapper(file_contents, 'utf-8'), delimiter=';')
            for item in reader:
                print(item)

Wrapping each member in TextIOWrapper takes care of the Python 3 bytes/str issues.
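To try the in-memory approach without a network request, the downloaded bytes can be replaced by an archive built locally. A small sketch under that assumption: zip_bytes stands in for r.content, and the helper rows_from_zip_bytes, the member name, and the row are invented for illustration.

```python
from csv import DictReader
from io import BytesIO, TextIOWrapper
from zipfile import ZipFile


def rows_from_zip_bytes(payload, delimiter=';'):
    """Yield (member name, row dict) pairs from a zip held in memory."""
    with ZipFile(BytesIO(payload)) as zip_ref:
        for name in zip_ref.namelist():
            with zip_ref.open(name) as file_contents:
                # Decode the member's bytes to text for the csv module.
                text = TextIOWrapper(file_contents, encoding='utf-8', newline='')
                for row in DictReader(text, delimiter=delimiter):
                    yield name, row


# Stand-in for r.content: build a zip archive entirely in memory.
buffer = BytesIO()
with ZipFile(buffer, 'w') as zf:
    zf.writestr('industries.csv', 'IndustryId;Sector\n100001;Industrials\n')
zip_bytes = buffer.getvalue()

rows = list(rows_from_zip_bytes(zip_bytes))
print(rows)
```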

Solution 10 - Python 2.7

If you have a file named my_big_file.csv and you zip it under the same base name, as my_big_file.zip, you may simply do this:

df = pd.read_csv("my_big_file.zip")

Note: check your pandas version first (this does not work in older versions).


Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Elyza Agosta | View Question on Stack Overflow
Solution 1 - Python 2.7 | Yaron | View Answer on Stack Overflow
Solution 2 - Python 2.7 | volker238 | View Answer on Stack Overflow
Solution 3 - Python 2.7 | Hari Prasad | View Answer on Stack Overflow
Solution 4 - Python 2.7 | gaius_baltar | View Answer on Stack Overflow
Solution 5 - Python 2.7 | Arthur D. Howland | View Answer on Stack Overflow
Solution 6 - Python 2.7 | brycem | View Answer on Stack Overflow
Solution 7 - Python 2.7 | Anatoly Alekseev | View Answer on Stack Overflow
Solution 8 - Python 2.7 | sandeepnaidu gottapu | View Answer on Stack Overflow
Solution 9 - Python 2.7 | hughdbrown | View Answer on Stack Overflow
Solution 10 - Python 2.7 | adhg | View Answer on Stack Overflow