Python in-memory zip library

PythonMemoryZipArchive

Python Problem Overview


Is there a Python library that allows manipulation of zip archives in memory, without having to use actual disk files?

The ZipFile library does not allow you to update the archive. The only way seems to be to extract it to a directory, make your changes, and create a new zip from that directory. I want to modify zip archives without disk access, because I'll be downloading them, making changes, and uploading them again, so I have no reason to store them.

Something similar to Java's ZipInputStream/ZipOutputStream would do the trick, although any interface at all that avoids disk access would be fine.

Python Solutions


Solution 1 - Python

According to the Python docs:

class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])

  Open a ZIP file, where file can be either a path to a file (a string) or a file-like object. 

So, to open the file in memory, just create a file-like object (perhaps using BytesIO).

file_like_object = io.BytesIO(my_zip_data)
zipfile_ob = zipfile.ZipFile(file_like_object)

Solution 2 - Python

PYTHON 3

import io
import zipfile

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "a", zipfile.ZIP_DEFLATED, False) as zip_file:
    for file_name, data in [('1.txt', io.BytesIO(b'111')), ('2.txt', io.BytesIO(b'222'))]:
        zip_file.writestr(file_name, data.getvalue())
with open('C:/1.zip', 'wb') as f:
    f.write(zip_buffer.getvalue())

Solution 3 - Python

From the article In-Memory Zip in Python:

> Below is a post of mine from May of 2008 on zipping in memory with Python, re-posted since Posterous is shutting down. > > I recently noticed that there is a for-pay component available to zip files in-memory with Python. Considering this is something that should be free, I threw together the following code. It has only gone through very basic testing, so if anyone finds any errors, let me know and I’ll update this.

import zipfile
import StringIO

class InMemoryZip(object):
    def __init__(self):
        # Create the in-memory file-like object
        self.in_memory_zip = StringIO.StringIO()

    def append(self, filename_in_zip, file_contents):
        '''Appends a file with name filename_in_zip and contents of 
        file_contents to the in-memory zip.'''
        # Get a handle to the in-memory zip in append mode
        zf = zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED, False)

        # Write the file to the in-memory zip
        zf.writestr(filename_in_zip, file_contents)

        # Mark the files as having been created on Windows so that
        # Unix permissions are not inferred as 0000
        for zfile in zf.filelist:
            zfile.create_system = 0        

        return self

    def read(self):
        '''Returns a string with the contents of the in-memory zip.'''
        self.in_memory_zip.seek(0)
        return self.in_memory_zip.read()

    def writetofile(self, filename):
        '''Writes the in-memory zip to a file.'''
        f = file(filename, "w")
        f.write(self.read())
        f.close()

if __name__ == "__main__":
    # Run a test
    imz = InMemoryZip()
    imz.append("test.txt", "Another test").append("test2.txt", "Still another")
    imz.writetofile("test.zip")

Solution 4 - Python

The example Ethier provided has several problems, some of them major:

  • doesn't work for real data on Windows. A ZIP file is binary and its data should always be written with a file opened 'wb'
  • the ZIP file is appended to for each file, this is inefficient. It can just be opened and kept as an InMemoryZip attribute
  • the documentation states that ZIP files should be closed explicitly, this is not done in the append function (it probably works (for the example) because zf goes out of scope and that closes the ZIP file)
  • the create_system flag is set for all the files in the zipfile every time a file is appended instead of just once per file.
  • on Python < 3 cStringIO is much more efficient than StringIO
  • doesn't work on Python 3 (the original article was from before the 3.0 release, but by the time the code was posted 3.1 had been out for a long time).

An updated version is available if you install ruamel.std.zipfile (of which I am the author). After

pip install ruamel.std.zipfile

or including the code for the class from here, you can do:

import ruamel.std.zipfile as zipfile

# Run a test
zipfile.InMemoryZipFile()
imz.append("test.txt", "Another test").append("test2.txt", "Still another")
imz.writetofile("test.zip")  

You can alternatively write the contents using imz.data to any place you need.

You can also use the with statement, and if you provide a filename, the contents of the ZIP will be written on leaving that context:

with zipfile.InMemoryZipFile('test.zip') as imz:
    imz.append("test.txt", "Another test").append("test2.txt", "Still another")

because of the delayed writing to disc, you can actually read from an old test.zip within that context.

Solution 5 - Python

I am using Flask to create an in-memory zipfile and return it as a download. Builds on the example above from Vladimir. The seek(0) took a while to figure out.

import io
import zipfile

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "a", zipfile.ZIP_DEFLATED, False) as zip_file:
    for file_name, data in [('1.txt', io.BytesIO(b'111')), ('2.txt', io.BytesIO(b'222'))]:
        zip_file.writestr(file_name, data.getvalue())

zip_buffer.seek(0)
return send_file(zip_buffer, attachment_filename='filename.zip', as_attachment=True)

Solution 6 - Python

> I want to modify zip archives without disk access, because I'll be downloading them, making changes, and uploading them again, so I have no reason to store them

This is possible using the two libraries https://github.com/uktrade/stream-unzip and https://github.com/uktrade/stream-zip (full disclosure: written by me). And depending on the changes, you might not even have to store the entire zip in memory at once.

Say you just want to download, unzip, zip, and re-upload. Slightly pointless, but you could slot in some changes to the unzipped content:

from datetime import datetime
import httpx
from stream_unzip import stream_unzip
from stream_zip import stream_zip, ZIP_64

def get_source_bytes_iter(url):
    with httpx.stream('GET', url) as r:
        yield from r.iter_bytes()

def get_target_files(files):
    # stream-unzip doesn't expose perms or modified_at, but stream-zip requires them
    modified_at = datetime.now()
    perms = 0o600

    for name, _, chunks in files:
        # Could change name, manipulate chunks, skip a file, or yield a new file
        yield name.decode(), modified_at, perms, ZIP_64, chunks

source_url = 'https://source.test/file.zip'
target_url = 'https://target.test/file.zip'

source_bytes_iter = get_source_bytes_iter(source_url)
source_files = stream_unzip(source_bytes_iter)
target_files = get_target_files(source_files)
target_bytes_iter = stream_zip(target_files)

httpx.put(target_url, data=target_bytes_iter)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJohn BView Question on Stackoverflow
Solution 1 - PythonJason R. CoombsView Answer on Stackoverflow
Solution 2 - PythonVladimirView Answer on Stackoverflow
Solution 3 - PythonJustin EthierView Answer on Stackoverflow
Solution 4 - PythonAnthonView Answer on Stackoverflow
Solution 5 - PythonMolossusView Answer on Stackoverflow
Solution 6 - PythonMichal CharemzaView Answer on Stackoverflow