Compressing directory using shutil.make_archive() while preserving directory structure

PythonDirectoryZipShutil

Python Problem Overview


I'm trying to zip a directory called test_dicoms to a zip file named test_dicoms.zip using the following code:

shutil.make_archive('/home/code/test_dicoms', 'zip', '/home/code/test_dicoms')

The problem is that when I unzip it, all of the files that were in /test_dicoms/ are extracted to /home/code/ instead of the folder /test_dicoms/ and all of it's contained files being extracted to /home/code/. So /test_dicoms/ has a file called foo.txt and after I zip and unzip foo.txt's path is /home/code/foo.txt as opposed to /home/code/test_dicoms/foo.txt. How do I fix this? Also, some of the directories I'm working with are very large. Will I need to add anything to my code to make it ZIP64 or is the function smart enough to do that automatically?

Here's what's currently in the archive created:

[gwarner@jazz gwarner]$ unzip -l test_dicoms.zip
Archive: test_dicoms.zip
Length    Date       Time  Name
--------- ---------- ----- ----
    93324 09-17-2015 16:05 AAscout_b_000070
    93332 09-17-2015 16:05 AAscout_b_000125
    93332 09-17-2015 16:05 AAscout_b_000248

Python Solutions


Solution 1 - Python

Using the terms in the documentation, you have specified a root_dir, but not a base_dir. Try specifying the base_dir like so:

shutil.make_archive('/home/code/test_dicoms',
                    'zip',
                    '/home/code/',
                    'test_dicoms')

To answer your second question, it depends upon the version of Python you are using. Starting from Python 3.4, ZIP64 extensions will be availble by default. Prior to Python 3.4, make_archive will not automatically create a file with ZIP64 extensions. If you are using an older version of Python and want ZIP64, you can invoke the underlying zipfile.ZipFile() directly.

If you choose to use zipfile.ZipFile() directly, bypassing shutil.make_archive(), here is an example:

import zipfile
import os

d = '/home/code/test_dicoms'

os.chdir(os.path.dirname(d))
with zipfile.ZipFile(d + '.zip',
                     "w",
                     zipfile.ZIP_DEFLATED,
                     allowZip64=True) as zf:
    for root, _, filenames in os.walk(os.path.basename(d)):
        for name in filenames:
            name = os.path.join(root, name)
            name = os.path.normpath(name)
            zf.write(name, name)

Reference:

Solution 2 - Python

I have written a wrapper function myself because shutil.make_archive is too confusing to use.

Here it is http://www.seanbehan.com/how-to-use-python-shutil-make_archive-to-zip-up-a-directory-recursively-including-the-root-folder/

And just the code..

import os, shutil
def make_archive(source, destination):
        base = os.path.basename(destination)
        name = base.split('.')[0]
        format = base.split('.')[1]
        archive_from = os.path.dirname(source)
        archive_to = os.path.basename(source.strip(os.sep))
        shutil.make_archive(name, format, archive_from, archive_to)
        shutil.move('%s.%s'%(name,format), destination)

make_archive('/path/to/folder', '/path/to/folder.zip')

Solution 3 - Python

There are basically 2 approaches to using shutil: you may try to understand the logic behind it or you may just use an example. I couldn't find an example here so I tried to create my own.

;TLDR. Run shutil.make_archive('dir1_arc', 'zip', root_dir='dir1') or shutil.make_archive('dir1_arc', 'zip', base_dir='dir1') or just shutil.make_archive('dir1_arc', 'zip', 'dir1') from temp.

Suppose you have ~/temp/dir1:

temp $ tree dir1
dir1
├── dir11
│   ├── file11
│   ├── file12
│   └── file13
├── dir1_arc.zip
├── file1
├── file2
└── file3

How can you create an archive of dir1? Set base_name='dir1_arc', format='zip'. Well you have a lot of options:

  • cd into dir1 and run shutil.make_archive(base_name=base_name, format=format); it will create an archive dir1_arc.zip inside dir1; the only problem you'll get a strange behavior: inside your archive you'll find file dir1_arc.zip;
  • from temp run shutil.make_archive(base_name=base_name, format=format, base_dir='dir1'); you'll get dir1_arc.zip inside temp that you can unzip into dir1; root_dir defaults to temp;
  • from ~ run shutil.make_archive(base_name=base_name, format=format, root_dir='temp', base_dir='dir1'); you'll again get your file but this time inside ~ directory;
  • create another directory temp2 in ~ and run inside it: shutil.make_archive(base_name=base_name, format=format, root_dir='../temp', base_dir='dir1'); you'll get your archive in this temp2 folder;

Can you run shutil without specifying arguments? You can. Run from temp shutil.make_archive('dir1_arc', 'zip', 'dir1'). This is the same as run shutil.make_archive('dir1_arc', 'zip', root_dir='dir1'). What can we say about base_dir in this case? From documentation not so much. From source code we may see that:

if root_dir is not None:
  os.chdir(root_dir)

if base_dir is None:
        base_dir = os.curdir 

So in our case base_dir is dir1. And we can keep asking questions.

Solution 4 - Python

I was having issues with path split on some paths with '.' periods in them and i found having an optional format which defaults to 'zip' is handy and still allows you to override for other formats and is less error prone.

import os
import shutil
from shutil import make_archive

def make_archive(source, destination, format='zip'):
    import os
    import shutil
    from shutil import make_archive
    base, name = os.path.split(destination)
    archive_from = os.path.dirname(source)
    archive_to = os.path.basename(source.strip(os.sep))
    print(f'Source: {source}\nDestination: {destination}\nArchive From: {archive_from}\nArchive To: {archive_to}\n')
    shutil.make_archive(name, format, archive_from, archive_to)
    shutil.move('%s.%s' % (name, format), destination)

make_archive('/path/to/folder', '/path/to/folder.zip')

Special thanks to seanbehan's original answer or i would have been lost in the sauce much longer.

Solution 5 - Python

I think, I am able to improve seanbehan's answer by removing the file moving:

def make_archive(source, destination):
    base_name = '.'.join(destination.split('.')[:-1])
    format = destination.split('.')[-1]
    root_dir = os.path.dirname(source)
    base_dir = os.path.basename(source.strip(os.sep))
    shutil.make_archive(base_name, format, root_dir, base_dir)

Solution 6 - Python

This is a variation on @nick's answer that uses pathlib, type hinting and avoids shadowing builtins:

from pathlib import Path
import shutil

def make_archive(source: Path, destination: Path) -> None:
    base_name = destination.parent / destination.stem
    fmt = destination.suffix.replace(".", "")
    root_dir = source.parent
    base_dir = source.name
    shutil.make_archive(str(base_name), fmt, root_dir, base_dir)

Usage:

make_archive(Path("/path/to/dir/"), Path("/path/to/output.zip"))

Solution 7 - Python

This solution builds off the responses from irudyak and seanbehan and uses Pathlib. You need to pass source and destination as Path objects.

from pathlib import Path
import shutil

def make_archive(source, destination):
    base_name = destination.parent / destination.stem
    format = (destination.suffix).replace(".", "")
    root_dir = source.parent
    base_dir = source.name
    shutil.make_archive(base_name, format, root_dir, base_dir)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionG WarnerView Question on Stackoverflow
Solution 1 - PythonRobᵩView Answer on Stackoverflow
Solution 2 - PythonseanbehanView Answer on Stackoverflow
Solution 3 - PythonirudyakView Answer on Stackoverflow
Solution 4 - PythonMik RView Answer on Stackoverflow
Solution 5 - PythonMake42View Answer on Stackoverflow
Solution 6 - PythonphoenixView Answer on Stackoverflow
Solution 7 - PythonNickView Answer on Stackoverflow