Extract files from zip without keeping the structure using python ZipFile?

Python, Extract, Unzip, Python Zipfile

Python Problem Overview


I am trying to extract all files from a .zip that contains subfolders into a single folder. I want every file from the subfolders extracted into that one folder, without keeping the original structure. At the moment, I extract everything, move the files to one folder, then remove the leftover subfolders. Files with the same name are overwritten.

Is it possible to do it before writing files?

Here is a structure for example:

my_zip/file1.txt
my_zip/dir1/file2.txt
my_zip/dir1/dir2/file3.txt
my_zip/dir3/file4.txt
 

At the end I wish for this:

my_dir/file1.txt
my_dir/file2.txt
my_dir/file3.txt
my_dir/file4.txt

What can I add to this code?

import zipfile
my_dir = "D:\\Download\\"
my_zip = "D:\\Download\\my_file.zip"

zip_file = zipfile.ZipFile(my_zip, 'r')
for files in zip_file.namelist():
    zip_file.extract(files, my_dir)
zip_file.close()

If I rename the file paths returned by zip_file.namelist(), I get this error:

KeyError: "There is no item named 'file2.txt' in the archive"

Python Solutions


Solution 1 - Python

This opens a file handle for each member of the zip archive, takes only the file name from the member's path, and copies the data to a target file of that name. It mirrors what ZipFile.extract does internally, except that it does not recreate the subdirectories.

import os
import shutil
import zipfile

my_dir = r"D:\Download"
my_zip = r"D:\Download\my_file.zip"

with zipfile.ZipFile(my_zip) as zip_file:
    for member in zip_file.namelist():
        filename = os.path.basename(member)
        # skip directories
        if not filename:
            continue
    
        # copy file (taken from zipfile's extract)
        source = zip_file.open(member)
        target = open(os.path.join(my_dir, filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)
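
The question notes that files with the same name overwrite each other once the tree is flattened. A minimal sketch of one way to avoid that, building on the loop above. The helper name `extract_flat` and the `_1`, `_2` suffix scheme are assumptions made for illustration, not part of the original answer:

```python
import os
import shutil
import zipfile


def extract_flat(zip_path, target_dir):
    """Extract every file in zip_path directly into target_dir.

    Hypothetical helper: colliding names get a numeric suffix
    (file.txt, file_1.txt, ...) instead of being overwritten.
    """
    os.makedirs(target_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zip_file:
        for member in zip_file.infolist():
            filename = os.path.basename(member.filename)
            # skip directory entries (their basename is empty)
            if not filename:
                continue
            stem, ext = os.path.splitext(filename)
            candidate = os.path.join(target_dir, filename)
            counter = 1
            # pick the first name that is not already taken on disk
            while os.path.exists(candidate):
                candidate = os.path.join(target_dir, f"{stem}_{counter}{ext}")
                counter += 1
            with zip_file.open(member) as source, open(candidate, "wb") as target:
                shutil.copyfileobj(source, target)
            written.append(candidate)
    return written
```

Note that the existence check also counts files that were already in the target folder before extraction, so running it twice on the same folder produces suffixed copies rather than overwriting.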

Solution 2 - Python

It is possible to iterate over the ZipFile.infolist(). On the returned ZipInfo objects you can then manipulate the filename to remove the directory part and finally extract it to a specified directory.

import os
import zipfile

my_dir = "D:\\Download\\"
my_zip = "D:\\Download\\my_file.zip"

with zipfile.ZipFile(my_zip) as zip_file:
    for zip_info in zip_file.infolist():
        # skip directory entries, whose names end with "/"
        if zip_info.filename.endswith('/'):
            continue
        # drop the directory part so the file lands directly in my_dir
        zip_info.filename = os.path.basename(zip_info.filename)
        zip_file.extract(zip_info, my_dir)
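
The KeyError in the question arises because `extract()`, when given a string, looks the member up by its full archive name, so a shortened name is not found. Passing the `ZipInfo` object itself skips that lookup, which is why changing its `filename` attribute works. A small self-contained sketch using an in-memory archive (the member name and contents are made up for illustration):

```python
import io
import os
import tempfile
import zipfile

# Build a tiny archive in memory; name and content are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("my_zip/dir1/file2.txt", "hello")

with tempfile.TemporaryDirectory() as out_dir, zipfile.ZipFile(buffer) as archive:
    # A bare string is looked up by its full archive name, so this fails.
    try:
        archive.extract("file2.txt", out_dir)
        lookup_failed = False
    except KeyError:
        lookup_failed = True

    # A ZipInfo object is used as-is; its filename decides the target path.
    info = archive.getinfo("my_zip/dir1/file2.txt")
    info.filename = os.path.basename(info.filename)
    extracted_path = archive.extract(info, out_dir)
    extracted_name = os.path.basename(extracted_path)
    with open(extracted_path) as handle:
        extracted_text = handle.read()
```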

Solution 3 - Python

Just extract to bytes in memory, compute the filename, and write the file yourself instead of letting the library do it. In short, use the read() method instead of extract():

Python 3.6+ update (2020): the same code as the original answer, but using pathlib.Path, which eases file-path manipulation and other operations (like write_bytes).

from pathlib import Path
import zipfile

my_dir = Path("D:\\Download\\")
my_zip = my_dir / "my_file.zip"

with zipfile.ZipFile(my_zip, 'r') as zip_file:
    for name in zip_file.namelist():
        # skip directory entries, whose names end with "/"
        if name.endswith("/"):
            continue
        # read() takes only the member name (plus an optional password)
        data = zip_file.read(name)
        myfile_path = my_dir / Path(name).name
        myfile_path.write_bytes(data)

Original code in answer without pathlib:

import zipfile
import os

my_dir = "D:\\Download\\"
my_zip = "D:\\Download\\my_file.zip"

zip_file = zipfile.ZipFile(my_zip, 'r')
for name in zip_file.namelist():
    # skip directory entries, whose names end with "/"
    if name.endswith("/"):
        continue
    # read() takes only the member name (plus an optional password)
    data = zip_file.read(name)
    # I am almost sure zip represents the directory separator
    # char as "/" regardless of OS, but I don't have DOS or Windows here to test it
    myfile_path = os.path.join(my_dir, name.split("/")[-1])
    with open(myfile_path, "wb") as myfile:
        myfile.write(data)
zip_file.close()
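
On the separator question raised in the code comment: the ZIP format specifies forward slashes in member names, and `zipfile` converts the OS separator to "/" when it builds a `ZipInfo`. So `posixpath.basename` (or `PurePosixPath(...).name`) strips the directory part the same way on any OS. A minimal sketch, with a hypothetical helper name `flat_name` and made-up member names:

```python
import posixpath
from pathlib import PurePosixPath


def flat_name(member_name):
    """Return the file part of a zip member name, or "" for a directory entry."""
    # Zip member names use "/" as the separator, so posixpath applies
    # even when the script runs on Windows.
    return posixpath.basename(member_name)


names = ["my_zip/file1.txt", "my_zip/dir1/dir2/file3.txt", "my_zip/dir3/"]
flat = [flat_name(n) for n in names]
# PurePosixPath gives the same answer for file entries.
same = PurePosixPath("my_zip/dir1/dir2/file3.txt").name == flat[1]
```

An empty result marks a directory entry, which the extraction loop can skip.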

Solution 4 - Python

A similar concept to the solution of Gerhard Götz, but adapted for extracting single files instead of the entire zip:

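import os
from zipfile import ZipFile

# zipPath, path_in_zip and destination are supplied by the caller
with ZipFile(zipPath, 'r') as zipObj:
    zipInfo = zipObj.getinfo(path_in_zip)
    zipInfo.filename = os.path.basename(destination)
    zipObj.extract(zipInfo, os.path.dirname(os.path.realpath(destination)))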

Solution 5 - Python

If you are getting a BadZipFile error, you can unzip the archive by running 7-Zip as a subprocess. Assuming 7-Zip is installed, use the following code.

import subprocess
# destFolder is defined by the caller
my_dir = destFolder  # destination folder
my_zip = destFolder + "/" + "filename.zip"  # file you want to extract
ziploc = "C:/Program Files/7-Zip/7z.exe"  # location where 7-Zip is installed
# 'e' extracts without directory paths; '*.txt' with '-r' takes only txt files from all subdirectories
cmd = [ziploc, 'e', my_zip, '-o' + my_dir, '*.txt', '-r']
sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
output, _ = sp.communicate()  # wait for 7-Zip to finish and collect its output
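
Before reaching for an external tool, it can help to confirm that the standard library really cannot read the archive. A minimal sketch, assuming a hypothetical helper name `can_use_zipfile`; it only catches `zipfile.BadZipFile`, so other problems (a missing file, an encrypted member) still raise:

```python
import io
import zipfile


def can_use_zipfile(source):
    """Return True if zipfile can open and verify source (path or file-like)."""
    try:
        with zipfile.ZipFile(source) as archive:
            # testzip() reads every member and returns the first bad name, or None
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


# Illustrative inputs: one valid in-memory archive and one blob that is not a zip.
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as archive:
    archive.writestr("dir1/file2.txt", "hello")
bad = io.BytesIO(b"this is not a zip archive")

good_ok = can_use_zipfile(good)
bad_ok = can_use_zipfile(bad)
```

If the check returns False, the 7-Zip command above is one fallback to try.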

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Thammas | View Question on Stackoverflow
Solution 1 - Python | Reiner Gerecke | View Answer on Stackoverflow
Solution 2 - Python | Gerhard Götz | View Answer on Stackoverflow
Solution 3 - Python | jsbueno | View Answer on Stackoverflow
Solution 4 - Python | L0laapk3 | View Answer on Stackoverflow
Solution 5 - Python | vsnahar | View Answer on Stackoverflow