unzipping file results in "BadZipFile: File is not a zip file"

PythonZip

Python Problem Overview


I have two zip files, both of them open well with Windows Explorer and 7-zip.

However when i open them with Python's zipfile module [ zipfile.ZipFile("filex.zip") ], one of them gets opened but the other one gives error "BadZipfile: File is not a zip file".

I've made sure that the latter one is a valid Zip File by opening it with 7-Zip and looking at its properties (says 7Zip.ZIP). When I open the file with a text editor, the first two characters are "PK", showing that it is indeed a zip file.

I'm using Python 2.5 and really don't have any clue how to go about for this. I've tried it both with Windows as well as Ubuntu and problem exists on both platforms.

Update: Traceback from Python 2.5.4 on Windows:

Traceback (most recent call last):
File "<module1>", line 5, in <module>
    zipfile.ZipFile("c:/temp/test.zip")
File "C:\Python25\lib\zipfile.py", line 346, in init
    self._GetContents()
File "C:\Python25\lib\zipfile.py", line 366, in _GetContents
    self._RealGetContents()
File "C:\Python25\lib\zipfile.py", line 378, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file

Basically when the _EndRecData function is called for getting data from End of Central Directory" record, the comment length checkout fails [ endrec[7] == len(comment) ].

The values of locals in the _EndRecData function are as following:

 END_BLOCK: 4096,
 comment: '\x00',
 data: '\xd6\xf6\x03\x00\x88,N8?<e\xf0q\xa8\x1cwK\x87\x0c(\x82a\xee\xc61N\'1qN\x0b\x16K-\x9d\xd57w\x0f\xa31n\xf3dN\x9e\xb1s\xffu\xd1\.....', (truncated)
 endrec: ['PK\x05\x06', 0, 0, 4, 4, 268, 199515, 0],
 filesize: 199806L,
 fpin: <open file 'c:/temp/test.zip', mode 'rb' at 0x045D4F98>,
 start: 4073

Python Solutions


Solution 1 - Python

files named file can confuse python - try naming it something else. if it STILL wont work, try this code:

def fixBadZipfile(zipFile):  
 f = open(zipFile, 'r+b')  
 data = f.read()  
 pos = data.find('\x50\x4b\x05\x06') # End of central directory signature  
 if (pos > 0):  
     self._log("Trancating file at location " + str(pos + 22)+ ".")  
     f.seek(pos + 22)   # size of 'ZIP end of central directory record' 
     f.truncate()  
     f.close()  
 else:  
     # raise error, file is truncated  

Solution 2 - Python

I run into the same issue. My problem was that it was a gzip instead of a zip file. I switched to the class gzip.GzipFile and it worked like a charm.

Solution 3 - Python

astronautlevel's solution works for most cases, but the compressed data and CRCs in the Zip can also contain the same 4 bytes. You should do an rfind (not find), seek to pos+20 and then add write \x00\x00 to the end of the file (tell zip applications that the length of the 'comments' section is 0 bytes long).


# HACK: See http://bugs.python.org/issue10694
# The zip file generated is correct, but because of extra data after the 'central directory' section,
# Some version of python (and some zip applications) can't read the file. By removing the extra data,
# we ensure that all applications can read the zip without issue.
# The ZIP format: http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT
# Finding the end of the central directory:
#   http://stackoverflow.com/questions/8593904/how-to-find-the-position-of-central-directory-in-a-zip-file
#   http://stackoverflow.com/questions/20276105/why-cant-python-execute-a-zip-archive-passed-via-stdin
#       This second link is only losely related, but echos the first, "processing a ZIP archive often requires backwards seeking"
content = zipFileContainer.read()
pos = content.rfind('\x50\x4b\x05\x06') # reverse find: this string of bytes is the end of the zip's central directory.
if pos>0:
zipFileContainer.seek(pos+20) # +20: see secion V.I in 'ZIP format' link above.
zipFileContainer.truncate()
zipFileContainer.write('\x00\x00') # Zip file comment length: 0 byte length; tell zip applications to stop reading.
zipFileContainer.seek(0)



return zipFileContainer</code></pre>


Solution 4 - Python

I had the same problem and was able to solve this issue for my files, see my answer at https://stackoverflow.com/questions/4923142/zipfile-cant-handle-some-type-of-zip-data

Solution 5 - Python

Show the full traceback that you got from Python -- this may give a hint as to what the specific problem is. Unanswered: What software produced the bad file, and on what platform?

Update: Traceback indicates having problem detecting the "End of Central Directory" record in the file -- see function _EndRecData starting at line 128 of C:\Python25\Lib\zipfile.py

Suggestions:
(1) Trace through the above function
(2) Try it on the latest Python
(3) Answer the question above.
(4) Read this and anything else found by google("BadZipfile: File is not a zip file") that appears to be relevant

Solution 6 - Python

Sometime there are zip file which contain corrupted files and upon unzipping the zip gives badzipfile error. but there are tools like 7zip winrar which ignores these errors and successfully unzip the zip file. you can create a sub process and use this code to unzip your zip file without getting BadZipFile Error.

import subprocess
ziploc = "C:/Program Files/7-Zip/7z.exe" #location where 7zip is installed
cmd = [ziploc, 'e',your_Zip_file.zip ,'-o'+ OutputDirectory ,'-r' ] 
sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)

Solution 7 - Python

Have you tried a newer python, or if that is too much trouble, simply a newer zipfile.py? I have successfully used a copy of zipfile.py from Python 2.6.2 (latest at the time) with Python 2.5 in order to open some zip files that weren't supported by Py2.5s zipfile module.

Solution 8 - Python

I faced this problem and was looking for a good and clean solution; But there was no solution until I found this answer. I had the same problem that @marsl (among the answers) had. It was a gzipfile instead of a zipfile in my case.

I could unarchive and decompress my gzipfile with this approach:

with tarfile.open(archive_path, "r:gz") as gzip_file:
    gzip_file.extractall()

Solution 9 - Python

In some cases, you have to confirm if the zip file is actually in gzip format. this was the case for me and i solved it by :

import requests
import tarfile
url = ".tar.gz link"
response = requests.get(url, stream=True)
file = tarfile.open(fileobj=response.raw, mode="r|gz")
file.extractall(path=".")

Solution 10 - Python

In my case, the zip file was corrupted. I was trying to download the zip file with urllib.request.urlretrieve but the file wouldn't completely download for some reason.

I connected to a VPN, the file downloaded just fine, and I was able to open the file.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionsharjeelView Question on Stackoverflow
Solution 1 - PythonastronautlevelView Answer on Stackoverflow
Solution 2 - PythonmarslView Answer on Stackoverflow
Solution 3 - PythonUltramaticOrangeView Answer on Stackoverflow
Solution 4 - PythonUri CohenView Answer on Stackoverflow
Solution 5 - PythonJohn MachinView Answer on Stackoverflow
Solution 6 - PythonvsnaharView Answer on Stackoverflow
Solution 7 - PythonBaffe BoyoisView Answer on Stackoverflow
Solution 8 - PythonMohammad JafariView Answer on Stackoverflow
Solution 9 - PythonGlen AgakView Answer on Stackoverflow
Solution 10 - PythonPaulView Answer on Stackoverflow