How can I decompress a gzip stream with zlib?
GzipZlibInflateGzip Problem Overview
Gzip format files (created with the gzip
program, for example) use the "deflate" compression algorithm, which is the same compression algorithm as what zlib uses. However, when using zlib to inflate a gzip compressed file, the library returns a Z_DATA_ERROR
.
How can I use zlib to decompress a gzip file?
Gzip Solutions
Solution 1 - Gzip
To decompress a gzip format file with zlib, call inflateInit2
with the windowBits
parameter as 16+MAX_WBITS
, like this:
inflateInit2(&stream, 16+MAX_WBITS);
If you don't do this, zlib will complain about a bad stream format. By default, zlib creates streams with a zlib header, and on inflate does not recognise the different gzip header unless you tell it so. Although this is documented starting in version 1.2.1 of the zlib.h
header file, it is not in the zlib manual. From the header file:
> windowBits
can also be greater than 15 for optional gzip decoding. Add
32 to windowBits
to enable zlib and gzip decoding with automatic header
detection, or add 16 to decode only the gzip format (the zlib format will
return a Z_DATA_ERROR
). If a gzip stream is being decoded, strm->adler
is
a crc32 instead of an adler32.
Solution 2 - Gzip
python
- RFC 1950 (
zlib
compressed format) - RFC 1951 (
deflate
compressed format) - RFC 1952 (
gzip
compressed format)
The python zlib
module will support these as well.
choosing windowBits
But zlib
can decompress all those formats:
- to (de-)compress
deflate
format, usewbits = -zlib.MAX_WBITS
- to (de-)compress
zlib
format, usewbits = zlib.MAX_WBITS
- to (de-)compress
gzip
format, usewbits = zlib.MAX_WBITS | 16
See documentation in http://www.zlib.net/manual.html#Advanced (section inflateInit2
)
examples
test data:
>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>>
>>> text = '''test'''
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>
obvious test for zlib
:
>>> zlib.decompress(zlib_data)
'test'
test for deflate
:
>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
'test'
test for gzip
:
>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
'test'
the data is also compatible with gzip
module:
>>> import gzip
>>> import StringIO
>>> fio = StringIO.StringIO(gzip_data)
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
'test'
>>> f.close()
automatic header detection (zlib or gzip)
adding 32
to windowBits
will trigger header detection
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
'test'
gzip
instead
using For gzip
data with gzip header you can use gzip
module directly; but please remember that under the hood, gzip
uses zlib
.
fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()