Check the total content size of a tar gz file

Tags: Gzip, Tar

Gzip Problem Overview


How can I find the total size of the uncompressed file data in a .tar.gz file from the command line?

Gzip Solutions


Solution 1 - Gzip

This works for any file size:

zcat archive.tar.gz | wc -c

For archives whose uncompressed size is under 4 GB, you could also use gzip's -l option:

$ gzip -l compressed.tar.gz
     compressed        uncompressed  ratio uncompressed_name
            132               10240  99.1% compressed.tar
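Traditionally, gzip -l does not decompress anything. It reads the ISIZE field that RFC 1952 stores in the last four bytes of a gzip file, which holds the uncompressed length modulo 2^32 in little-endian byte order. That is why its answer is only trustworthy below 4 GiB. A minimal sketch of the same lookup in POSIX shell, assuming a single-member gzip file (for concatenated members only the last member's length is stored there); gzip_isize is a hypothetical helper name:

```shell
#!/bin/sh
# gzip_isize: hypothetical helper that prints the ISIZE trailer field of a
# gzip file (RFC 1952): the last 4 bytes, little-endian, equal to the
# uncompressed length modulo 2^32. Assumes a single-member gzip file.
gzip_isize() {
    # od prints the four trailing bytes as unsigned decimals, e.g. "0 40 0 0"
    tail -c 4 "$1" | od -An -tu1 |
        awk '{ printf "%.0f\n", $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 }'
}
```

Usage: gzip_isize archive.tar.gz. For content of 4 GiB or more this value (like the one gzip -l traditionally shows) has wrapped around, while zcat archive.tar.gz | wc -c still reports the real size.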

Solution 2 - Gzip

This sums the sizes of the files stored in the archive:

$ tar tzvf archive.tar.gz | sed 's/ \+/ /g' | cut -f3 -d' ' | sed '2,$s/^/+ /' | paste -sd' ' | bc

The output is given in bytes.

Explanation: tar tzvf lists the files in the archive in a verbose format like ls -l. The first sed squeezes runs of spaces into one, so that cut can pick out the third field, the file size. The second sed puts a + in front of every size except the first, and paste joins the lines into a single sum expression, which bc then evaluates.
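The same sum can be computed in a single awk pass, since awk splits on runs of whitespace by default and can do the addition itself, so the sed, cut, paste and bc stages are not needed. A minimal sketch, assuming GNU tar, whose verbose listing puts the size in the third field (bsdtar's ls -l style listing puts it elsewhere); tar_content_size is a hypothetical name:

```shell
#!/bin/sh
# tar_content_size: hypothetical helper that sums the sizes shown by
# `tar -tzvf`. Assumes GNU tar's listing format:
#   -rw-r--r-- user/group SIZE DATE TIME NAME
# Directories and links are listed with size 0, so they add nothing.
tar_content_size() {
    tar -tzvf "$1" |
        # Field 3 is the size; %.0f avoids scientific notation for large sums
        awk '{ sum += $3 } END { printf "%.0f\n", sum }'
}
```

Usage: tar_content_size archive.tar.gz prints the total in bytes, like the pipeline above.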

Note that this counts only the file contents, not metadata, so the disk space taken up by the files when you extract them is going to be larger, potentially many times larger if you have a lot of very small files (on most filesystems each non-empty file occupies at least one whole block).
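To get a rough feel for that effect, each regular file's size can be rounded up to whole filesystem blocks before summing. A sketch, assuming a 4096-byte block size (the real value depends on the target filesystem) and GNU tar's listing format; it still ignores directories and inode overhead, so treat the result as an estimate. tar_disk_estimate is a hypothetical name:

```shell
#!/bin/sh
# tar_disk_estimate: hypothetical helper estimating the space the extracted
# regular files would occupy if each one is rounded up to whole blocks.
# The optional second argument is an assumed block size (default 4096).
tar_disk_estimate() {
    tar -tzvf "$1" |
        awk -v bs="${2:-4096}" '
            # Regular files start with "-"; round each size up to whole blocks
            /^-/ { sum += int(($3 + bs - 1) / bs) * bs }
            END { printf "%.0f\n", sum }'
}
```

Usage: tar_disk_estimate archive.tar.gz 4096. Two files of 100 and 250 bytes sum to 350 bytes of content but occupy two 4096-byte blocks, i.e. 8192 bytes, under this assumption.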

Solution 3 - Gzip

The command gzip -l archive.tar.gz doesn't work correctly with files larger than 2 GB. For really large files, I would recommend zcat archive.tar.gz | wc --bytes instead.

Solution 4 - Gzip

I know this is an old question, but I wrote a tool for exactly this two years ago. It's called gzsize, and it gives you the uncompressed size of a gzipped file without writing the decompressed data to disk:

$ gzsize <your file>

Solution 5 - Gzip

Use the following command:

tar -xzf archive.tar.gz --to-stdout | wc -c
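This counts something slightly different from zcat archive.tar.gz | wc -c: --to-stdout writes only the files' contents, whereas zcat counts the whole tar stream, including the 512-byte header of each entry, the padding of each file to a multiple of 512 bytes and the end-of-archive blocks. A sketch that prints both numbers side by side, assuming GNU tar; compare_counts is a hypothetical name:

```shell
#!/bin/sh
# compare_counts: hypothetical helper printing two byte counts for a .tar.gz:
#   1. the uncompressed tar stream (headers and padding included)
#   2. the extracted file contents only
compare_counts() {
    # tr strips the leading spaces some wc implementations add
    stream=$(gzip -dc "$1" | wc -c | tr -d ' ')
    contents=$(tar -xzf "$1" --to-stdout | wc -c | tr -d ' ')
    printf 'tar stream:    %s\nfile contents: %s\n' "$stream" "$contents"
}
```

The first number is always a multiple of 512 and at least as large as the second.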

Solution 6 - Gzip

I searched all over the web and found nothing that solves the problem of getting the size when the file is bigger than 4 GB.

First, which is fastest?

[oracle@base tmp]$ time zcat oracle.20180303.030001.dmp.tar.gz | wc -c
6667028480

real    0m45.761s
user    0m43.203s
sys     0m5.185s

[oracle@base tmp]$ time gzip -dc oracle.20180303.030001.dmp.tar.gz | wc -c
6667028480

real    0m45.335s
user    0m42.781s
sys     0m5.153s

[oracle@base tmp]$ time tar -tvf oracle.20180303.030001.dmp.tar.gz
-rw-r--r-- oracle/oinstall 111828 2018-03-03 03:05 oracle.20180303.030001.log
-rw-r----- oracle/oinstall 6666911744 2018-03-03 03:05 oracle.20180303.030001.dmp

real    0m46.669s
user    0m44.347s
sys     0m4.981s

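All three take about 45 seconds, because each has to decompress the whole archive. Still, tar -tvf is definitely the one to use: it prints each file's header, including its size, as soon as it reaches it, and spends the rest of the time reading through the file data. But how can the execution be cancelled once the headers have been printed?

My solution is this:
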
[oracle@base tmp]$  time echo $(timeout --signal=SIGINT 1s tar -tvf oracle.20180303.030001.dmp.tar.gz | awk '{print $3}') | grep -o '[[:digit:]]*' | awk '{ sum += $1 } END { print sum }'
6667023572

real    0m1.005s
user    0m0.013s
sys     0m0.066s
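The same idea can be written with awk picking the size field and adding it up directly, instead of echo, grep and a second awk, and it can also report whether the time limit was hit. A sketch, assuming GNU tar (size in field 3) and GNU coreutils timeout, which exits with status 124 when it had to stop the command; partial_list_size is a hypothetical name. As with the command above, it only works if every header you care about is reached within the time budget: here the .dmp header sits right after a log file of about 110 KB, so one second is plenty, but an entry stored behind gigabytes of other data would simply be missing from the sum.

```shell
#!/bin/sh
# partial_list_size: hypothetical helper that lists a .tar.gz for at most
# $2 (default 1s) and sums the sizes of the entries listed so far.
# Assumes GNU tar (size in field 3) and GNU coreutils timeout.
partial_list_size() {
    listing=$(timeout --signal=INT "${2:-1s}" tar -tzvf "$1" 2>/dev/null)
    status=$?
    printf '%s\n' "$listing" | awk 'NF { sum += $3 } END { printf "%.0f\n", sum }'
    if [ "$status" -eq 124 ]; then
        # tar was stopped while still reading data, so any entry whose
        # header came later is not included in the sum above
        echo "note: listing stopped after ${2:-1s}; later entries not counted" >&2
    fi
}
```

Usage: partial_list_size oracle.20180303.030001.dmp.tar.gz 1s. On a huge archive the note on stderr is expected; what matters is whether all the headers were printed before the cut-off.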


Solution 7 - Gzip

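A tar file is uncompressed until/unless it is filtered through another program, such as gzip, bzip2, lzip, compress or lzma. The size of the tar file is close to the total size of the extracted files, but the overhead is not a fixed 1 KB: each entry gets a 512-byte header, each file's data is padded to a multiple of 512 bytes, and the archive ends with two 512-byte zero blocks (GNU tar also pads the whole archive to a multiple of 10240 bytes by default).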
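The exact overhead can be worked out from the figures in Solution 6, assuming the archive holds only the two listed entries and uses GNU tar's default 10240-byte record size. Starting from the file sizes in the tar -tvf listing (which sum to the 6667023572 bytes reported there), adding headers, block padding, end-of-archive blocks and record padding reproduces exactly the 6667028480 bytes that zcat | wc -c counted:

```latex
\begin{aligned}
\text{headers} &= 2 \times 512 = 1024\\
\text{log data} &= 512 \left\lceil 111828 / 512 \right\rceil = 512 \times 219 = 112128\\
\text{dmp data} &= 512 \left\lceil 6666911744 / 512 \right\rceil = 512 \times 13021312 = 6666911744\\
\text{end of archive} &= 2 \times 512 = 1024\\
\text{subtotal} &= 1024 + 112128 + 6666911744 + 1024 = 6667025920\\
\text{tar stream} &= 10240 \left\lceil 6667025920 / 10240 \right\rceil = 10240 \times 651077 = 6667028480
\end{aligned}
```

So for this archive the overhead is 6667028480 - 6667023572 = 4908 bytes, and it grows with the number of entries.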

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Ztyx | View Question on Stackoverflow
Solution 1 - Gzip | Matthew Mott | View Answer on Stackoverflow
Solution 2 - Gzip | Ztyx | View Answer on Stackoverflow
Solution 3 - Gzip | swdev | View Answer on Stackoverflow
Solution 4 - Gzip | bfontaine | View Answer on Stackoverflow
Solution 5 - Gzip | elec3647 | View Answer on Stackoverflow
Solution 6 - Gzip | RaZieRSarE | View Answer on Stackoverflow
Solution 7 - Gzip | Tom S | View Answer on Stackoverflow