How to check if a Unix .tar.gz file is a valid file without uncompressing?

GzipValidationTarGunzip

Gzip Problem Overview


I have found the question https://stackoverflow.com/questions/1788236/how-to-determine-if-data-is-valid-tar-file, but I was wondering: is there a ready made command line solution?

Gzip Solutions


Solution 1 - Gzip

What about just getting a listing of the tarball and throw away the output, rather than decompressing the file?

tar -tzf my_tar.tar.gz >/dev/null

Edited as per comment. Thanks zrajm!

Edit as per comment. Thanks Frozen Flame! This test in no way implies integrity of the data. Because it was designed as a tape archival utility most implementations of tar will allow multiple copies of the same file!

Solution 2 - Gzip

you could probably use the gzip -t option to test the files integrity

http://linux.about.com/od/commands/l/blcmdl1_gzip.htm

from: http://unix.ittoolbox.com/groups/technical-functional/shellscript-l/how-to-test-file-integrity-of-targz-1138880

To test the gzip file is not corrupt:

gunzip -t file.tar.gz

To test the tar file inside is not corrupt:

gunzip -c file.tar.gz | tar -t > /dev/null

As part of the backup you could probably just run the latter command and check the value of $? afterwards for a 0 (success) value. If either the tar or the gzip has an issue, $? will have a non zero value.

Solution 3 - Gzip

If you want to do a real test extract of a tar file without extracting to disk, use the -O option. This spews the extract to standard output instead of the filesystem. If the tar file is corrupt, the process will abort with an error.

Example of failed tar ball test...

$ echo "this will not pass the test" > hello.tgz
$ tar -xvzf hello.tgz -O > /dev/null
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
$ rm hello.*

Working Example...

$ ls hello*
ls: hello*: No such file or directory
$ echo "hello1" > hello1.txt
$ echo "hello2" > hello2.txt
$ tar -cvzf hello.tgz hello[12].txt
hello1.txt
hello2.txt
$ rm hello[12].txt
$ ls hello*
hello.tgz
$ tar -xvzf hello.tgz -O
hello1.txt
hello1
hello2.txt
hello2
$ ls hello*
hello.tgz
$ tar -xvzf hello.tgz
hello1.txt
hello2.txt
$ ls hello*
hello1.txt  hello2.txt	hello.tgz
$ rm hello*

Solution 4 - Gzip

You can also check contents of *.tag.gz file using pigz (parallel gzip) to speedup the archive check:

pigz -cvdp number_of_threads /[...]path[...]/archive_name.tar.gz | tar -tv > /dev/null

Solution 5 - Gzip

A nice option is to use tar -tvvf <filePath> which adds a line that reports the kind of file.

Example in a valid .tar file:

> tar -tvvf filename.tar 
drwxr-xr-x  0 diegoreymendez staff       0 Jul 31 12:46 ./testfolder2/
-rw-r--r--  0 diegoreymendez staff      82 Jul 31 12:46 ./testfolder2/._.DS_Store
-rw-r--r--  0 diegoreymendez staff    6148 Jul 31 12:46 ./testfolder2/.DS_Store
drwxr-xr-x  0 diegoreymendez staff       0 Jul 31 12:42 ./testfolder2/testfolder/
-rw-r--r--  0 diegoreymendez staff      82 Jul 31 12:42 ./testfolder2/testfolder/._.DS_Store
-rw-r--r--  0 diegoreymendez staff    6148 Jul 31 12:42 ./testfolder2/testfolder/.DS_Store
-rw-r--r--  0 diegoreymendez staff  325377 Jul  5 09:50 ./testfolder2/testfolder/Scala.pages
Archive Format: POSIX ustar format,  Compression: none

Corrupted .tar file:

> tar -tvvf corrupted.tar 
tar: Unrecognized archive format
Archive Format: (null),  Compression: none
tar: Error exit delayed from previous errors.

Solution 6 - Gzip

I have tried the following command and they work well.

bzip2 -t file.bz2
gunzip -t file.gz

However, we can found these two command are time-consuming. Maybe we need some more quick way to determine the intact of the compress files.

Solution 7 - Gzip

These are all very sub-optimal solutions. From the GZIP spec >ID2 (IDentification 2)
These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213), to identify the file as being in gzip format.

Has to be coded into whatever language you're using.

Solution 8 - Gzip

> > use the -O option. [...] If the tar file is corrupt, the process will abort with an error.

Sometimes yes, but sometimes not. Let's see an example of a corrupted file:

echo Pete > my_name
tar -cf my_data.tar my_name 

# // Simulate a corruption
sed < my_data.tar 's/Pete/Fool/' > my_data_now.tar
# // "my_data_now.tar" is the corrupted file

tar -xvf my_data_now.tar -O

It shows:

my_name
Fool  

Even if you execute

echo $?

tar said that there was no error:

0

but the file was corrupted, it has now "Fool" instead of "Pete".

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionunderstackView Question on Stackoverflow
Solution 1 - GzipRob WellsView Answer on Stackoverflow
Solution 2 - GzipJohn BokerView Answer on Stackoverflow
Solution 3 - GzipOssie MooreView Answer on Stackoverflow
Solution 4 - GzipeinView Answer on Stackoverflow
Solution 5 - GzipdiegoreymendezView Answer on Stackoverflow
Solution 6 - GzipShicheng GuoView Answer on Stackoverflow
Solution 7 - GzipKevin LiuView Answer on Stackoverflow
Solution 8 - GzipSysView Answer on Stackoverflow