How do I extract a tar file in Java?

Java, Archive, Tar

Java Problem Overview


How do I extract a tar (or tar.gz, or tar.bz2) file in Java?

Java Solutions


Solution 1 - Java

You can do this with the Apache Commons Compress library. You can download version 1.2 from http://mvnrepository.com/artifact/org.apache.commons/commons-compress/1.2.

Here are two methods: one that ungzips a file and another that untars it. So, for a file named <fileName>.tar.gz, you first need to ungzip it and then untar the resulting .tar file. Please note that the tar archive may contain folders as well, in which case they need to be created on the local filesystem.

Enjoy.

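import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedList;
import java.util.List;
import org.apache.commons.compress.archivers.ArchiveException;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

/**
 * Untar an input file into an output directory.
 * <p>
 * Every entry of the archive is written below the output directory,
 * keeping the relative path stored in the archive.
 *
 * @param inputFile  the input .tar file
 * @param outputDir  the output directory
 * @return the {@link List} of {@link File}s with the untarred content
 * @throws IOException if reading the archive or writing an entry fails
 * @throws ArchiveException if the input cannot be opened as a tar archive
 */
private static List<File> unTar(final File inputFile, final File outputDir) throws IOException, ArchiveException {

    // LOG is assumed to be a logger field of the enclosing class (its declaration is not shown)
    LOG.info(String.format("Untarring %s to dir %s.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath()));

    final List<File> untarredFiles = new LinkedList<File>();
    final Path targetRoot = outputDir.getCanonicalFile().toPath();
    // try-with-resources closes both streams even if an exception is thrown
    try (final InputStream is = new FileInputStream(inputFile);
         final TarArchiveInputStream debInputStream =
                 (TarArchiveInputStream) new ArchiveStreamFactory().createArchiveInputStream("tar", is)) {
        TarArchiveEntry entry;
        while ((entry = (TarArchiveEntry) debInputStream.getNextEntry()) != null) {
            final File outputFile = new File(outputDir, entry.getName());
            // Reject entries such as "../x" that would be written outside outputDir
            if (!outputFile.getCanonicalFile().toPath().startsWith(targetRoot)) {
                throw new IOException("Entry is outside of the output dir: " + entry.getName());
            }
            if (entry.isDirectory()) {
                LOG.info(String.format("Attempting to write output directory %s.", outputFile.getAbsolutePath()));
                if (!outputFile.exists() && !outputFile.mkdirs()) {
                    throw new IllegalStateException(String.format("Couldn't create directory %s.", outputFile.getAbsolutePath()));
                }
            } else {
                // The archive may omit entries for parent directories, so create them here
                final File parent = outputFile.getParentFile();
                if (!parent.isDirectory() && !parent.mkdirs()) {
                    throw new IllegalStateException(String.format("Couldn't create directory %s.", parent.getAbsolutePath()));
                }
                LOG.info(String.format("Creating output file %s.", outputFile.getAbsolutePath()));
                // Files.copy reads only the current entry and leaves the tar stream open
                Files.copy(debInputStream, outputFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
            untarredFiles.add(outputFile);
        }
    }

    return untarredFiles;
}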

/**
 * Ungzip an input file into an output file.
 * <p>
 * The output file is created in the output folder, having the same name
 * as the input file, minus the '.gz' extension.
 *
 * @param inputFile  the input .gz file
 * @param outputDir  the output directory
 * @return the {@link File} with the ungzipped content
 * @throws IOException if reading or writing fails
 */
private static File unGzip(final File inputFile, final File outputDir) throws IOException {

    LOG.info(String.format("Ungzipping %s to dir %s.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath()));

    // Assumes the input name ends in ".gz"; those 3 characters are stripped for the output name
    final File outputFile = new File(outputDir, inputFile.getName().substring(0, inputFile.getName().length() - 3));

    // Requires java.util.zip.GZIPInputStream, java.nio.file.Files and java.nio.file.StandardCopyOption;
    // try-with-resources closes both streams even if the gzip header is invalid
    try (final InputStream fileIn = new FileInputStream(inputFile);
         final GZIPInputStream in = new GZIPInputStream(fileIn)) {
        Files.copy(in, outputFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }

    return outputFile;
}
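
Because entry names in a tar archive are relative paths chosen by whoever created the archive, it helps to resolve them defensively before writing anything. The following is a minimal JDK-only sketch, not part of Commons Compress; the class name SafeEntryPaths and its methods are hypothetical. It rejects names such as ../evil.txt that would climb out of the output folder (the so-called Zip Slip problem) and creates missing parent folders, since a tar may list a file without a separate entry for its folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SafeEntryPaths {

    /** Resolves an entry name below outputDir, rejecting names such as "../x" or "/etc/x". */
    static Path resolveEntry(Path outputDir, String entryName) throws IOException {
        Path root = outputDir.toAbsolutePath().normalize();
        Path target = root.resolve(entryName).normalize();
        // Path.startsWith compares whole name elements, so "/out2" does not start with "/out"
        if (!target.startsWith(root)) {
            throw new IOException("Entry escapes output directory: " + entryName);
        }
        return target;
    }

    /** Creates missing parent folders, since a tar may list a file without its folder entry. */
    static void ensureParentDirs(Path target) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("out");
        // prints true: a normal nested entry stays inside "out"
        System.out.println(resolveEntry(out, "dir/file.txt").endsWith(Paths.get("out", "dir", "file.txt")));
        try {
            resolveEntry(out, "../evil.txt");
            System.out.println("accepted");
        } catch (IOException e) {
            System.out.println("rejected"); // expected for a path that climbs out of "out"
        }
    }
}
```

The same check can be dropped into either extraction loop right before an entry is written.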

Solution 2 - Java

Note: This functionality was later published through a separate project, Apache Commons Compress, as described in another answer. This answer is out of date.


I haven't used a tar API directly, but tar and bzip2 are implemented in Ant; you could borrow their implementation, or possibly use Ant to do what you need.

Gzip is part of Java SE (and I'm guessing the Ant implementation follows the same model).

GZIPInputStream is just an InputStream decorator. You can wrap, for example, a FileInputStream in a GZIPInputStream and use it in the same way you'd use any InputStream:

InputStream is = new GZIPInputStream(new FileInputStream(file));

(Note that GZIPInputStream has its own internal buffer, so wrapping the FileInputStream in a BufferedInputStream would probably decrease performance.)
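
To illustrate the decorator idea, here is a minimal JDK-only round trip; the class and method names are made up for this example. GZIPOutputStream is used only to produce sample input, and the decompression side wraps an arbitrary InputStream in a GZIPInputStream the same way you would wrap a FileInputStream. Applied to a .tar.gz file, this yields the raw tar bytes, which still have to be untarred.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    /** Compresses bytes with GZIPOutputStream (used here only to produce sample input). */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(bos)) {
            out.write(data); // closing the stream writes the gzip trailer
        }
        return bos.toByteArray();
    }

    /** Decompresses by decorating the source stream, just like wrapping a FileInputStream. */
    static byte[] gunzip(InputStream compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(compressed)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("hello tar".getBytes(StandardCharsets.UTF_8));
        byte[] restored = gunzip(new ByteArrayInputStream(compressed));
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // prints "hello tar"
    }
}
```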

Solution 3 - Java

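import org.rauschig.jarchivelib.*; // Archiver, ArchiverFactory

Archiver archiver = ArchiverFactory.createArchiver("tar", "gz");
archiver.extract(archiveFile, destDir); // archiveFile and destDir are java.io.File objects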

Dependency:

<dependency>
    <groupId>org.rauschig</groupId>
    <artifactId>jarchivelib</artifactId>
    <version>0.5.0</version>
</dependency>

Solution 4 - Java

Apache Commons VFS supports tar as a virtual file system, with URLs like this one: tar:gz:http://anyhost/dir/mytar.tar.gz!/mytar.tar!/path/in/tar/README.txt

TrueZip, or its successor TrueVFS, does the same; it's also available from Maven Central.

Solution 5 - Java

I just tried a bunch of the suggested libraries (TrueZip, Apache Compress), but had no luck.

Here is an example with Apache Commons VFS:

import java.io.InputStream;
import org.apache.commons.vfs.*; // Commons VFS 1.x package (VFS 2.x uses org.apache.commons.vfs2)

FileSystemManager fsManager = VFS.getManager();
FileObject archive = fsManager.resolveFile("tgz:file://" + fileName);

// List the children of the archive file
FileObject[] children = archive.getChildren();
System.out.println("Children of " + archive.getName().getURI() + " are ");
for (FileObject fo : children) {
    System.out.println(fo.getName().getBaseName());
    if (fo.isReadable() && fo.getType() == FileType.FILE
        && fo.getName().getExtension().equals("nxml")) {
        FileContent fc = fo.getContent();
        try (InputStream is = fc.getInputStream()) {
            // read the .nxml entry from is here; the stream is closed afterwards
        }
    }
}

And the maven dependency:

<dependency>
    <groupId>commons-vfs</groupId>
    <artifactId>commons-vfs</artifactId>
    <version>1.0</version>
</dependency>

Solution 6 - Java

In addition to gzip and bzip2, the Apache Commons Compress API also supports tar. Its tar support was originally based on the ICE Engineering Java Tar Package, which is both an API and a standalone tool.

Solution 7 - Java

What about using this API for tar files, this other one included in Ant for BZIP2, and the standard one for GZIP?
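
Choosing between separate decoders means knowing how an input is compressed. A minimal JDK-only sketch (ArchiveSniffer and its Kind enum are hypothetical names) that peeks at the leading bytes: gzip streams begin with 0x1f 0x8b, bzip2 streams with the ASCII characters "BZh", and POSIX ustar archives carry "ustar" at byte offset 257 of the first header block. Older tar files without that magic are reported as UNKNOWN by this check.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ArchiveSniffer {

    enum Kind { GZIP, BZIP2, TAR, UNKNOWN }

    /**
     * Peeks at the leading bytes without consuming them. The stream must
     * support mark/reset (wrap it in a BufferedInputStream if it does not).
     */
    static Kind detect(InputStream in) throws IOException {
        byte[] head = new byte[512];
        in.mark(head.length);
        int read = 0;
        int n;
        while (read < head.length && (n = in.read(head, read, head.length - read)) != -1) {
            read += n;
        }
        in.reset(); // rewind so the caller can hand the same stream to a decoder

        if (read >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return Kind.GZIP;  // gzip magic bytes
        }
        if (read >= 3 && head[0] == 'B' && head[1] == 'Z' && head[2] == 'h') {
            return Kind.BZIP2; // bzip2 stream header
        }
        // POSIX ustar (and GNU) headers store "ustar" at offset 257 of the first block
        if (read >= 262 && new String(head, 257, 5, StandardCharsets.US_ASCII).equals("ustar")) {
            return Kind.TAR;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = {0x1f, (byte) 0x8b, 8, 0};
        System.out.println(detect(new BufferedInputStream(new ByteArrayInputStream(gz)))); // prints GZIP
    }
}
```

Since the stream is reset after peeking, the same BufferedInputStream can then be passed to the matching decoder.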

Solution 8 - Java

Here's a version based on the earlier answer by Dan Borza that uses Apache Commons Compress and Java NIO (i.e. Path instead of File). It also does the decompression and untarring in one stream, so no intermediate file is created.

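import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public static void unTarGz( Path pathInput, Path pathOutput ) throws IOException {
    Path root = pathOutput.toAbsolutePath().normalize();
    // try-with-resources closes the whole stream chain, even if extraction fails
    try( TarArchiveInputStream tararchiveinputstream =
            new TarArchiveInputStream(
                new GzipCompressorInputStream(
                    new BufferedInputStream( Files.newInputStream( pathInput ) ) ) ) ) {

        ArchiveEntry archiveentry;
        while( (archiveentry = tararchiveinputstream.getNextEntry()) != null ) {
            Path pathEntryOutput = root.resolve( archiveentry.getName() ).normalize();
            // Reject entries such as "../x" that would be written outside pathOutput
            if( !pathEntryOutput.startsWith( root ) )
                throw new IOException( "Entry is outside the output dir: " + archiveentry.getName() );
            if( archiveentry.isDirectory() )
                Files.createDirectories( pathEntryOutput ); // also creates missing parents
            else {
                Files.createDirectories( pathEntryOutput.getParent() );
                Files.copy( tararchiveinputstream, pathEntryOutput, StandardCopyOption.REPLACE_EXISTING );
            }
        }
    }
}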
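
If adding a library is not an option, the same one-pass idea can be sketched with the JDK alone, because tar is a simple format: each entry is a 512-byte header block followed by its data, padded to a multiple of 512 bytes. This is a minimal sketch under stated assumptions, not a replacement for Commons Compress: MiniUnTarGz is a hypothetical name, only regular files and directories are handled, header checksums are not verified, sizes must be plain octal (no base-256 encoding for very large files), and pax or GNU long-name headers are skipped rather than interpreted, so entries relying on them keep their truncated header name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class MiniUnTarGz {
    /** Gunzips and untars in one pass; returns the names of regular files written. Closes the input. */
    static List<String> unTarGz(InputStream gzipped, Path outputDir) throws IOException {
        List<String> written = new ArrayList<>();
        Path root = outputDir.toAbsolutePath().normalize();
        byte[] header = new byte[512];
        try (InputStream in = new GZIPInputStream(gzipped)) {
            // Each entry starts with a 512-byte header; an all-zero block marks the end
            while (readBlock(in, header) && !isAllZero(header)) {
                String name = field(header, 0, 100);
                String prefix = field(header, 345, 155); // ustar stores long paths as prefix + name
                if (!prefix.isEmpty()) {
                    name = prefix + "/" + name;
                }
                long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal ASCII digits
                byte type = header[156];
                Path target = root.resolve(name).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes output directory: " + name);
                }
                if (type == '5' || name.endsWith("/")) {
                    Files.createDirectories(target);
                    copy(in, null, size);
                } else if (type == '0' || type == 0) {
                    Files.createDirectories(target.getParent());
                    try (OutputStream out = Files.newOutputStream(target)) {
                        copy(in, out, size);
                    }
                    written.add(name);
                } else {
                    copy(in, null, size); // links, pax/GNU extension headers, etc. are skipped
                }
                copy(in, null, (512 - size % 512) % 512); // entry data is padded to whole blocks
            }
        }
        return written;
    }

    /** Fills buf completely; returns false on a clean end of stream before the block starts. */
    private static boolean readBlock(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int r = in.read(buf, off, buf.length - off);
            if (r == -1) {
                if (off == 0) return false;
                throw new IOException("Truncated tar header");
            }
            off += r;
        }
        return true;
    }

    private static boolean isAllZero(byte[] buf) {
        for (byte b : buf) {
            if (b != 0) return false;
        }
        return true;
    }

    /** Decodes a NUL-terminated header field. */
    private static String field(byte[] b, int off, int len) {
        int end = off;
        while (end < off + len && b[end] != 0) end++;
        return new String(b, off, end - off, StandardCharsets.UTF_8);
    }

    /** Reads exactly n bytes, writing them to out, or discarding them when out is null. */
    private static void copy(InputStream in, OutputStream out, long n) throws IOException {
        byte[] buf = new byte[8192];
        while (n > 0) {
            int r = in.read(buf, 0, (int) Math.min(buf.length, n));
            if (r == -1) throw new IOException("Truncated tar entry");
            if (out != null) out.write(buf, 0, r);
            n -= r;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java MiniUnTarGz archive.tar.gz outputDir");
            return;
        }
        System.out.println(unTarGz(Files.newInputStream(Paths.get(args[0])), Paths.get(args[1])));
    }
}
```

Like the Commons Compress version, it never writes an intermediate .tar file; the trade-off is that every tar extension the library already understands would have to be added by hand.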

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | skiphoppy | View Question on Stackoverflow
Solution 1 - Java | Dan Borza | View Answer on Stackoverflow
Solution 2 - Java | erickson | View Answer on Stackoverflow
Solution 3 - Java | D3iv | View Answer on Stackoverflow
Solution 4 - Java | Jörg | View Answer on Stackoverflow
Solution 5 - Java | Renaud | View Answer on Stackoverflow
Solution 6 - Java | Jörg | View Answer on Stackoverflow
Solution 7 - Java | Fernando Miguélez | View Answer on Stackoverflow
Solution 8 - Java | Wade Walker | View Answer on Stackoverflow