How to check if InputStream is Gzipped?

Tags: Java, Http, Gzip, Inputstream, Httpurlconnection

Java Problem Overview


Is there any way to check if InputStream has been gzipped? Here's the code:

public static InputStream decompressStream(InputStream input) {
	try {
		GZIPInputStream gs = new GZIPInputStream(input);
		return gs;
	} catch (IOException e) {
		logger.info("Input stream not in the GZIP format, using standard format");
		return input;
	}
}

I tried it this way, but it doesn't work as expected: the values read from the stream are invalid.

EDIT: Added the method I use to compress data:

public static byte[] compress(byte[] content) {
	ByteArrayOutputStream baos = new ByteArrayOutputStream();
	try {
		GZIPOutputStream gs = new GZIPOutputStream(baos);
		gs.write(content);
		gs.close();
	} catch (IOException e) {
		logger.error("Fatal error occurred while compressing data");
		throw new RuntimeException(e);
	}
	double ratio = (1.0f * content.length / baos.size());
	if (ratio > 1) {
		logger.info("Compression ratio equals " + ratio);
		return baos.toByteArray();
	}
	logger.info("Compression not needed");
	return content;

}

Java Solutions


Solution 1 - Java

It's not foolproof (arbitrary data can happen to start with the same two bytes), but it's probably the easiest approach and doesn't rely on any external data. Like all decent formats, gzip begins with a magic number, which can be checked quickly without reading the entire stream.

public static InputStream decompressStream(InputStream input) throws IOException {
     PushbackInputStream pb = new PushbackInputStream( input, 2 ); //we need a pushbackstream to look ahead
     byte [] signature = new byte[2];
     int len = pb.read( signature ); //read the signature; returns -1 if the stream is empty
     if( len <= 0 ) //nothing was read, so there is nothing to push back
       return pb;
     pb.unread( signature, 0, len ); //push back the signature to the stream
     if( len == 2 && signature[ 0 ] == (byte) 0x1f && signature[ 1 ] == (byte) 0x8b ) //check if matches standard gzip magic number
       return new GZIPInputStream( pb );
     else 
       return pb;
}

(Source for the magic number: GZip file format specification)

Update: I've just discovered that there is also a constant called GZIP_MAGIC in GZIPInputStream which contains this value (as an int), so if you really want to, you can use the lower two bytes of it.
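As a small sketch of what "the lower two bytes" means here: GZIP_MAGIC is an int whose lowest byte matches the first header byte and whose next byte matches the second. The class name GzipMagicBytes and its gzip helper are made up for illustration and are not part of the original answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, used only to illustrate the byte layout.
class GzipMagicBytes {

    // First header byte: the low 8 bits of GZIP_MAGIC.
    static final byte FIRST = (byte) GZIPInputStream.GZIP_MAGIC;
    // Second header byte: the next 8 bits of GZIP_MAGIC.
    static final byte SECOND = (byte) (GZIPInputStream.GZIP_MAGIC >> 8);

    // Gzips a byte array in memory so the resulting header can be inspected.
    static byte[] gzip(byte[] content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("hello".getBytes(StandardCharsets.UTF_8));
        // Compare the first two bytes of real gzip output with the constant.
        System.out.println(compressed[0] == FIRST && compressed[1] == SECOND);
    }
}
```

The casts to byte matter: Java bytes are signed, so 0x8b has to be compared as (byte) 0x8b rather than as the int 139.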

Solution 2 - Java

> The InputStream comes from HttpURLConnection#getInputStream()

In that case you need to check whether the HTTP Content-Encoding response header equals gzip.

URLConnection connection = url.openConnection();
InputStream input = connection.getInputStream();

if ("gzip".equals(connection.getContentEncoding())) {
    input = new GZIPInputStream(input);
}

// ...

This is all clearly specified in the HTTP spec.


Update: regarding the way you compressed the source of the stream: this ratio check is pretty... insane. Get rid of it. The same length does not necessarily mean that the bytes are the same. Let it always return the gzipped stream so that you can always expect a gzipped stream and just apply GZIPInputStream without nasty checks.
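A minimal sketch of that suggestion, assuming the data is handled as in-memory byte arrays as in the question. The class name AlwaysGzip and the decompress helper are hypothetical; only the idea of dropping the ratio check comes from the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class name; a sketch of "always compress, always decompress".
class AlwaysGzip {

    // Always returns gzipped bytes, even when that makes a tiny payload larger.
    static byte[] compress(byte[] content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // The reader can wrap unconditionally because the writer never skips compression.
    static byte[] decompress(byte[] gzipped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The cost is a few extra bytes for very small payloads; in exchange the receiving side never has to guess the format.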

Solution 3 - Java

I found this useful example that provides a clean implementation of isCompressed():

/**
 * Determines if a byte array is compressed. The java.util.zip GZip
 * implementation does not expose the GZip header so it is difficult to determine
 * if a string is compressed.
 *
 * @param bytes an array of bytes
 * @return true if the array starts with the GZip magic number, false otherwise
 */
public boolean isCompressed(byte[] bytes)
{
    if ((bytes == null) || (bytes.length < 2))
    {
        return false;
    }
    else
    {
        return ((bytes[0] == (byte) (GZIPInputStream.GZIP_MAGIC)) && (bytes[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8)));
    }
}

I tested it with success:

@Test
public void testIsCompressed() {
    assertFalse(util.isCompressed(originalBytes));
    assertTrue(util.isCompressed(compressed));
}

Solution 4 - Java

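I believe this is the simplest way to check whether a byte array is gzip formatted or not. It does not depend on any HTTP entity or MIME type support:

public static boolean isGzipStream(byte[] bytes) {
	  // fewer than two bytes cannot hold the magic number
	  if (bytes == null || bytes.length < 2) {
	      return false;
	  }
	  // combine the first two bytes little-endian, matching GZIP_MAGIC
	  int head = ((int) bytes[0] & 0xff) | ((bytes[1] << 8) & 0xff00);
	  return (GZIPInputStream.GZIP_MAGIC == head);
}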

Solution 5 - Java

Building on the answer by @biziclop - this version uses the GZIP_MAGIC header and additionally is safe for empty or single byte data streams.

public static InputStream maybeDecompress(InputStream input) throws IOException {
    final PushbackInputStream pb = new PushbackInputStream(input, 2);
    
    int header = pb.read();
    if(header == -1) {
        return pb;
    }
    
    int b = pb.read();
    if(b == -1) {
        pb.unread(header);
        return pb;
    }
    
    pb.unread(new byte[]{(byte)header, (byte)b});
    
    header = (b << 8) | header;
    
    if(header == GZIPInputStream.GZIP_MAGIC) {
        return new GZIPInputStream(pb);
    } else {
        return pb;
    }
}
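A quick way to exercise the method above against the edge cases it mentions (empty, single byte, plain, gzipped). The harness class MaybeDecompressDemo and its readAll and gzip helpers are hypothetical; maybeDecompress is a condensed copy of the answer's method so that the sketch compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical harness around a copy of maybeDecompress.
class MaybeDecompressDemo {

    static InputStream maybeDecompress(InputStream input) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(input, 2);
        int first = pb.read();
        if (first == -1) {
            return pb;                       // empty stream
        }
        int second = pb.read();
        if (second == -1) {
            pb.unread(first);                // single-byte stream
            return pb;
        }
        pb.unread(new byte[] {(byte) first, (byte) second});
        int header = (second << 8) | first;  // little-endian, like GZIP_MAGIC
        return header == GZIPInputStream.GZIP_MAGIC ? new GZIPInputStream(pb) : pb;
    }

    // Reads every byte from the (possibly gzipped) input.
    static byte[] readAll(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = maybeDecompress(new ByteArrayInputStream(raw))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Gzips a byte array in memory to produce test input.
    static byte[] gzip(byte[] content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }
}
```

Calling readAll on plain bytes should return them unchanged, while readAll(gzip(data)) should return data.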

Solution 6 - Java

This function works in Java:

public static boolean isGZipped(File f) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
        return GZIPInputStream.GZIP_MAGIC == (raf.read() & 0xff | ((raf.read() << 8) & 0xff00));
    }
}

In Scala:

def isGZip(file: File): Boolean = {
   val raf = new RandomAccessFile(file, "r")
   try {
     val gzip = raf.read() & 0xff | ((raf.read() << 8) & 0xff00)
     gzip == GZIPInputStream.GZIP_MAGIC
   } finally raf.close()
}

Solution 7 - Java

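Wrap the original stream in a BufferedInputStream, call "mark" on it, then try to wrap that in a GZIPInputStream. The GZIPInputStream constructor reads the gzip header and throws a ZipException if the data is not in gzip format (there is no ZipEntry to extract here; entries belong to ZipInputStream and zip archives). If the constructor succeeds, it's a gzip stream. If it fails, call "reset" on the BufferedInputStream to return to the initial position in the stream.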
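The mark/reset idea can be sketched as follows. The mark has to be set before the GZIPInputStream constructor runs, because the constructor consumes header bytes from the underlying stream before it throws, which is presumably why the code in the question returned invalid values. The class name MarkResetGzip, its helpers, and the read limit of 1024 are assumptions for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipException;

// Hypothetical class; a sketch of the mark/reset approach.
class MarkResetGzip {

    static InputStream decompressIfGzipped(InputStream input) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(input);
        buffered.mark(1024);                 // assumed read limit, remembers the start
        try {
            return new GZIPInputStream(buffered);   // reads and validates the header
        } catch (ZipException | EOFException e) {
            // Not gzip (or too short to hold a header): rewind the consumed bytes.
            buffered.reset();
            return buffered;
        }
    }

    // Reads every byte from the (possibly gzipped) input.
    static byte[] readAll(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = decompressIfGzipped(new ByteArrayInputStream(raw))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Gzips a byte array in memory to produce test input.
    static byte[] gzip(byte[] content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }
}
```

Catching EOFException as well covers empty or one-byte inputs, where the constructor runs out of data before it can compare the magic number.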

Solution 8 - Java

Not exactly what you are asking but could be an alternative approach if you are using HttpClient:

private static InputStream getInputStream(HttpEntity entity) throws IOException {
  Header encoding = entity.getContentEncoding();
  if (encoding != null) {
     String value = encoding.getValue();
     if (value.equals("gzip") || value.equals("zip") || value.equals("application/x-gzip-compressed")) {
        return new GZIPInputStream(entity.getContent());
     }
  }
  return entity.getContent();
}

Solution 9 - Java

SimpleMagic is a Java library for resolving content types:

<!-- pom.xml -->
    <dependency>
        <groupId>com.j256.simplemagic</groupId>
        <artifactId>simplemagic</artifactId>
        <version>1.8</version>
    </dependency>

import com.j256.simplemagic.ContentInfo;
import com.j256.simplemagic.ContentInfoUtil;
import com.j256.simplemagic.ContentType;
// ...

public class SimpleMagicSmokeTest {

    private final static Logger log = LoggerFactory.getLogger(SimpleMagicSmokeTest.class);

    @Test
    public void smokeTestSimpleMagic() throws IOException {
        ContentInfoUtil util = new ContentInfoUtil();
        InputStream possibleGzipInputStream = getGzipInputStream();
        ContentInfo info = util.findMatch(possibleGzipInputStream);
    
        log.info( info.toString() );
        assertEquals( ContentType.GZIP, info.getContentType() );
    }
}

Solution 10 - Java

This is how to read a file that may or may not be gzipped:

private void read(final File file)
        throws IOException {
    InputStream stream = null;
    try (final InputStream inputStream = new FileInputStream(file);
            final BufferedInputStream bInputStream = new BufferedInputStream(inputStream);) {
        bInputStream.mark(1024);
        try {
            stream = new GZIPInputStream(bInputStream);
        } catch (final ZipException e) {
            // not gzipped OR not supported zip format
            bInputStream.reset();
            stream = bInputStream;
        }
        // USE STREAM HERE
    } finally {
        if (stream != null) {
            stream.close();
        }
    }
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | voo | View Question on Stackoverflow
Solution 1 - Java | biziclop | View Answer on Stackoverflow
Solution 2 - Java | BalusC | View Answer on Stackoverflow
Solution 3 - Java | Aaron Roller | View Answer on Stackoverflow
Solution 4 - Java | Oconnell | View Answer on Stackoverflow
Solution 5 - Java | blue | View Answer on Stackoverflow
Solution 6 - Java | ypriverol | View Answer on Stackoverflow
Solution 7 - Java | Amir Afghani | View Answer on Stackoverflow
Solution 8 - Java | Richard H | View Answer on Stackoverflow
Solution 9 - Java | Abdull | View Answer on Stackoverflow
Solution 10 - Java | TekTimmy | View Answer on Stackoverflow