GZIPInputStream reading line by line

Tags: java, file-io, filereader, gzipinputstream

Java Problem Overview


I have a file in .gz format. The Java class for reading this file is GZIPInputStream. However, this class doesn't extend Java's BufferedReader class, so I am not able to read the file line by line. I need something like this:

reader = new MyGZInputStream(/* some constructor of GZIPInputStream */);
reader.readLine()...

I thought of creating my own class that extends Java's Reader or BufferedReader class and keeps a GZIPInputStream as one of its fields.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.Reader;
import java.util.zip.GZIPInputStream;

public class MyGZFilReader extends Reader {

	private GZIPInputStream gzipInputStream = null;
	char[] buf = new char[1024];
	                      
	@Override
	public void close() throws IOException {
		gzipInputStream.close();
	}

	public MyGZFilReader(String filename)
			throws FileNotFoundException, IOException {
		gzipInputStream = new GZIPInputStream(new FileInputStream(filename));
	}

	@Override
	public int read(char[] cbuf, int off, int len) throws IOException {
		// TODO Auto-generated method stub
		return gzipInputStream.read((byte[])buf, off, len);
	}

}

But this doesn't work when I use:

BufferedReader in = new BufferedReader(
    new MyGZFilReader("F:/gawiki-20090614-stub-meta-history.xml.gz"));
System.out.println(in.readLine());

Can someone advise how to proceed?
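For comparison, here is a sketch of how the subclass idea could be made to work. The read() method above cannot compile, because a char[] cannot be cast to a byte[]; and even with a byte buffer, the decompressed bytes would still have to be decoded into characters and written into cbuf rather than the private buf. This sketch delegates that decoding to an InputStreamReader. GzipFileReader is a hypothetical name, the caller must supply the file's charset, and UTF-8 in main is an assumption.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical replacement for MyGZFilReader: decompresses with GZIPInputStream,
// then decodes bytes to chars with an explicit charset via InputStreamReader.
public class GzipFileReader extends Reader {
    private final Reader decoder;

    public GzipFileReader(String filename, Charset charset) throws IOException {
        FileInputStream file = new FileInputStream(filename);
        try {
            this.decoder = new InputStreamReader(new GZIPInputStream(file), charset);
        } catch (IOException e) {
            file.close(); // e.g. not a gzip file: don't leak the file handle
            throw e;
        }
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // Decoded characters go into the caller's buffer, not a private one.
        return decoder.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        decoder.close(); // also closes the GZIPInputStream and the FileInputStream
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader in = new BufferedReader(
                new GzipFileReader(args[0], StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}
```

In practice such a subclass adds little over composing the standard classes directly, which is what the answers below do.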

Java Solutions


Solution 1 - Java

The basic setup of decorators is like this:

InputStream fileStream = new FileInputStream(filename);
InputStream gzipStream = new GZIPInputStream(fileStream);
Reader decoder = new InputStreamReader(gzipStream, encoding);
BufferedReader buffered = new BufferedReader(decoder);

The key issue in this snippet is the value of encoding. This is the character encoding of the text in the file. Is it "US-ASCII", "UTF-8", "SHIFT-JIS", "ISO-8859-9", …? There are hundreds of possibilities, and the correct choice usually cannot be determined from the file itself. It must be specified through some out-of-band channel.
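A self-contained version of the decorator chain above might look like the following sketch. GzipFirstLine and firstLine are hypothetical names; the point is that the charset is a required argument, so the caller has to make the decision explicitly. UTF-8 in main is an assumption to be replaced with the file's real encoding.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipFirstLine {
    // Returns the first line of a gzip-compressed text file, or null if it is empty.
    // The charset must be passed in; nothing falls back to the platform default.
    static String firstLine(String filename, Charset encoding) throws IOException {
        try (InputStream fileStream = new FileInputStream(filename);
             InputStream gzipStream = new GZIPInputStream(fileStream);
             Reader decoder = new InputStreamReader(gzipStream, encoding);
             BufferedReader buffered = new BufferedReader(decoder)) {
            return buffered.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file is UTF-8; substitute the file's real encoding.
        System.out.println(firstLine(args[0], StandardCharsets.UTF_8));
    }
}
```

Declaring every layer in the try-with-resources header closes them all even if a constructor part-way up the chain throws; closing an already-closed stream is harmless here.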

For example, maybe it's the platform default. In a networked environment, however, this is extremely fragile. The machine that wrote the file might sit in the neighboring cubicle but have a different default file encoding.
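To see why a wrong guess is dangerous, the following sketch (EncodingMismatchDemo is a hypothetical name) compresses one line as UTF-8 and then decodes the same gzip bytes twice. Decoding as ISO-8859-1 succeeds without any error, but the single non-ASCII letter comes back as two wrong characters, so the mistake can go unnoticed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class EncodingMismatchDemo {
    // Encodes text with the given charset, then gzip-compresses it in memory.
    static byte[] gzip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(bytes), charset)) {
            w.write(text);
        } // closing the writer finishes the gzip trailer
        return bytes.toByteArray();
    }

    // Decompresses and decodes with the given charset, returning the first line.
    static String firstLine(byte[] gz, Charset charset) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gz)), charset))) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("f\u00e1ilte\n", StandardCharsets.UTF_8);       // one non-ASCII letter
        System.out.println(firstLine(gz, StandardCharsets.UTF_8));      // decoded correctly
        System.out.println(firstLine(gz, StandardCharsets.ISO_8859_1)); // garbled, no exception
    }
}
```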

Most network protocols use a header or other metadata to explicitly note the character encoding.

In this case, it appears from the file extension that the content is XML. XML includes the "encoding" attribute in the XML declaration for this purpose. Furthermore, XML should really be processed with an XML parser, not as text. Reading XML line by line seems like a fragile, special case.
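Following that advice, a parser can consume the decompressed bytes directly. This sketch uses the JDK's StAX API (javax.xml.stream). Because the parser is handed an InputStream rather than a Reader, it works out the encoding from the XML declaration (or byte-order mark) itself. The class name and the choice to count "page" elements are illustrative assumptions about the dump's structure.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class GzipXmlElementCounter {
    // Counts start elements with the given local name in gzip-compressed XML.
    // Closing the GZIPInputStream also closes the supplied stream.
    static int countElements(InputStream gzipped, String localName)
            throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try (InputStream xml = new GZIPInputStream(gzipped)) {
            // No charset given: the parser reads it from the XML declaration.
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            int count = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                reader.close(); // frees parser resources; does not close the stream
            }
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        try (InputStream file = new FileInputStream(args[0])) {
            System.out.println(countElements(file, "page"));
        }
    }
}
```

A streaming parser like this also avoids loading a multi-gigabyte dump into memory.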

Failing to explicitly specify the encoding is against the second commandment. Use the default encoding at your peril!

Solution 2 - Java

GZIPInputStream gzip = new GZIPInputStream(new FileInputStream("F:/gawiki-20090614-stub-meta-history.xml.gz"));
BufferedReader br = new BufferedReader(new InputStreamReader(gzip));
br.readLine();

Solution 3 - Java

try (BufferedReader in = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(new FileInputStream("F:/gawiki-20090614-stub-meta-history.xml.gz"))))) {
    String content;
    while ((content = in.readLine()) != null) {
        System.out.println(content);
    }
}

Solution 4 - Java

You can put the following method in a utility class and call it whenever necessary:

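import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

public final class GzUtils { // hypothetical name for "a util class"

	// Reads every line of a .gz file into memory. Note that InputStreamReader
	// here uses the platform default charset.
	public static List<String> readLinesFromGZ(String filePath) {
		List<String> lines = new ArrayList<>();
		File file = new File(filePath);

		try (GZIPInputStream gzip = new GZIPInputStream(new FileInputStream(file));
				BufferedReader br = new BufferedReader(new InputStreamReader(gzip))) {
			String line;
			while ((line = br.readLine()) != null) {
				lines.add(line);
			}
		} catch (IOException e) { // also covers FileNotFoundException
			e.printStackTrace(System.err);
		}
		return lines;
	}
}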
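readLinesFromGZ holds every line in memory, which may not be practical for a large dump like the one in the question. A streaming alternative, sketched here under the assumption that the caller supplies the charset, wraps BufferedReader.lines() (Java 8+) and closes the reader when the returned stream is closed. GzipLineStreams and lines are hypothetical names, and counting lines that contain "<page>" in main is only an illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;

public final class GzipLineStreams {
    // Lazily streams the lines of a .gz file; the caller must close the stream.
    static Stream<String> lines(Path path, Charset charset) throws IOException {
        InputStream file = Files.newInputStream(path);
        try {
            BufferedReader br = new BufferedReader(
                    new InputStreamReader(new GZIPInputStream(file), charset));
            return br.lines().onClose(() -> {
                try {
                    br.close(); // closes the gzip and file streams too
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException | RuntimeException e) {
            file.close(); // e.g. bad gzip header: release the file handle
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: UTF-8 content.
        try (Stream<String> lines = lines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.println(lines.filter(l -> l.contains("<page>")).count());
        }
    }
}
```

Read errors during iteration surface as UncheckedIOException, since BufferedReader.lines() cannot throw checked exceptions.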

Solution 5 - Java

Here it is as a single try-with-resources statement:

try (BufferedReader br = new BufferedReader(
        new InputStreamReader(
           new GZIPInputStream(
              new FileInputStream(
                 "F:/gawiki-20090614-stub-meta-history.xml.gz"))))) {
    br.readLine();
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Kapil D | View Question on Stackoverflow
Solution 1 - Java | erickson | View Answer on Stackoverflow
Solution 2 - Java | ChssPly76 | View Answer on Stackoverflow
Solution 3 - Java | Arumugam Mathiazhagan | View Answer on Stackoverflow
Solution 4 - Java | Memin | View Answer on Stackoverflow
Solution 5 - Java | Tamer | View Answer on Stackoverflow