How can I get the count of lines in a file in an efficient way?

Java File

Java Problem Overview


I have a big file. It includes approximately 3.000-20.000 lines. How can I get the total count of lines in the file using Java?

Java Solutions


Solution 1 - Java

int lines = 0;
// try-with-resources closes the reader even if readLine() throws
try (BufferedReader reader = new BufferedReader(new FileReader("file.txt"))) {
    while (reader.readLine() != null) lines++;
}

Update: To answer the performance question raised here, I took a measurement. First, 20.000 lines are too few to keep the program running for a noticeable time, so I created a text file with 5 million lines. This solution (started with java without parameters such as -server or -XX options) needed around 11 seconds on my box. wc -l (the UNIX command-line tool for counting lines) took the same 11 seconds. The solution that reads every single character and looks for '\n' needed 104 seconds, 9-10 times as long.

Solution 2 - Java

Files.lines

Java 8+ offers a short way to do this with NIO's Files.lines. Note that you have to close the stream, for example with try-with-resources:

long lineCount;
try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
  lineCount = stream.count();
}

If you don't specify the character encoding, the default one used is UTF-8. You may specify an alternate encoding to match your particular data file as shown in the example above.
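
As a self-contained sketch of the snippet above, the helper below wraps the Files.lines call in a method that takes the charset as a parameter. The class name LineCountNio and the method name countLines are illustrative and not part of the original answer; ISO-8859-1 in main is only an example of a non-default encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

final class LineCountNio {

    // Counts the lines of the file, decoding it with the given charset.
    // try-with-resources closes the underlying file when the stream is done.
    static long countLines(Path path, Charset charset) throws IOException {
        try (Stream<String> stream = Files.lines(path, charset)) {
            return stream.count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Example only: pick the charset that matches your data file.
            System.out.println(countLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1));
        }
    }
}
```

Passing the charset explicitly keeps the choice visible at the call site, which helps when files from different sources use different encodings.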

Solution 3 - Java

Use LineNumberReader, something like:

public static int countLines(File aFile) throws IOException {
    LineNumberReader reader = null;
    try {
        reader = new LineNumberReader(new FileReader(aFile));
        while ((reader.readLine()) != null);
        return reader.getLineNumber();
    } catch (Exception ex) {
        return -1;
    } finally { 
        if(reader != null) 
            reader.close();
    }
}

Solution 4 - Java

I found a solution for this that might be useful for you.

Below is a code snippet that counts the number of lines in the file.

  File file = new File("/mnt/sdcard/abc.txt");
  LineNumberReader lineNumberReader = new LineNumberReader(new FileReader(file));
  lineNumberReader.skip(Long.MAX_VALUE);
  int lines = lineNumberReader.getLineNumber();
  lineNumberReader.close();

Solution 5 - Java

Read the file through and count the number of newline characters. An easy way to read a file in Java, one line at a time, is the java.util.Scanner class.
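
A minimal sketch of the Scanner approach described above, assuming the file is UTF-8 text. The class name ScannerLineCounter and the method name countLines are made up for illustration. Scanner is convenient but is typically slower than the buffered readers used in other answers, so treat this as the simple option rather than the fast one.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

final class ScannerLineCounter {

    // Reads the file one line at a time and counts the lines.
    static int countLines(Path path) throws IOException {
        int lines = 0;
        try (Scanner scanner = new Scanner(path, "UTF-8")) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                lines++;
            }
            // Scanner swallows read errors; rethrow the last one, if any.
            IOException error = scanner.ioException();
            if (error != null) {
                throw error;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(countLines(Paths.get(args[0])));
        }
    }
}
```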

Solution 6 - Java

This is about as efficient as it can get: a buffered binary read with no string conversion.

FileInputStream stream = new FileInputStream("/tmp/test.txt");
byte[] buffer = new byte[8192];
int count = 0;
int n;
while ((n = stream.read(buffer)) > 0) {
	for (int i = 0; i < n; i++) {
		if (buffer[i] == '\n') count++;
	}
}
stream.close();
System.out.println("Number of lines: " + count);

Solution 7 - Java

Do you need the exact number of lines, or is an approximation enough? I process large files in parallel and often don't need an exact line count, so I fall back to sampling. Split the file into ten 1MB chunks and count the lines in each chunk, then multiply by 10 and you'll get a pretty good approximation of the line count.
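
One way to read the sampling idea is sketched below. It is an interpretation, not the answer author's code: instead of a fixed factor of 10, it scales the newline count found in the samples by the ratio of file size to bytes sampled, which is an assumption on my part. The names SampledLineEstimator, estimateLines, chunks and chunkSize are hypothetical. The estimate is only reasonable when line lengths are fairly uniform across the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class SampledLineEstimator {

    // Estimates the line count by counting '\n' in evenly spaced chunks
    // and scaling by (file size / bytes sampled).
    // Both chunks and chunkSize must be positive.
    static long estimateLines(String fileName, int chunks, int chunkSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(fileName, "r")) {
            long size = file.length();
            if (size == 0) {
                return 0;
            }
            // Distance between chunk starts; at least chunkSize so chunks never overlap.
            // For small files this degrades to reading the whole file (exact count).
            long stride = Math.max(size / chunks, chunkSize);
            byte[] buffer = new byte[chunkSize];
            long sampledBytes = 0;
            long newlines = 0;
            for (long offset = 0; offset < size; offset += stride) {
                file.seek(offset);
                int n = file.read(buffer);
                if (n <= 0) {
                    break;
                }
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') {
                        newlines++;
                    }
                }
                sampledBytes += n;
            }
            return Math.round((double) newlines * size / sampledBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Ten chunks of 1MB, as in the answer above.
            System.out.println(estimateLines(args[0], 10, 1024 * 1024));
        }
    }
}
```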

Solution 8 - Java

All previous answers suggest reading through the whole file and counting the newlines you find along the way. You commented on some of them as "not effective", but that's the only way to do it. A line break is nothing more than a character in the file, and to count that character you must look at every single character in the file.

I'm sorry, but you have no choice. :-)

Solution 9 - Java

This solution is about 3.6× faster than the top rated answer when tested on a file with 13.8 million lines. It simply reads the bytes into a buffer and counts the \n characters. You could play with the buffer size, but on my machine, anything above 8KB didn't make the code faster.

private int countLines(File file) throws IOException {
	int lines = 0;
	
	FileInputStream fis = new FileInputStream(file);
	byte[] buffer = new byte[BUFFER_SIZE]; // BUFFER_SIZE = 8 * 1024
	int read;
	
	while ((read = fis.read(buffer)) != -1) {
		for (int i = 0; i < read; i++) {
			if (buffer[i] == '\n') lines++;
		}
	}
	
	fis.close();
	
	return lines;
}

Solution 10 - Java

If the already posted answers aren't fast enough you'll probably have to look for a solution specific to your particular problem.

For example if these text files are logs that are only appended to and you regularly need to know the number of lines in them you could create an index. This index would contain the number of lines in the file, when the file was last modified and how large the file was then. This would allow you to recalculate the number of lines in the file by skipping over all the lines you had already seen and just reading the new lines.
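
The index idea could be sketched roughly as follows, assuming an append-only file. The LineIndex class and its update method are hypothetical names. For brevity this sketch keeps the index in memory and tracks only the byte offset and line count; persisting it, or also recording the last-modified time as the answer suggests, is left out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class LineIndex {

    private long bytesSeen; // number of bytes already counted
    private long linesSeen; // newlines found in those bytes

    // Counts only the bytes appended since the previous call and
    // returns the running total of lines.
    long update(String fileName) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(fileName, "r")) {
            if (file.length() < bytesSeen) {
                // The file shrank, so it was truncated or replaced: start over.
                bytesSeen = 0;
                linesSeen = 0;
            }
            file.seek(bytesSeen);
            byte[] buffer = new byte[8192];
            int n;
            while ((n = file.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') {
                        linesSeen++;
                    }
                }
                bytesSeen += n;
            }
        }
        return linesSeen;
    }
}
```

Calling update repeatedly on a growing log then costs time proportional to the newly appended data rather than to the whole file.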

Solution 11 - Java

Old post, but I have a solution that could be useful for future readers. Why not just use the file length to track progress? Of course, the lines have to be roughly the same size, but it works very well for big files:

public static void main(String[] args) throws IOException {
    File file = new File("yourfilehere");
    double fileSize = file.length();
    System.out.println("=======> File size = " + fileSize);
    InputStream inputStream = new FileInputStream(file);
    InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "iso-8859-1");
    BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
    int totalRead = 0;
    try {
        while (bufferedReader.ready()) {
            String line = bufferedReader.readLine();
            // LINE PROCESSING HERE
            totalRead += line.length() + 1; // we add +1 byte for the newline char.
            System.out.println("Progress ===> " + ((totalRead / fileSize) * 100) + " %");
        }
    } finally {
        bufferedReader.close();
    }
}

It lets you see the progress without doing a separate full read of the file. I know it depends on a lot of factors, but I hope it will be useful :).

[Edit] Here is a version with an estimated time. I added some System.out calls to show the progress and the estimate. The time estimate becomes fairly accurate once enough lines have been processed (I tried with 10M lines, and after 1% of the processing the time estimate was about 95% accurate). I know some values should be extracted into variables. This code was written quickly but has been useful for me. I hope it will be for you too :).

long startProcessLine = System.currentTimeMillis();
    int totalRead = 0;
    long progressTime = 0;
    double percent = 0;
    int i = 0;
    int j = 0;
    int fullEstimation = 0;
    try {
        while (bufferedReader.ready()) {
            String line = bufferedReader.readLine();
            totalRead += line.length() + 1;
            progressTime = System.currentTimeMillis() - startProcessLine;
            percent = (double) totalRead / fileSize * 100;
            if ((percent > 1) && i % 10000 == 0) {
                int estimation = (int) ((progressTime / percent) * (100 - percent));
                fullEstimation += progressTime + estimation;
                j++;
                System.out.print("Progress ===> " + percent + " %");
                System.out.print(" - current progress : " + (progressTime) + " milliseconds");
                System.out.print(" - Will be finished in ===> " + estimation + " milliseconds");
                System.out.println(" - estimated full time => " + (progressTime + estimation));
            }
            i++;
        }
    } finally {
        bufferedReader.close();
    }
    System.out.println("Ended in " + (progressTime) + " milliseconds");
    System.out.println("Estimative average ===> " + (fullEstimation / j));
    System.out.println("Difference: " + ((((double) 100 / (double) progressTime)) * (progressTime - (fullEstimation / j))) + "%");

Feel free to improve this code if you think it's a good solution.

Solution 12 - Java

Quick and dirty, but it does the job:

import java.io.*;

public class Counter {

    public final static void main(String[] args) throws IOException {
        if (args.length > 0) {
            File file = new File(args[0]);
            System.out.println(countLines(file));
        }
    }
    
    public final static int countLines(File file) throws IOException {
        ProcessBuilder builder = new ProcessBuilder("wc", "-l", file.getAbsolutePath());
        Process process = builder.start();
        InputStream in = process.getInputStream();
        LineNumberReader reader = new LineNumberReader(new InputStreamReader(in));
        String line = reader.readLine();
        if (line != null) {
            return Integer.parseInt(line.trim().split(" ")[0]);
        } else {
            return -1;
        }
    }

}

Solution 13 - Java

Read the file line by line and increment a counter for each line until you have read the entire file.

Solution 14 - Java

Try the Unix "wc" command. I don't mean use it; I mean download the source and see how it does it. It's probably in C, but you can easily port the behavior to Java. The tricky part of writing your own is handling the CR/LF at the end of the file.
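
To illustrate the end-of-file issue mentioned above: wc -l counts newline characters, so a final line without a trailing newline is not counted. The sketch below is not a port of wc; it is a small example, with the made-up names TrailingLineCounter and countLines, that counts '\n' bytes and adds one if the file is non-empty and does not end with '\n'. It treats "\r\n" as a single line break because only '\n' is counted, and it does not handle files that use a lone '\r' as the terminator.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TrailingLineCounter {

    // Counts lines, including a last line that has no terminating newline.
    static long countLines(Path path) throws IOException {
        long lines = 0;
        boolean sawData = false;
        boolean endsWithNewline = false;
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') {
                        lines++;
                    }
                }
                if (n > 0) {
                    sawData = true;
                    endsWithNewline = buffer[n - 1] == '\n';
                }
            }
        }
        // A non-empty file whose last byte is not '\n' has one more line.
        if (sawData && !endsWithNewline) {
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(countLines(Paths.get(args[0])));
        }
    }
}
```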

Solution 15 - Java

The buffered reader is overkill

Reader r = new FileReader("f.txt");

int count = 0;
int nextchar;
// read() returns -1 at the end of the stream
while ((nextchar = r.read()) != -1) {
    // compare with the character itself; Character.getNumericValue('\n') is -1
    if (nextchar == '\n') {
        count++;
    }
}
r.close();

My search for a simple example has produced one that's actually quite poor: calling read() repeatedly for a single character is less than optimal. See here for examples and measurements.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | firstthumb | View Question on Stackoverflow
Solution 1 - Java | Mnementh | View Answer on Stackoverflow
Solution 2 - Java | Augustin | View Answer on Stackoverflow
Solution 3 - Java | Narayan | View Answer on Stackoverflow
Solution 4 - Java | brig | View Answer on Stackoverflow
Solution 5 - Java | Esko Luontola | View Answer on Stackoverflow
Solution 6 - Java | ZZ Coder | View Answer on Stackoverflow
Solution 7 - Java | matt | View Answer on Stackoverflow
Solution 8 - Java | Malax | View Answer on Stackoverflow
Solution 9 - Java | fhucho | View Answer on Stackoverflow
Solution 10 - Java | blackNBUK | View Answer on Stackoverflow
Solution 11 - Java | lpratlong | View Answer on Stackoverflow
Solution 12 - Java | Wilfred Springer | View Answer on Stackoverflow
Solution 13 - Java | Ken Liu | View Answer on Stackoverflow
Solution 14 - Java | Daniel | View Answer on Stackoverflow
Solution 15 - Java | NSherwin | View Answer on Stackoverflow