How can I get the count of line in a file in an efficient way?
JavaFileJava Problem Overview
I have a big file. It includes approximately 3.000-20.000 lines. How can I get the total count of lines in the file using Java?
Java Solutions
Solution 1 - Java
BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
int lines = 0;
while (reader.readLine() != null) lines++;
reader.close();
Update: To answer the performance-question raised here, I made a measurement. First thing: 20.000 lines are too few, to get the program running for a noticeable time. I created a text-file with 5 million lines. This solution (started with java without parameters like -server or -XX-options) needed around 11 seconds on my box. The same with wc -l
(UNIX command-line-tool to count lines), 11 seconds. The solution reading every single character and looking for '\n' needed 104 seconds, 9-10 times as much.
Solution 2 - Java
Files.lines
Java 8+ has a nice and short way using NIO using Files.lines
. Note that you have to close the stream using try-with-resources:
long lineCount;
try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
lineCount = stream.count();
}
If you don't specify the character encoding, the default one used is UTF-8. You may specify an alternate encoding to match your particular data file as shown in the example above.
Solution 3 - Java
use LineNumberReader
something like
public static int countLines(File aFile) throws IOException {
LineNumberReader reader = null;
try {
reader = new LineNumberReader(new FileReader(aFile));
while ((reader.readLine()) != null);
return reader.getLineNumber();
} catch (Exception ex) {
return -1;
} finally {
if(reader != null)
reader.close();
}
}
Solution 4 - Java
I found some solution for this, it might useful for you
Below is the code snippet for, count the no.of lines from the file.
File file = new File("/mnt/sdcard/abc.txt");
LineNumberReader lineNumberReader = new LineNumberReader(new FileReader(file));
lineNumberReader.skip(Long.MAX_VALUE);
int lines = lineNumberReader.getLineNumber();
lineNumberReader.close();
Solution 5 - Java
Read the file through and count the number of newline characters. An easy way to read a file in Java, one line at a time, is the java.util.Scanner class.
Solution 6 - Java
This is about as efficient as it can get, buffered binary read, no string conversion,
FileInputStream stream = new FileInputStream("/tmp/test.txt");
byte[] buffer = new byte[8192];
int count = 0;
int n;
while ((n = stream.read(buffer)) > 0) {
for (int i = 0; i < n; i++) {
if (buffer[i] == '\n') count++;
}
}
stream.close();
System.out.println("Number of lines: " + count);
Solution 7 - Java
Do You need exact number of lines or only its approximation? I happen to process large files in parallel and often I don't need to know exact count of lines - I then revert to sampling. Split the file into ten 1MB chunks and count lines in each chunk, then multiply it by 10 and You'll receive pretty good approximation of line count.
Solution 8 - Java
All previous answers suggest to read though the whole file and count the amount of newlines you find while doing this. You commented some as "not effective" but thats the only way you can do that. A "line" is nothing else as a simple character inside the file. And to count that character you must have a look at every single character within the file.
I'm sorry, but you have no choice. :-)
Solution 9 - Java
This solution is about 3.6× faster than the top rated answer when tested on a file with 13.8 million lines. It simply reads the bytes into a buffer and counts the \n
characters. You could play with the buffer size, but on my machine, anything above 8KB didn't make the code faster.
private int countLines(File file) throws IOException {
int lines = 0;
FileInputStream fis = new FileInputStream(file);
byte[] buffer = new byte[BUFFER_SIZE]; // BUFFER_SIZE = 8 * 1024
int read;
while ((read = fis.read(buffer)) != -1) {
for (int i = 0; i < read; i++) {
if (buffer[i] == '\n') lines++;
}
}
fis.close();
return lines;
}
Solution 10 - Java
If the already posted answers aren't fast enough you'll probably have to look for a solution specific to your particular problem.
For example if these text files are logs that are only appended to and you regularly need to know the number of lines in them you could create an index. This index would contain the number of lines in the file, when the file was last modified and how large the file was then. This would allow you to recalculate the number of lines in the file by skipping over all the lines you had already seen and just reading the new lines.
Solution 11 - Java
Old post, but I have a solution that could be usefull for next people. Why not just use file length to know what is the progression? Of course, lines has to be almost the same size, but it works very well for big files:
public static void main(String[] args) throws IOException {
File file = new File("yourfilehere");
double fileSize = file.length();
System.out.println("=======> File size = " + fileSize);
InputStream inputStream = new FileInputStream(file);
InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "iso-8859-1");
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
int totalRead = 0;
try {
while (bufferedReader.ready()) {
String line = bufferedReader.readLine();
// LINE PROCESSING HERE
totalRead += line.length() + 1; // we add +1 byte for the newline char.
System.out.println("Progress ===> " + ((totalRead / fileSize) * 100) + " %");
}
} finally {
bufferedReader.close();
}
}
It allows to see the progression without doing any full read on the file. I know it depends on lot of elements, but I hope it will be usefull :).
[Edition] Here is a version with estimated time. I put some SYSO to show progress and estimation. I see that you have a good time estimation errors after you have treated enough line (I try with 10M lines, and after 1% of the treatment, the time estimation was exact at 95%). I know, some values has to be set in variable. This code is quickly written but has be usefull for me. Hope it will be for you too :).
long startProcessLine = System.currentTimeMillis();
int totalRead = 0;
long progressTime = 0;
double percent = 0;
int i = 0;
int j = 0;
int fullEstimation = 0;
try {
while (bufferedReader.ready()) {
String line = bufferedReader.readLine();
totalRead += line.length() + 1;
progressTime = System.currentTimeMillis() - startProcessLine;
percent = (double) totalRead / fileSize * 100;
if ((percent > 1) && i % 10000 == 0) {
int estimation = (int) ((progressTime / percent) * (100 - percent));
fullEstimation += progressTime + estimation;
j++;
System.out.print("Progress ===> " + percent + " %");
System.out.print(" - current progress : " + (progressTime) + " milliseconds");
System.out.print(" - Will be finished in ===> " + estimation + " milliseconds");
System.out.println(" - estimated full time => " + (progressTime + estimation));
}
i++;
}
} finally {
bufferedReader.close();
}
System.out.println("Ended in " + (progressTime) + " seconds");
System.out.println("Estimative average ===> " + (fullEstimation / j));
System.out.println("Difference: " + ((((double) 100 / (double) progressTime)) * (progressTime - (fullEstimation / j))) + "%");
Feel free to improve this code if you think it's a good solution.
Solution 12 - Java
Quick and dirty, but it does the job:
import java.io.*;
public class Counter {
public final static void main(String[] args) throws IOException {
if (args.length > 0) {
File file = new File(args[0]);
System.out.println(countLines(file));
}
}
public final static int countLines(File file) throws IOException {
ProcessBuilder builder = new ProcessBuilder("wc", "-l", file.getAbsolutePath());
Process process = builder.start();
InputStream in = process.getInputStream();
LineNumberReader reader = new LineNumberReader(new InputStreamReader(in));
String line = reader.readLine();
if (line != null) {
return Integer.parseInt(line.trim().split(" ")[0]);
} else {
return -1;
}
}
}
Solution 13 - Java
Read the file line by line and increment a counter for each line until you have read the entire file.
Solution 14 - Java
Try the unix "wc" command. I don't mean use it, I mean download the source and see how they do it. It's probably in c, but you can easily port the behavior to java. The problem with making your own is to account for the ending cr/lf problem.
Solution 15 - Java
The buffered reader is overkill
Reader r = new FileReader("f.txt");
int count = 0;
int nextchar = 0;
while (nextchar != -1){
nextchar = r.read();
if (nextchar == Character.getNumericValue('\n') ){
count++;
}
}
My search for a simple example has createde one thats actually quite poor. calling read() repeadedly for a single character is less than optimal. see here for examples and measurements.