Counting Lines in Large Data Files in Java
Counting the lines in a very large data file can be slow. Iterating through the file line by line is the common approach, but it is time-consuming on large inputs.
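A more efficient alternative is to scan the raw bytes of the file for newline characters. Two variants of this method follow, a straightforward one (countLines) and one whose main loop is arranged to be easier for the optimizer to tune (countLinesNew):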
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public static int countLines(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        // read a chunk at a time and count the newline bytes in it
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        // a non-empty file without any newline still counts as one line
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}

public static int countLinesNew(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];

        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }

        // make it easy for the optimizer to tune this loop
        int count = 0;
        while (readChars == 1024) {
            for (int i = 0; i < 1024;) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        // count remaining characters
        while (readChars != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}
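This method reads the file in chunks of 1024 bytes and counts the newline bytes ('\n') in each chunk, accumulating a running total. Because it works on raw bytes, it avoids decoding characters and building a String for every line, which is work that line-by-line reading has to do. Note that it counts newline characters rather than lines: a final line without a trailing newline is not counted, except that a non-empty file containing no newline at all is reported as 1.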
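As an illustration of the same chunked scan, here is a minimal sketch of a variant, not the method from the article. It assumes Java 7 or later for try-with-resources, returns a long so that the count cannot overflow an int, and counts a final line that has no trailing newline. The class name ChunkedLineCounter and the method name countLinesLong are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedLineCounter {

    // Hypothetical variant: counts '\n' bytes chunk by chunk and also
    // counts a final line that does not end with a newline.
    public static long countLinesLong(String filename) throws IOException {
        try (InputStream is = new BufferedInputStream(new FileInputStream(filename))) {
            byte[] buf = new byte[1024];
            long count = 0;
            int lastByte = '\n'; // so that an empty file yields 0
            int n;
            while ((n = is.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        count++;
                    }
                }
                if (n > 0) {
                    lastByte = buf[n - 1]; // remember how the data ended
                }
            }
            // A non-empty file whose last byte is not '\n' has one more line.
            return lastByte == '\n' ? count : count + 1;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ChunkedLineCounter <file>");
            return;
        }
        System.out.println(countLinesLong(args[0]));
    }
}
```

Whether an unterminated final line should count is a design choice. This sketch counts it, which matches what reading the file line by line would report.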
Benchmarks have shown that this method is significantly faster than using LineNumberReader. For a 1.3GB text file, the optimized method takes around 0.35 seconds to count the lines, while LineNumberReader takes approximately 2.40 seconds, roughly a sevenfold difference.
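A comparison of this kind could be set up roughly as follows. This is a sketch under assumptions, not the benchmark that produced the figures above. It times a single run of each approach with System.nanoTime, so JIT warm-up and the operating system's file cache will affect the numbers. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.LineNumberReader;

public class LineCountTiming {

    // Baseline: read every line and let LineNumberReader track the line number.
    static int countWithLineNumberReader(String filename) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(new FileReader(filename))) {
            while (reader.readLine() != null) {
                // discard the line; only the line number matters
            }
            return reader.getLineNumber();
        }
    }

    // Byte scan: count '\n' bytes in 1024-byte chunks.
    static int countWithByteScan(String filename) throws IOException {
        try (InputStream is = new BufferedInputStream(new FileInputStream(filename))) {
            byte[] buf = new byte[1024];
            int count = 0;
            int n;
            while ((n = is.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        count++;
                    }
                }
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LineCountTiming <file>");
            return;
        }
        long t0 = System.nanoTime();
        int a = countWithLineNumberReader(args[0]);
        long t1 = System.nanoTime();
        int b = countWithByteScan(args[0]);
        long t2 = System.nanoTime();
        System.out.printf("LineNumberReader: %d lines in %.2f s%n", a, (t1 - t0) / 1e9);
        System.out.printf("Byte scan:        %d lines in %.2f s%n", b, (t2 - t1) / 1e9);
    }
}
```

For steadier numbers, each method would normally be run several times and the first runs discarded. A benchmarking harness such as JMH is the usual tool for that.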