How Can I Efficiently Count Lines in Large Data Files in Java?


Patricia Arquette

Published: 2024-12-09 09:18:07




Counting Lines in Large Data Files in Java

Counting the lines in a large data file can be a daunting task. Iterating over the file one line at a time is the common approach, but it is slow and inefficient.
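For reference, the line-by-line baseline can be sketched as follows. The class name `NaiveLineCounter` and the UTF-8 charset are assumptions made for illustration; they do not come from the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical baseline: reads every line as a String just to count it.
class NaiveLineCounter {
    static long countLines(String filename) throws IOException {
        // Assumes the file is valid UTF-8; malformed input raises an IOException.
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            long count = 0;
            // readLine() decodes and allocates a String per line, which is then discarded.
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java NaiveLineCounter <file>");
            return;
        }
        System.out.println(countLines(args[0]));
    }
}
```

Every line is decoded into a `String` that is thrown away immediately, which is the work a byte-level counter avoids.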

A more efficient alternative is to use the following optimized methods:

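import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Counts '\n' bytes in the file. A final line without a trailing newline
// is only counted when the file contains no newline at all.
public static int countLines(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            // scan only the bytes actually read in this chunk
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        // a non-empty file with no newline counts as one line
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}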

// Same counting rule as countLines, with a separate loop for full chunks.
public static int countLinesNew(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];

        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }

        // make it easy for the optimizer to tune this loop:
        // full chunks are scanned with a constant loop bound
        int count = 0;
        while (readChars == 1024) {
            for (int i = 0; i < 1024;) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        // count remaining characters (partial chunks, including the last one)
        while (readChars != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        // the file is non-empty here, so no newline means a single line
        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}


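This method reads the file in 1024-byte chunks and scans the raw bytes for newline characters, which keeps the number of read calls low and avoids the per-line character decoding and String allocation that line-by-line reading incurs. It counts the newlines found in each chunk and accumulates the total.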
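Because the chunked approach counts `'\n'` bytes, a final line that lacks a trailing newline is missed unless the file contains no newline at all. A minimal sketch of a variant that also counts such a line is shown here. The class name `ChunkedLineCounter`, the 8192-byte chunk size, and the `long` return type are assumptions for illustration, not part of the original answer.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical variant of the chunked counter that handles an unterminated last line.
class ChunkedLineCounter {
    static long countLines(String filename) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(filename))) {
            byte[] buf = new byte[8192]; // chunk size is an assumption, not a tuned value
            long count = 0;
            boolean sawData = false;     // true once at least one byte was read
            byte last = 0;               // last byte of the most recent chunk
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        count++;
                    }
                }
                if (n > 0) {
                    sawData = true;
                    last = buf[n - 1];
                }
            }
            // Count a final line that is not terminated by '\n'.
            if (sawData && last != '\n') {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ChunkedLineCounter <file>");
            return;
        }
        System.out.println(countLines(args[0]));
    }
}
```

With this rule, a file containing `a\nb\nc` is reported as three lines and an empty file as zero. Returning `long` also avoids overflow on files with more than `Integer.MAX_VALUE` lines.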

Benchmarks show that this method is significantly faster than using LineNumberReader. For a 1.3 GB text file, the optimized method takes about 0.35 seconds to count the lines, whereas LineNumberReader takes about 2.40 seconds.
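To reproduce a comparison like this on your own data, the LineNumberReader side can be sketched as below. The class name `LineNumberReaderCount` and the single-run timing in `main` are assumptions for illustration. One timed run is a crude measurement, since JIT warm-up and the operating system's file cache both affect it.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

// Hypothetical baseline for timing: counts lines with LineNumberReader.
class LineNumberReaderCount {
    static int countLines(String filename) throws IOException {
        // FileReader uses the platform default charset here.
        try (LineNumberReader reader = new LineNumberReader(new FileReader(filename))) {
            while (reader.readLine() != null) {
                // discard the line; the reader tracks the line number itself
            }
            return reader.getLineNumber();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java LineNumberReaderCount <file>");
            return;
        }
        long start = System.nanoTime();
        int lines = countLines(args[0]);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines in " + elapsedMs + " ms");
    }
}
```

Timing the byte-scanning method the same way, over several runs on the same file, gives a fairer picture than a single measurement.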
