1. I have a 300 MB file. Its data must be read line by line and each line processed (the processing cannot be made idempotent). I am worried the program will crash partway through: I would not know which line to resume from, and lines that were already processed must not be processed again.
2. The solution I have in mind is to read a line, process it, and then delete that line from the file. Then even if the program crashes and I rerun it from the beginning, the lines that were already processed are gone and will not be read again.
3. Does anyone have a better solution? Thank you.
// How do I read one line and then delete that line? Or is there a better way to guarantee no line is processed twice?
$fp = fopen($fileName, "r");
if (!$fp) {
    return -1;
}
$max = 40960; // read at most 40 KB per line
while (!feof($fp)) {
    $line = fgets($fp, $max);
    // ... process $line here ...
}
fclose($fp);
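For the literal "read one line, delete one line" request, here is a minimal sketch. processLine() is a hypothetical placeholder, and note the caveats in the comments: deleting the first line means rewriting the whole remainder of the file (which is exactly why a later reply calls this too slow), and file_get_contents() pulls the remainder into memory, so memory_limit must allow it.

// Sketch only: consume the file from the front, one line at a time.
// Caveats: each iteration rewrites the whole remaining file (slow for
// 300 MB) and loads it into memory. processLine() is hypothetical.
while (true) {
    $data = file_get_contents($fileName);
    if ($data === false || $data === "") {
        break; // unreadable, or fully consumed
    }
    $nl   = strpos($data, "\n");
    $line = ($nl === false) ? $data : substr($data, 0, $nl + 1);
    $rest = ($nl === false) ? ""    : substr($data, $nl + 1);

    processLine($line);

    // Shrink the file only after the line succeeded; a crash between
    // processLine() and this write replays exactly one line on rerun.
    file_put_contents($fileName, $rest);
}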
Since the requirement is to read line by line, why not split the file into many small files first (mind the naming; the split command on Linux does this, e.g. "split -l 100000 bigfile chunk_"), then write a script that loops over the chunks, deleting each one once it is fully processed, so that a rerun skips the finished ones. A sketch of that loop follows.
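A minimal sketch of that chunked loop, assuming the big file was pre-split with something like "split -l 100000 bigfile chunk_". The chunk_ prefix and processLine() are illustrative names, not anything from the question.

// Process the split pieces in order, removing each finished chunk so
// a rerun resumes at the first surviving chunk.
$chunks = glob("chunk_*");
sort($chunks); // split names pieces in lexical order (chunk_aa, chunk_ab, ...)

foreach ($chunks as $chunk) {
    $fp = fopen($chunk, "r");
    if (!$fp) {
        exit("cannot open $chunk\n");
    }
    while (($line = fgets($fp)) !== false) {
        processLine($line); // the non-idempotent work from the question
    }
    fclose($fp);
    unlink($chunk); // a chunk is deleted only after it is fully done
}

Note the resumption granularity here is a whole chunk: a crash mid-chunk reprocesses that chunk's earlier lines on the next run, so keep the chunks small.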
Another way to get the effect you want: read a line and, after processing it, mark that line with a special symbol. Looping through the rows, the marks tell you which ones are done; if your program crashes, on the next run search for the last place the symbol appears, and the line after it is where execution should resume.
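One caution on that marker idea: appending a symbol after a line shifts every byte behind it, so the file would have to be rewritten each time. A workable in-place variant (my adaptation, not the replier's exact words) overwrites the first byte of a finished line with a marker character instead, assuming no real line starts with '#' and that losing that byte is acceptable once the line is processed. processLine() is again a placeholder.

// Mark processed lines in place by overwriting their first byte.
$fp = fopen($fileName, "r+");
if (!$fp) {
    exit("cannot open $fileName\n");
}
while (true) {
    $pos  = ftell($fp);      // offset of the line about to be read
    $line = fgets($fp);
    if ($line === false) {
        break;               // end of file
    }
    if ($line[0] === '#') {
        continue;            // marked on an earlier run, skip it
    }
    processLine($line);
    $next = ftell($fp);      // where the following line starts
    fseek($fp, $pos);
    fwrite($fp, "#");        // flag the line as done
    fseek($fp, $next);       // also satisfies stdio's rule about seeking
                             // between a write and the next read
}
fclose($fp);

As with the other schemes, a crash between processLine() and the fwrite() replays that one line.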
Write the processing position to a separate file as you go. You can record the line number, or better, the byte offset within the big file, so a rerun can seek straight back to the right spot.
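A minimal sketch of that checkpoint idea, assuming a small side file (named "$fileName.progress" here purely for illustration) survives the crash; processLine() is again hypothetical.

// Resume from a byte offset recorded in a side file.
$progressFile = $fileName . ".progress";
$offset = is_file($progressFile) ? (int) file_get_contents($progressFile) : 0;

$fp = fopen($fileName, "r");
if (!$fp) {
    exit("cannot open $fileName\n");
}
fseek($fp, $offset); // skip everything already processed

while (($line = fgets($fp)) !== false) {
    processLine($line);
    // After each line, persist the offset of the NEXT unread line.
    file_put_contents($progressFile, (string) ftell($fp));
}
fclose($fp);
unlink($progressFile); // finished: clear the checkpoint

The big file itself is never modified, which avoids the slowness the next reply warns about. The remaining window is a crash between processLine() and the checkpoint write, which replays exactly one line; if writing the side file per line costs too much, checkpoint every N lines and accept replaying up to N.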
Modifying the file while you are reading it is too slow.