Reading Large CSV Files Effectively
Reading and processing large CSV files in Python can be challenging due to memory limitations. This issue becomes even more prominent with files containing millions of rows and hundreds of columns.
Memory Issues and Optimization
Your current code reads the CSV file and stores every row in a list. This approach is inefficient for large files because it loads the entire dataset into memory before any processing happens.
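Since the original code is not shown, the following is only a reconstruction of the pattern being described; the file name, column index, and criterion value are placeholders:

import csv

data = []
with open("data.csv", "r", newline="") as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        data.append(row)  # every row is kept, so memory grows with file size

# filtering only happens after the whole file is already in memory
matching = [row for row in data if row[3] == "some_value"]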
To resolve this memory issue, process the data as you read it. Use a generator function that yields one row at a time, as demonstrated below:
import csv

def getstuff(filename, criterion):
    # Open in text mode with newline="" as the csv module recommends
    with open(filename, "r", newline="") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        count = 0
        for row in datareader:
            if row[3] == criterion:
                yield row
                count += 1
            elif count:
                # matching rows are contiguous; stop once the block ends
                return
This updated code yields rows that match the specified criterion, line by line. It eliminates the need to keep the entire dataset in memory.
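To consume the generator, iterate over it with a plain for loop so only one row is held in memory at a time. A minimal usage sketch (the file name and criterion value are placeholders, not taken from the original question):

# Placeholder file name and criterion value
rows = getstuff("data.csv", "some_value")
header = next(rows)   # the first yielded item is the header row
for row in rows:
    print(row)        # replace with your own per-row processing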
Performance Improvements
Beyond memory optimization, several standard techniques can further improve performance:

- If the rows are sorted (or grouped) on the criterion column, itertools.dropwhile and itertools.takewhile can skip non-matching rows and stop as soon as the matching block ends (see the sketch after this list).
- Reading the file in chunks, for example with pandas.read_csv and its chunksize parameter, lets you work on manageable pieces while keeping memory use bounded.
- Keeping per-row work inside the loop to a minimum, such as filtering on the raw value before any expensive parsing or conversion, reduces total runtime.

By employing these strategies, you can significantly improve the efficiency of your Python code for handling large CSV files.
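As an illustration of the first point, here is a sketch that assumes the file is sorted or grouped on column 3, reusing the column index and placeholder names from the example above:

import csv
from itertools import dropwhile, takewhile

def getstuff_sorted(filename, criterion):
    # Assumes all rows matching the criterion form one contiguous block
    with open(filename, "r", newline="") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # header row
        # Skip rows until the first match, then yield until matches end
        yield from takewhile(
            lambda r: r[3] == criterion,
            dropwhile(lambda r: r[3] != criterion, datareader),
        )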