How to Effectively Handle Large CSV Files in Python 2.7?

Mary-Kate Olsen
Release: 2024-11-08 03:32:02

Reading Large .csv Files in Python

Problem: Reading massive .csv files (up to 1 million rows, 200 columns) in Python 2.7 fails with memory errors.

The initial approach iterates through the entire file and stores data in memory as lists. However, this method becomes impractical for large files, as it consumes excessive memory.

Solution:

1. Process Rows as They Are Produced:

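Avoid loading the entire file into memory. Instead, write a generator function that yields the header row and then each row whose fourth column (row[3]) equals the criterion, so the caller handles one row at a time.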

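import csv

def getstuff(filename, criterion):
    # Python 2.7: the csv module expects the file opened in binary mode
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        for row in datareader:
            # keep only rows whose fourth column matches
            if row[3] == criterion:
                yield row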
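
As a rough, self-contained sketch of the same idea, the example below writes a tiny sample file and streams the matching rows out of it. The names open_csv and iter_matching_rows and the sample data are made up for illustration; the open_csv helper is only an assumption to let the sketch run on Python 3 as well as Python 2.7.

```python
import csv
import os
import sys
import tempfile


def open_csv(filename, mode="r"):
    # Hypothetical helper: Python 2's csv module wants binary mode, while
    # Python 3 wants text mode with newline="".
    if sys.version_info[0] < 3:
        return open(filename, mode + "b")
    return open(filename, mode, newline="")


def iter_matching_rows(filename, criterion, column=3):
    """Yield the header row, then each row whose `column` equals `criterion`."""
    with open_csv(filename) as csvfile:
        reader = csv.reader(csvfile)
        yield next(reader)  # header row
        for row in reader:
            if row[column] == criterion:
                yield row


# Build a tiny sample file (made-up data) so the sketch can run on its own.
fd, sample_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open_csv(sample_path, "w") as out:
    csv.writer(out).writerows([
        ["id", "name", "city", "status"],
        ["1", "ann", "Oslo", "open"],
        ["2", "bob", "Rome", "closed"],
        ["3", "cy", "Lima", "open"],
    ])

rows = iter_matching_rows(sample_path, "open")
header = next(rows)
matches = list(rows)  # fine for a demo; on a huge file, loop over rows instead
os.remove(sample_path)
```

Calling list() is only acceptable here because the sample is tiny. On a large file, a plain for loop over the generator keeps memory use flat.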

2. Use Generator Functions for Filtering:

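Filter data while iterating through the file by combining itertools.dropwhile and itertools.takewhile. dropwhile skips rows until the first match, and takewhile stops at the first row that no longer matches, so the rest of the file is never read. This only works when all rows meeting the criterion are consecutive.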

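import csv
from itertools import dropwhile, takewhile

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        # skip rows until the first match, then stop at the first non-match
        matching = takewhile(
            lambda r: r[3] == criterion,
            dropwhile(lambda r: r[3] != criterion, datareader))
        # Python 2.7 has no "yield from", so re-yield each row explicitly
        for row in matching:
            yield row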
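
To see what the dropwhile/takewhile pairing actually returns, here is a small sketch on in-memory rows. The data and the name first_run are made up for illustration. Note that the later "blue" row is not returned, because iteration stops when the first run of matches ends.

```python
from itertools import dropwhile, takewhile

# Made-up rows; the fourth field (index 3) is the value being matched.
rows = [
    ["1", "a", "x", "red"],
    ["2", "b", "x", "blue"],
    ["3", "c", "x", "blue"],
    ["4", "d", "x", "red"],
    ["5", "e", "x", "blue"],  # a later match, after the first run has ended
]


def first_run(rows, criterion, column=3):
    """Yield only the first consecutive run of rows matching `criterion`."""
    skipped = dropwhile(lambda r: r[column] != criterion, rows)
    for row in takewhile(lambda r: r[column] == criterion, skipped):
        yield row


blue_ids = [row[0] for row in first_run(rows, "blue")]
```

If matching rows can be scattered through the file, the plain if-based filter is the safer choice, at the cost of scanning to the end.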

3. Optimize Memory Consumption:

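Refactor getdata() to be a generator function as well, so that the whole pipeline holds only one row in memory at any time. Each criterion triggers a separate pass over the file, and each pass yields the header row again.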

def getdata(filename, criteria):
    for criterion in criteria:
        for row in getstuff(filename, criterion):
            yield row
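
A minimal sketch of consuming such a generator chain lazily follows. To stay self-contained it reads from a list of strings instead of a file (csv.reader accepts any iterable of lines), so getstuff_from_lines and getdata_from_lines are hypothetical stand-ins for the functions above, and the data is made up.

```python
import csv

# Stand-in for the file contents (made-up data).
LINES = [
    "id,name,city,status",
    "1,ann,Oslo,open",
    "2,bob,Rome,closed",
    "3,cy,Lima,open",
    "4,dee,Kyiv,pending",
]
HEADER = ["id", "name", "city", "status"]


def getstuff_from_lines(lines, criterion):
    reader = csv.reader(lines)
    yield next(reader)  # header row
    for row in reader:
        if row[3] == criterion:
            yield row


def getdata_from_lines(lines, criteria):
    # Same shape as getdata(): one full scan per criterion.
    for criterion in criteria:
        for row in getstuff_from_lines(lines, criterion):
            yield row


# Consume lazily: count rows per status without ever building a list.
counts = {}
for row in getdata_from_lines(LINES, ["open", "closed"]):
    if row == HEADER:
        continue  # the header is yielded once per criterion; skip it here
    counts[row[3]] = counts.get(row[3], 0) + 1
```

Only the small counts dictionary grows here; each row is discarded as soon as it has been counted.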

Additional Tips for Speed:

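  • Read in batches rather than looking for a chunk size parameter: csv.reader has no such parameter and already reads one row at a time. To work on fixed-size batches, group rows from the reader with itertools.islice.
  • Consider using a database engine: If the same data is filtered repeatedly, load it into a database once (for example SQLite through the standard-library sqlite3 module) so that later queries do not have to rescan the whole file.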
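
The two tips above can be combined roughly as follows: rows are pulled from the reader in small batches and inserted into an in-memory SQLite table, which is then queried. This is only a sketch under stated assumptions; iter_batches, the table name items, the batch size, and the generated data are all made up for illustration.

```python
import csv
import sqlite3
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of up to `batch_size` rows from any row iterator."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


# Made-up input: a header plus seven data lines, standing in for a file.
lines = ["id,status"] + [
    "%d,%s" % (i, "open" if i % 2 else "closed") for i in range(1, 8)
]
reader = csv.reader(lines)
header = next(reader)

conn = sqlite3.connect(":memory:")  # use a file path to persist the data
conn.execute("CREATE TABLE items (id INTEGER, status TEXT)")
batch_sizes = []
for batch in iter_batches(reader, 3):
    # Only one batch of rows is held in memory at a time.
    conn.executemany("INSERT INTO items VALUES (?, ?)", batch)
    batch_sizes.append(len(batch))
conn.commit()

open_count = conn.execute(
    "SELECT COUNT(*) FROM items WHERE status = ?", ("open",)
).fetchone()[0]
conn.close()
```

For a real file, the batch size would be much larger than 3, and an index on the filtered column is likely worth creating after the load.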

source:php.cn