Table of Contents
Python's non-lazy evaluation mechanism
Scenario 1: List explicitly bound to variables
Scenario 2: List literals are directly used for iterator creation
The core difference between memory usage and life cycle
Optimization and precautions
1. Optimize memory using generator expressions
2. Understand the responsibilities of the iter() function
3. Python's garbage collection mechanism
Summary

Python list comprehension and iterator memory behavior in-depth analysis

Sep 17, 2025 am 06:18 AM

This article examines how list comprehensions and iterators behave in memory in Python. The core point is that Python evaluates expressions eagerly, so a list comprehension builds the complete list and consumes memory up front, whether or not the result is assigned to a variable. The main difference is lifetime: a list that is not bound to a variable can be reclaimed as soon as nothing references it any longer (for iter([...]), that is when the iterator is exhausted or discarded, because the iterator itself holds a reference to the list), while a list bound to a variable stays in memory for as long as the variable does.

Python's non-lazy evaluation mechanism

In Python, expressions are evaluated eagerly ("non-lazily"): when an expression is executed, its value is computed immediately rather than deferred until it is needed. For a list comprehension [expression for item in iterable], this means that the complete list object and all its elements are built in memory first, whether or not the result is assigned to a variable.
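
The eager behavior can be observed directly. The following is a minimal sketch, assuming a hypothetical helper traced() (not part of the original examples) that records every call, so that the moment of evaluation becomes visible:

```python
calls = []

def traced(x):
    """Hypothetical helper: record each call so evaluation timing is visible."""
    calls.append(x)
    return x * 2

# The list comprehension runs traced() for every element immediately,
# even though the resulting list is never bound to a name.
iter([traced(i) for i in range(5)])
calls_after_list = len(calls)

calls.clear()

# The generator expression only creates a generator object here;
# traced() has not been called for any element yet.
lazy = (traced(i) for i in range(5))
calls_after_genexpr = len(calls)

first = next(lazy)  # pulling one element triggers exactly one call
calls_after_next = len(calls)

print(calls_after_list, calls_after_genexpr, calls_after_next)
```

The three counters show that the comprehension did all of its work before iter() was even called, while the generator expression did none until an element was requested.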

Consider two scenarios that show a high degree of similarity in the initial memory footprint:

Scenario 1: List explicitly bound to variables

When we assign the result of a list comprehension to a variable, the list object and all the elements it contains remain in memory until the variable is reassigned, deleted, or goes out of scope.

# CODE 1: list explicitly bound to a variable
import sys

# This line immediately builds a complete list of 5000 integers and binds it to my_list
my_list = [l for l in range(5000)]
print(f"Memory footprint of the list object 'my_list' (excluding its elements): {sys.getsizeof(my_list)} bytes")
# Note: sys.getsizeof() returns the size of the list object itself,
# not the total size of the 5000 integer objects it references.
# The important point is that those 5000 integer objects have been created.

# Create an iterator from the existing list
my_iter1 = iter(my_list)
print(f"Memory footprint of the iterator object 'my_iter1': {sys.getsizeof(my_iter1)} bytes (usually smaller)")

# In this scenario, my_list and all the integers it references keep occupying memory
# until my_list is garbage collected or the program ends.

In this example, [l for l in range(5000)] creates a list of 5000 integers. Even if we then create an iterator from it, the original my_list and all its elements still exist in memory and are accessible through the my_list variable.
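
Because sys.getsizeof() reports only the list object itself, a rough estimate of the total footprint has to add the elements separately. The sketch below is one way to do that under a simplifying assumption: every element is a distinct int object, so summing per-element sizes does not double count. The variable names are illustrative only.

```python
import sys

my_list = [l for l in range(5000)]

# Size of the list object itself: header plus one pointer per slot.
container_bytes = sys.getsizeof(my_list)

# Rough size of the elements: sum the size of each integer object.
# Assumption: all elements are distinct objects (true here). CPython also
# caches small integers, so part of this memory is shared process-wide.
element_bytes = sum(sys.getsizeof(obj) for obj in my_list)

# The iterator is a small fixed-size object that only refers to the list.
iterator_bytes = sys.getsizeof(iter(my_list))

print(f"list object: {container_bytes} bytes")
print(f"elements:    {element_bytes} bytes")
print(f"iterator:    {iterator_bytes} bytes")
```

The exact numbers depend on the Python version and platform, but the elements typically account for more memory than the list object that points to them.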

Scenario 2: List literals are directly used for iterator creation

When the result of the list comprehension is not explicitly assigned to any variable, but is directly passed to a function (such as iter()), Python will also create this list completely first.

# CODE 2: list comprehension passed directly to iter()
import sys

# Although no variable receives it, [i for i in range(5000)] immediately builds
# a complete list of 5000 integers.
# The iter() function then receives this temporary list as its argument.
my_iter2 = iter([i for i in range(5000)])
print(f"Memory footprint of the iterator object 'my_iter2': {sys.getsizeof(my_iter2)} bytes (usually smaller)")

# Key point: the list iterator returned by iter() keeps a reference to the anonymous list,
# so the list stays alive for as long as my_iter2 needs it. Once the iterator is exhausted
# or discarded and no other reference exists, the list can be reclaimed.

In this scenario, [i for i in range(5000)] also creates a list of 5000 integers. The iter() function receives this temporary list and returns an iterator over it. That iterator holds a reference to the list, so the list is not reclaimed the moment iter() returns; it stays alive until the iterator is exhausted (CPython's list iterator then drops its reference) or until the iterator itself is discarded. At that point, if nothing else references the list, its memory and that of its elements can be reclaimed.
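
One detail worth checking is what still refers to the temporary list after iter() returns. The sketch below uses a weak reference to observe the list without keeping it alive. It assumes CPython's reference-counting behavior, and TrackedList is a hypothetical subclass introduced only because the built-in list type does not support weak references.

```python
import weakref

class TrackedList(list):
    """Hypothetical list subclass; built-in list cannot be weakly referenced."""

data = TrackedList(i for i in range(5000))
probe = weakref.ref(data)     # observes the list without keeping it alive

my_iter = iter(data)
del data                      # the list is now "anonymous": only my_iter refers to it

alive_while_iterator_exists = probe() is not None

del my_iter                   # dropping the iterator drops the last reference
alive_after_iterator_deleted = probe() is not None

print(alive_while_iterator_exists, alive_after_iterator_deleted)
```

On CPython the list survives for as long as the iterator does and is released as soon as the iterator is deleted, which shows that the iterator, not a variable name, is what keeps an anonymous list alive.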

The core difference between memory usage and life cycle

Through the above analysis, we can draw the following conclusions:

  1. Initial memory footprint: In both scenarios, the expression [l for l in range(5000)] or [i for i in range(5000)] allocates approximately the same amount of memory when executed, because Python builds the list in full. In terms of "was a large amount of data created", CODE 1 and CODE 2 are therefore equivalent at the list creation stage.
  2. Memory lifecycle: The core difference lies in how long the list object lives.
    • In Scenario 1, the list is bound to the my_list variable, and its memory stays in use until the lifetime of my_list ends.
    • In Scenario 2, the list is a temporary, anonymous object. It is created and passed as the argument to iter(), and afterwards the only reference to it is the one held by the returned iterator. It becomes reclaimable as soon as that iterator is exhausted or discarded, with no variable to delete or rebind, so its memory footprint can be shorter-lived.

In short, with both func(expression) and variable = expression; func(variable), Python's eager evaluation means that expression is fully evaluated and its memory allocated before func() is called. The only difference is what happens after func() returns: in the first form, the memory becomes reclaimable immediately unless func() or its return value keeps a reference to the object (as the iterator returned by iter() does); in the second form, variable extends the object's lifetime.
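
The two call patterns can be compared with a function that does not keep its argument, such as the built-in sum(). This is a sketch under CPython's reference-counting behavior; TrackedList, build() and the events log are hypothetical names used only for illustration, and weakref.finalize registers a callback that runs when the list is freed.

```python
import weakref

class TrackedList(list):
    """Hypothetical list subclass; built-in list cannot be weakly referenced."""

events = []

def build(n):
    """Hypothetical helper: build a list and log the moment it is freed."""
    data = TrackedList(range(n))
    weakref.finalize(data, events.append, "list freed")
    return data

# Pattern 1: func(expression). sum() keeps no reference to its argument,
# so the temporary list is freed as soon as the call completes.
total = sum(build(5000))
events.append("first sum returned")

# Pattern 2: variable = expression; func(variable).
# The name 'kept' keeps the list alive after sum() returns.
kept = build(5000)
total_again = sum(kept)
events.append("second sum returned")
del kept                      # only now can the second list be freed

print(events)
```

In the first pattern the "list freed" entry is logged before the next statement runs; in the second it appears only after the variable is deleted.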

Optimization and precautions

For applications that deal with large data sets or pursue memory efficiency, creating a complete list directly is often not the best choice.

1. Optimize memory using generator expressions

If your goal is to create an iterator and you do not need the entire list in memory at once, use a generator expression instead of a list comprehension. A generator expression uses parentheses () instead of square brackets []; it does not build all the elements at once but generates them on demand:

# Use a generator expression
import sys

# my_generator_iter is a generator object; it does not create all 5000 integers up front
my_generator_iter = (i for i in range(5000))
print(f"Memory footprint of the generator object 'my_generator_iter': {sys.getsizeof(my_generator_iter)} bytes (very small)")

# Elements are produced one at a time, and take up memory, only during iteration
for item in my_generator_iter:
    # Process item
    pass

The advantage of a generator expression is that it computes and yields the next element only when it is needed, which greatly reduces peak memory usage.
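
The difference in peak memory can be measured with the standard tracemalloc module. The following is a rough sketch, not a benchmark: peak_bytes() is a hypothetical helper, and N is an assumed size chosen only to make the gap obvious.

```python
import tracemalloc

def peak_bytes(func):
    """Hypothetical helper: peak bytes allocated by Python while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

N = 100_000  # assumed size, for illustration only

# Same computation twice: once through a full list, once through a generator.
list_peak = peak_bytes(lambda: sum([i * 2 for i in range(N)]))
gen_peak = peak_bytes(lambda: sum(i * 2 for i in range(N)))

print(f"list comprehension peak:   {list_peak} bytes")
print(f"generator expression peak: {gen_peak} bytes")
```

Both calls produce the same sum, but the list version must hold all N results at once, while the generator version holds only one element at a time.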

2. Understand the responsibilities of the iter() function

The function of the iter() function is to obtain an iterator of an object. It is not responsible for creating the data itself, but rather obtaining an iterator from an existing iterable object. Therefore, if you pass iter() a large list, then the creation and memory usage of this large list have already occurred, and iter() just provides a traversal mechanism on this basis.
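
A short sketch of this division of labor, using only built-in behavior (the variable names are illustrative):

```python
numbers = [1, 2, 3]

list_iter = iter(numbers)     # a new list_iterator over the existing list
same_iter = iter(list_iter)   # calling iter() on an iterator returns it unchanged

gen = (n for n in numbers)
same_gen = iter(gen)          # a generator is already its own iterator

print(type(list_iter).__name__)
print(same_iter is list_iter, same_gen is gen)

# iter() copied nothing: the iterator reads from the original list,
# so an element appended afterwards is still visited.
numbers.append(4)
remaining = list(list_iter)
print(remaining)
```

Since the iterator is only a cursor over data that already exists, any memory saving has to come from how the data is produced, not from wrapping it in iter().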

3. Python's garbage collection mechanism

CPython uses reference counting as its primary memory management mechanism: when an object's reference count drops to 0, the object is deallocated immediately. Reference counting alone cannot free objects that refer to each other in a cycle, so CPython also runs a separate cyclic garbage collector (exposed through the gc module) to find and reclaim such cycles. Understanding these mechanisms makes it easier to reason about when memory is actually released.
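
Both mechanisms can be seen side by side in a small experiment. This is a sketch that assumes CPython; Node is a hypothetical class used only to build a reference cycle, and automatic collection is paused so that the result is deterministic.

```python
import gc
import weakref

class Node:
    """Hypothetical class used only to build a reference cycle."""
    def __init__(self):
        self.partner = None

gc.disable()                  # pause automatic cycle collection for the demo
try:
    # Case 1: no cycle. Reference counting frees the object as soon as
    # the last reference disappears.
    single = Node()
    single_probe = weakref.ref(single)
    del single
    freed_by_refcount = single_probe() is None

    # Case 2: a cycle. The two objects keep each other's count above zero,
    # so they survive until the cyclic collector runs.
    a, b = Node(), Node()
    a.partner, b.partner = b, a
    cycle_probe = weakref.ref(a)
    del a, b
    survived_without_gc = cycle_probe() is not None

    gc.collect()              # run the cycle detector explicitly
    freed_by_gc = cycle_probe() is None
finally:
    gc.enable()

print(freed_by_refcount, survived_without_gc, freed_by_gc)
```

A plain list of integers contains no cycles, so in the scenarios discussed here reference counting alone decides when the list is released.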

Summary

When Python processes a list comprehension, it evaluates it completely and builds the full list object in memory, whether or not the result is assigned to a variable. Therefore, iter([i for i in range(5000)]) and my_list = [l for l in range(5000)]; iter(my_list) are similar in their initial memory allocation, since both create a list of 5000 integers. The main difference is the lifetime of that list: a list not bound to a variable is kept alive only by the iterator created from it and can be reclaimed once that iterator is exhausted or discarded, while a list bound to a variable keeps consuming memory until the variable's lifetime ends.

To manage memory effectively, especially when processing large amounts of data, prefer generator expressions of the form (expression for item in iterable) when all you need is an iterator, so that the data is never loaded into memory all at once.
