Concurrency in Python with Threading and Multiprocessing

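Concurrency is an important idea in modern programming: it lets a program make progress on several tasks during the same period of time, which can improve an application's performance.

Python offers several ways to implement concurrency; the best known are threading and multiprocessing.

In this article we look at both approaches in detail, see how they work, and discuss when to use each, with practical code examples.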

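What Is Concurrency?

Before we discuss threading and multiprocessing, it is important to understand what concurrency means.

Concurrency means that a program can have several tasks in progress at the same time.

This lets the program make better use of resources and finish sooner, especially when it has to wait on operations such as reading files or has a large amount of computation to do.

Two related terms are worth distinguishing:

Parallelism: executing multiple tasks at literally the same instant on different processor cores.
Concurrency: making progress on multiple tasks within the same period of time, but not necessarily at exactly the same instant.

Python provides two main ways to achieve this:

Threading: for tasks that can be interleaved within a single process.
Multiprocessing: for tasks that need to run truly simultaneously on different processor cores.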

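Threading in Python

Threading lets you run multiple smaller units of execution (called threads) within the same process, sharing the same memory space.

Threads are lighter than processes, and switching between them is faster.

However, threads in CPython, the standard Python interpreter, are subject to the Global Interpreter Lock (GIL), which ensures that only one thread executes Python bytecode at a time.

How Threading Works

Python's threading module provides a simple and flexible way to create and manage threads.

Let's start with a basic example: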

import threading
import time


def print_numbers():
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(1)


# Creating a thread
thread = threading.Thread(target=print_numbers)

# Starting the thread
thread.start()

# Wait for the thread to complete
thread.join()

print("Thread has finished executing")


# Output:
# Number: 0
# Number: 1
# Number: 2
# Number: 3
# Number: 4
# Thread has finished executing

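In this example:

We define a function print_numbers() that prints the numbers 0 through 4, pausing for one second after each print.
We create a thread with threading.Thread(), passing print_numbers as the target function.
The start() method begins the thread's execution, and join() makes the main program wait for the thread to finish before continuing.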

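Example: Threading for I/O-Bound Tasks

Threading is especially useful for I/O-bound tasks such as file operations, network requests, or database queries, where the program spends most of its time waiting on external resources.

Here's an example that uses threads to simulate downloading files: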

import threading
import time


def download_file(file_name):
    print(f"Starting download of {file_name}...")
    time.sleep(2)  # Simulate download time
    print(f"Finished downloading {file_name}")


files = ["file1.zip", "file2.zip", "file3.zip"]

threads = []

# Create and start threads
for file in files:
    thread = threading.Thread(target=download_file, args=(file,))
    thread.start()
    threads.append(thread)

# Ensure all threads have finished
for thread in threads:
    thread.join()

print("All files have been downloaded.")

# Typical output (lines from different threads may appear in a different order):
# Starting download of file1.zip...
# Starting download of file2.zip...
# Starting download of file3.zip...
# Finished downloading file1.zip
# Finished downloading file2.zip
# Finished downloading file3.zip
# All files have been downloaded.

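By creating and managing a separate thread for each download, the program can overlap the waiting time of the tasks, so the three simulated downloads finish in about two seconds in total instead of six.

The key steps in the code are:

A function download_file is defined to simulate the download process.
A list of file names is created to represent the files to download.
For each file in the list, a new thread is created with download_file as its target function. Each thread is started right after it is created and then appended to the threads list.
The main program calls join() on every thread to wait for all of them, so it does not continue until all downloads have finished.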
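
The same steps can also be written with concurrent.futures.ThreadPoolExecutor from the standard library, which starts and joins the threads for you. This is a minimal sketch rather than part of the original example: download_all is a hypothetical helper, and the simulated delay is shortened to 0.1 seconds to keep the demonstration quick.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download_file(file_name):
    # Simulate a download; real code would perform network I/O here
    time.sleep(0.1)
    return f"Finished downloading {file_name}"


def download_all(file_names, max_workers=3):
    # Hypothetical helper: the executor starts the worker threads and
    # waits for them when the with block exits
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() returns results in the same order as the input list
        return list(executor.map(download_file, file_names))


if __name__ == "__main__":
    for message in download_all(["file1.zip", "file2.zip", "file3.zip"]):
        print(message)
```

Because executor.map() preserves input order, the returned messages line up with the file list even if the threads finish in a different order.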

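Limitations of Threading

Although threading can improve the performance of I/O-bound tasks, it has limitations:

Global Interpreter Lock (GIL): for CPU-bound tasks, the GIL allows only one thread to execute Python code at a time, which limits how much threads can gain from multi-core processors.
Race conditions: because threads share the same memory space, incorrect synchronization can lead to race conditions, where the program's result depends on the timing of the threads.
Deadlocks: threads that wait on each other to release resources can deadlock, so that none of them makes progress.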
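
Race conditions on shared data are commonly avoided with a lock. The following is a minimal sketch, assuming a shared counter that several threads increment; the names counter, counter_lock, increment, and run_counter are illustrative and do not come from the examples above.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write below
        with counter_lock:
            counter += 1


def run_counter(num_threads=4, times=10_000):
    # Hypothetical helper: reset the counter, run the threads, return the total
    global counter
    counter = 0
    threads = [
        threading.Thread(target=increment, args=(times,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print("Final count:", run_counter())
```

Using the lock as a context manager releases it even if the protected code raises an exception, which also helps avoid one common cause of deadlocks.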

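Multiprocessing in Python

Multiprocessing addresses the limitations of threading by using separate processes instead of threads.

Each process has its own memory space and its own Python interpreter, which allows true parallelism on multi-core systems.

This makes multiprocessing a good choice for tasks that require heavy computation.

How Multiprocessing Works

Python's multiprocessing module makes it easy to create and manage processes.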

Let’s start with a basic example:

import multiprocessing
import time


def print_numbers():
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(1)


if __name__ == "__main__":
    # Creating a process
    process = multiprocessing.Process(target=print_numbers)

    # Starting the process
    process.start()

    # Wait for the process to complete
    process.join()

    print("Process has finished executing")

# Output:
# Number: 0
# Number: 1
# Number: 2
# Number: 3
# Number: 4
# Process has finished executing

This example mirrors the threading example, but uses a process instead of a thread.

Creating and managing a process looks much like working with a thread, but because each process has its own interpreter and memory space, processes do not share a GIL and can run in parallel on different CPU cores.
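
One consequence of separate memory spaces is that a change made in a child process is not visible in the parent. This is a small sketch of that behavior; shared_list, append_item, and run_child are illustrative names, not part of the example above.

```python
import multiprocessing

shared_list = []


def append_item():
    # Runs in the child process and modifies only the child's copy
    shared_list.append("added in child")


def run_child():
    # Hypothetical helper: run append_item in a separate process
    process = multiprocessing.Process(target=append_item)
    process.start()
    process.join()
    return process.exitcode


if __name__ == "__main__":
    exit_code = run_child()
    print("Child exit code:", exit_code)
    # The parent's list is unchanged because the child has its own memory
    print("List in parent:", shared_list)
```

If the same function ran in a thread instead, the parent would see the appended item, because threads share one memory space.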

Example: Multiprocessing for CPU-Bound Tasks

Multiprocessing is particularly beneficial for tasks that are CPU-bound, such as numerical computations or data processing.

Here’s an example that calculates the square of numbers using multiple processes:

import multiprocessing


def compute_square(number):
    return number * number


if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    # Create a pool of processes
    with multiprocessing.Pool() as pool:
        # Map function to numbers using multiple processes
        results = pool.map(compute_square, numbers)

    print("Squares:", results)

# Output:
# Squares: [1, 4, 9, 16, 25]

Here are the key steps in the code:

A function compute_square is defined to take a number as input and return its square.
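The code inside the if __name__ == "__main__": block runs only when the script is executed directly. This guard matters for multiprocessing: on platforms that start child processes by importing the main module (such as Windows), it keeps the children from re-running the process-creation code.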
A list of numbers is defined, which will be squared.
A pool of worker processes is created using multiprocessing.Pool().
The map method is used to apply the compute_square function to each number in the list, distributing the workload across multiple processes.
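
Squaring five small numbers is too little work to benefit from multiple processes, so the example above mainly illustrates the API. A sketch with a heavier workload might look like the following; count_primes is a hypothetical, deliberately naive function chosen only to keep the CPU busy, and the limits and pool size are arbitrary.

```python
import multiprocessing


def count_primes(limit):
    # Naive trial division: count the primes below limit
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [10_000, 20_000, 30_000, 40_000]

    # Each limit is handled by a worker process in the pool
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)

    for limit, result in zip(limits, results):
        print(f"Primes below {limit}: {result}")
```

Whether this runs faster than a plain loop depends on the number of available cores and on how the cost of starting processes compares with the work per item.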

Inter-Process Communication (IPC)

Since each process has its own memory space, sharing data between processes requires inter-process communication (IPC) mechanisms.

The multiprocessing module provides several tools for IPC, such as Queue, Pipe, and Value.

Here’s an example using Queue to share data between processes:

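import multiprocessing
from queue import Empty


def worker(queue):
    # Retrieve and process data until the queue stays empty.
    # queue.empty() is unreliable across processes, so use a timeout instead.
    while True:
        try:
            item = queue.get(timeout=1)
        except Empty:
            break
        print(f"Processing {item}")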

if __name__ == "__main__":
    queue = multiprocessing.Queue()

    # Add items to the queue
    for i in range(10):
        queue.put(i)

    # Create a pool of processes to process the queue
    processes = []
    for _ in range(4):
        process = multiprocessing.Process(target=worker, args=(queue,))
        processes.append(process)
        process.start()

    # Wait for all processes to complete
    for process in processes:
        process.join()

    print("All processes have finished.")


# Output (the order of the lines may vary between runs):
# Processing 0
# Processing 1
# Processing 2
# Processing 3
# Processing 4
# Processing 5
# Processing 6
# Processing 7
# Processing 8
# Processing 9
# All processes have finished.

In this example:

def worker(queue): Defines a function worker that takes a queue as an argument. The function retrieves and processes items from the queue until it is empty.
if __name__ == "__main__":: Ensures that the following code runs only if the script is executed directly, not if it is imported as a module.
queue = multiprocessing.Queue(): Creates a queue object for inter-process communication.
for i in range(10): queue.put(i): Adds items (numbers 0 through 9) to the queue.
processes = []: Initializes an empty list to store process objects.
The for loop for _ in range(4): Creates four worker processes.
process = multiprocessing.Process(target=worker, args=(queue,)): Creates a new process with worker as the target function and passes the queue as an argument.
processes.append(process): Adds the process object to the processes list.
process.start(): Starts the process.
The for loop for process in processes: Waits for each process to complete using the join() method.
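
Queue is only one of the IPC tools mentioned earlier. For a single shared number, multiprocessing.Value can be used instead. The sketch below assumes a shared integer total that several processes add to; add_to_total and the amounts are illustrative.

```python
import multiprocessing


def add_to_total(total, amount):
    # get_lock() guards the read-modify-write on the shared value
    with total.get_lock():
        total.value += amount


if __name__ == "__main__":
    # "i" is the type code for a signed integer; 0 is the initial value
    total = multiprocessing.Value("i", 0)

    processes = [
        multiprocessing.Process(target=add_to_total, args=(total, amount))
        for amount in range(1, 5)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()

    print("Total:", total.value)

# Output:
# Total: 10
```

The lock is needed because total.value += amount is a read followed by a write, and two processes could otherwise interleave between the two steps.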

Challenges of Multiprocessing

While multiprocessing provides true parallelism, it comes with its own set of challenges:

Higher Overhead: Creating and managing processes is more resource-intensive than threads due to separate memory spaces.
Complexity: Communication and synchronization between processes are more complex than threading, requiring IPC mechanisms.
Memory Usage: Each process has its own memory space, leading to higher memory usage compared to threading.

When to Use Threading vs. Multiprocessing

Choosing between threading and multiprocessing depends on the type of task you're dealing with:

Use Threading:

For tasks that involve a lot of waiting, such as network operations or reading/writing files (I/O-bound tasks).
When you need to share memory between tasks and can manage potential issues like race conditions.
For lightweight concurrency without the extra overhead of creating multiple processes.

Use Multiprocessing:

For tasks that require heavy computations or data processing (CPU-bound tasks) and can benefit from running on multiple CPU cores at the same time.
When you need true parallelism and the Global Interpreter Lock (GIL) in threading becomes a limitation.
For tasks that can run independently and don’t require frequent communication or shared memory.
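
The guidelines above can be sketched in code with concurrent.futures, which exposes thread pools and process pools through the same interface. make_executor and its "io" and "cpu" labels are hypothetical names used only for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(task_kind, max_workers=4):
    # Hypothetical helper: pick the executor type from the kind of task
    if task_kind == "io":
        return ThreadPoolExecutor(max_workers=max_workers)
    if task_kind == "cpu":
        return ProcessPoolExecutor(max_workers=max_workers)
    raise ValueError(f"Unknown task kind: {task_kind!r}")


def square(number):
    return number * number


if __name__ == "__main__":
    for kind in ("io", "cpu"):
        # Both executors support the same map() call
        with make_executor(kind) as executor:
            print(kind, list(executor.map(square, range(5))))
```

Because both executors share one interface, switching a workload from threads to processes is often a one-line change, as long as the function and its arguments can be pickled.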

Conclusion

Concurrency in Python is a powerful way to make your applications run faster.

Threading is great for tasks that involve a lot of waiting, like network operations or reading/writing files, but because of the Global Interpreter Lock (GIL) it is less effective for tasks that require heavy computation.

On the other hand, multiprocessing allows for true parallelism, making it perfect for CPU-intensive tasks, although it comes with higher overhead and complexity.

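Whether you are processing data, handling many network requests, or performing complex computations, Python's threading and multiprocessing tools give you what you need to make your programs efficient and fast.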
