Parallel processing in Python

王林
Release: 2023-09-11 23:49:10


Introduction

In today's fast-paced digital environment, it is critical for developers and data scientists to complete computationally intensive tasks efficiently. Fortunately, Python offers powerful parallel processing capabilities thanks to its adaptability and broad ecosystem. By breaking difficult problems into smaller, more manageable tasks and working on them simultaneously, we can achieve substantial performance improvements.

Python's parallel processing capabilities allow us to leverage available computing resources to perform tasks such as web scraping, scientific simulations, and data analysis faster and more efficiently. In this article, we will take a tour of parallel processing in Python. We'll examine several methods, including multiprocessing, asynchronous programming, and multithreading, and learn how to use them effectively to overcome performance bottlenecks. Join us as we unlock the full power of parallel processing in Python and reach new heights of performance and productivity.

Understanding Parallel Processing

Splitting a job into smaller subtasks and running them simultaneously on multiple processors or cores is called parallel processing. Parallel processing can significantly reduce the overall execution time of a program by efficiently utilizing available computing resources. Asynchronous programming, multiprocessing, and multithreading are just a few of the parallel processing methods Python offers.

Multithreading in Python

With multithreading, multiple threads run concurrently within the same process and share the same memory. Multithreading can be implemented easily using Python's threading module. However, multithreading in Python may not speed up CPU-intensive operations, because the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. Multithreading can still be very useful for I/O-intensive tasks, because it allows other threads to make progress while one thread waits for an I/O operation to complete.

Let’s look at an example of using multi-threading to download multiple web pages:

Example

import threading
import requests

def download_page(url):
    response = requests.get(url)
    print(f"Downloaded {url}")

urls = [
    "https://example.com",
    "https://google.com",
    "https://openai.com"
]

threads = []
for url in urls:
    thread = threading.Thread(target=download_page, args=(url,))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

Output

Downloaded https://example.com
Downloaded https://google.com
Downloaded https://openai.com

This snippet downloads each URL in its own thread, so the downloads run concurrently. The join() call ensures that the main thread waits for every worker thread to complete before continuing. Note that the lines may print in a different order on each run, since each thread prints as soon as its own download finishes.
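Managing threads by hand works, but the standard library's concurrent.futures module offers a higher-level thread pool that handles starting and joining for you. Here is a minimal sketch of the same fan-out pattern; to keep it self-contained, the network request is simulated with a short sleep (the fetch function and its return value are illustrative, not part of the example above):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Simulate an I/O-bound download with a non-CPU-bound sleep
    time.sleep(0.1)
    return f"Downloaded {url}"

urls = [
    "https://example.com",
    "https://google.com",
    "https://openai.com"
]

with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map() runs fetch() concurrently across the pool,
    # but returns results in the same order as the input
    results = list(executor.map(fetch, urls))

print(results)
```

Unlike raw threads with print(), executor.map() preserves the input order of results, which makes the output deterministic.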

Multiprocessing in Python

Multiprocessing complements multithreading. With multiple processes, each process has its own memory space, providing true parallelism. Python's multiprocessing module provides a high-level interface for working with processes. Multiprocessing is well suited to CPU-intensive tasks, because each process runs in its own Python interpreter, avoiding the GIL limitations of multithreading.

The code below uses multiple processes. The Pool class spawns a set of worker processes, and the map() method distributes the workload across the available processes, collecting the results into a list.

Consider the following example, in which we use multiple processes to calculate the square of each integer in a list:

Example

import multiprocessing

def square(number):
    return number ** 2

# The __main__ guard is required on platforms that spawn new
# interpreters (e.g. Windows, macOS), so that child processes
# importing this module do not re-execute the pool setup.
if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with multiprocessing.Pool() as pool:
        results = pool.map(square, numbers)
    print(results)

Output

[1, 4, 9, 16, 25]
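As with threads, the concurrent.futures module also provides a higher-level wrapper around process pools. The sketch below shows the same map pattern with ProcessPoolExecutor, using a hypothetical cube() function for variety:

```python
from concurrent.futures import ProcessPoolExecutor

def cube(n):
    # CPU-bound work: each call runs in a separate worker process
    return n ** 3

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with ProcessPoolExecutor() as executor:
        # map() distributes the inputs across worker processes
        # and returns results in input order
        results = list(executor.map(cube, numbers))
    print(results)  # [1, 8, 27, 64, 125]
```

ProcessPoolExecutor and multiprocessing.Pool are largely interchangeable for this pattern; the executor API has the advantage of being a drop-in swap for ThreadPoolExecutor when you move between I/O-bound and CPU-bound workloads.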

Python Asynchronous Programming

By taking advantage of non-blocking operations, asynchronous programming enables efficient execution of I/O-intensive workloads. Python's asyncio package lets you write asynchronous code using coroutines, event loops, and futures. As web applications and APIs become more popular, asynchronous programming is becoming increasingly important.

The fetch_page() coroutine in the code example below uses aiohttp to fetch web pages asynchronously. The main() coroutine builds a list of tasks and then uses asyncio.gather() to run them concurrently. The await keyword suspends execution until the tasks complete and then returns their results.

Let's look at an example of using asyncio and aiohttp to fetch multiple web pages asynchronously:

Example

import asyncio
import aiohttp

async def fetch_page(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    urls = [
        "https://example.com",
        "https://google.com",
        "https://openai.com"
    ]
    tasks = [fetch_page(url) for url in urls]
    pages = await asyncio.gather(*tasks)
    print(pages)

asyncio.run(main())

Output

A list containing the raw HTML of each fetched page, beginning with the "Example Domain" page, followed by Google's and OpenAI's (output truncated for brevity).
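Since aiohttp is a third-party dependency, the same gather pattern can be demonstrated with asyncio alone. The sketch below simulates network latency with asyncio.sleep(); the fetch_page() body and its return value are illustrative placeholders, not real HTTP responses:

```python
import asyncio

async def fetch_page(url):
    # Simulate network latency with a non-blocking sleep;
    # while one coroutine waits, the others make progress
    await asyncio.sleep(0.1)
    return f"<html>{url}</html>"

async def main():
    urls = [
        "https://example.com",
        "https://google.com",
        "https://openai.com"
    ]
    tasks = [fetch_page(url) for url in urls]
    # gather() schedules all coroutines concurrently and
    # returns their results in input order
    return await asyncio.gather(*tasks)

pages = asyncio.run(main())
print(pages)
```

Because the three sleeps overlap, the whole run takes roughly 0.1 seconds rather than 0.3, which is the essence of the speedup asynchronous I/O provides.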