Shared Memory in Multiprocessing
When working with Python's multiprocessing module, a common question is whether large data structures are shared between processes or copied into each one.
Original Concern
When creating multiple processes using multiprocessing.Process and passing in large lists as arguments, the concern is whether these lists are copied for each process or shared among them. If each process makes a copy, it could significantly increase memory usage.
Copy-on-Write
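On Linux, child processes are created with fork(), which uses a copy-on-write approach: the child initially shares the parent's memory pages, and a page is physically copied only when one of the processes writes to it. This suggests that the lists would not be duplicated for each subprocess as long as they are only read.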
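As a minimal sketch of this behaviour, the example below builds a list in the parent and lets a forked child read it without passing it as an argument. The names BIG, child_sum and run_fork_demo are hypothetical, and the "fork" start method is requested explicitly, which assumes a Unix-like system; the sketch shows that the child sees the parent's data, not how many pages were actually copied.

```python
import multiprocessing as mp

# Built in the parent before forking; the child inherits it through fork().
BIG = list(range(100_000))


def child_sum(conn):
    # The list is reached through inherited memory, not through pickling.
    conn.send(sum(BIG))
    conn.close()


def run_fork_demo():
    # Assumption: the "fork" start method is available (Linux and other Unix systems).
    ctx = mp.get_context("fork")
    recv_end, send_end = ctx.Pipe(duplex=False)
    child = ctx.Process(target=child_sum, args=(send_end,))
    child.start()
    result = recv_end.recv()
    child.join()
    return result


if __name__ == "__main__":
    print(run_fork_demo())
```

Only the integer result travels back through the pipe; the list itself is never serialized.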
Reference Counting
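However, CPython stores each object's reference count inside the object itself, so merely accessing an object writes to its memory. When a subprocess accesses a list element, that element's reference count is incremented, which modifies the memory page holding it and forces the kernel to copy that page. Copy-on-write operates on pages rather than on whole objects, so the list is not copied in one step, but a subprocess that iterates over a large list touches most of the pages holding its elements, and the result approaches a full copy.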
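The effect of a plain read on the reference count can be observed with sys.getrefcount. This is a small illustration that assumes CPython; the function name refcount_delta is hypothetical.

```python
import sys


def refcount_delta():
    element = object()
    data = [element]
    before = sys.getrefcount(element)
    alias = data[0]  # a plain read: binds one more reference to the element
    after = sys.getrefcount(element)
    return after - before


if __name__ == "__main__":
    print(refcount_delta())
```

The difference is the write that a forked child performs on the shared page, even though the list contents never change.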
Memory Usage Monitoring
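Observations of memory usage indicate that entire objects are, in fact, duplicated for each subprocess, most likely because of reference counting. This is wasteful when the lists are only ever read: their contents never change and their reference counts stay above zero, yet the reference count updates alone are enough to trigger copying.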
Shared Memory in Python 3.8.0
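Python 3.8 introduced 'true' shared memory through the multiprocessing.shared_memory module. It allows a program to explicitly create named shared memory blocks (SharedMemory, and the list-like ShareableList built on top of it) that other processes attach to by name and then read or write without copying the underlying data.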
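A minimal sketch of this module in use might look as follows. The function names double_in_place and run_demo are hypothetical, the data is a handful of small integers stored as raw bytes, and the "fork" start method is requested explicitly, which assumes a Unix-like system.

```python
import multiprocessing as mp
from multiprocessing import shared_memory


def double_in_place(name, count):
    # Attach to the existing block by name; the payload is not pickled or copied.
    shm = shared_memory.SharedMemory(name=name)
    try:
        for i in range(count):
            shm.buf[i] = shm.buf[i] * 2
    finally:
        shm.close()


def run_demo():
    values = [1, 2, 3, 4]
    shm = shared_memory.SharedMemory(create=True, size=len(values))
    try:
        shm.buf[:len(values)] = bytes(values)
        # Assumption: the "fork" start method is available on this platform.
        ctx = mp.get_context("fork")
        child = ctx.Process(target=double_in_place, args=(shm.name, len(values)))
        child.start()
        child.join()
        # The parent sees the child's writes because both map the same block.
        return list(shm.buf[:len(values)])
    finally:
        shm.close()
        shm.unlink()  # the creating process frees the block when done


if __name__ == "__main__":
    print(run_demo())
```

Only the block's name and the element count are passed to the child. Each process calls close() on its own handle, and the creating process calls unlink() once so the block is released.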
In summary, copy-on-write on Linux means that a forked subprocess does not copy large data structures up front, but reference counting turns reads into writes, so the memory pages holding those objects end up being copied as they are accessed. Since Python 3.8, the multiprocessing.shared_memory module addresses this by providing a mechanism for creating explicitly shared memory.