The GIL is essentially a lock. Anyone who has studied operating systems knows that locks exist to prevent the data inconsistencies caused by concurrent access. CPython defines many global variables outside of functions, such as usable_arenas and usedpools in the memory manager. If multiple threads request memory at the same time, these variables could be modified concurrently and end up corrupted. In addition, Python's garbage collection is based on reference counting. Every object has an ob_refcnt field that records how many references to it currently exist. Operations such as variable assignment and argument passing increase the reference count, while leaving a scope or returning from a function decreases it. Likewise, if multiple threads modify the reference count of the same object at the same time, ob_refcnt may drift from its true value. If it ends up too high, memory leaks: objects that are no longer used are never reclaimed. If it ends up too low, the consequences are worse: an object that is still referenced may be reclaimed, crashing the Python interpreter.
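The reference count described above can be observed from Python itself. The following is a small sketch that assumes CPython, where sys.getrefcount() reports the count (including one temporary reference held by the call itself); the variable names are only illustrative.

```python
import sys

obj = object()
before = sys.getrefcount(obj)        # includes the temporary reference made by the call

alias = obj                          # assignment adds one reference
after_assign = sys.getrefcount(obj)

del alias                            # removing the name drops that reference again
after_del = sys.getrefcount(obj)

print(before, after_assign, after_del)
```

The absolute numbers are an implementation detail; what matters is that the count rises by one on assignment and falls back afterwards.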
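The definition of the GIL in CPython is as follows:
struct _gil_runtime_state {
    // If a thread requesting the GIL has not obtained it after `interval`
    // microseconds, it signals the holding thread to release the GIL
    unsigned long interval;
    // The thread that last held the GIL; used when forcing a thread switch
    _Py_atomic_address last_holder;
    // Whether the GIL is currently held by some thread
    _Py_atomic_int locked;
    // How many times the GIL has changed hands
    unsigned long switch_number;
    // A condition variable and a mutex, which generally come in pairs
    PyCOND_T cond;
    PyMUTEX_T mutex;
    // Condition variable used to force a thread switch
    PyCOND_T switch_cond;
    PyMUTEX_T switch_mutex;
};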
The essential part is the locked field, protected by mutex, which indicates whether the GIL is currently held. The other fields exist to optimize the GIL. A thread calls take_gil() to acquire the GIL and drop_gil() to release it. To avoid starvation, a thread that has waited for interval microseconds (5000 by default, i.e. 5 milliseconds) without acquiring the GIL sends a drop request to the thread holding it. The holder checks for this request at suitable points and, if it finds that another thread is waiting, is forced to release the GIL. What counts as a suitable point differs between versions. Early versions checked once every 100 instructions. Python 3.10.4 checks at the end of a conditional statement, at the end of each iteration of a loop body, and at the end of a function call.
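The interval is exposed to Python code through sys.getswitchinterval() and sys.setswitchinterval(), which work in seconds rather than in the internal unit. A minimal sketch of reading and changing it, where the value 0.01 is an arbitrary choice for illustration:

```python
import sys

original = sys.getswitchinterval()   # in seconds; 0.005 unless something changed it
sys.setswitchinterval(0.01)          # ask for a 10 ms interval instead
changed = sys.getswitchinterval()

sys.setswitchinterval(original)      # put the previous setting back
restored = sys.getswitchinterval()

print(original, changed, restored)
```

A larger interval means fewer forced switches and less switching overhead, at the cost of other threads waiting longer for their turn.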
The function take_gil(), which acquires the GIL, is simplified as follows:
static void take_gil(PyThreadState *tstate)
{
    ...
    // Acquire the mutex
    MUTEX_LOCK(gil->mutex);
    // If the GIL is free, take it directly
    if (!_Py_atomic_load_relaxed(&gil->locked)) {
        goto _ready;
    }
    // Otherwise wait
    while (_Py_atomic_load_relaxed(&gil->locked)) {
        unsigned long saved_switchnum = gil->switch_number;
        unsigned long interval = (gil->interval >= 1 ? gil->interval : 1);
        int timed_out = 0;
        COND_TIMED_WAIT(gil->cond, gil->mutex, interval, timed_out);
        if (timed_out &&
            _Py_atomic_load_relaxed(&gil->locked) &&
            gil->switch_number == saved_switchnum)
        {
            SET_GIL_DROP_REQUEST(interp);
        }
    }
_ready:
    MUTEX_LOCK(gil->switch_mutex);
    _Py_atomic_store_relaxed(&gil->locked, 1);
    _Py_ANNOTATE_RWLOCK_ACQUIRED(&gil->locked, /*is_write=*/1);
    if (tstate != (PyThreadState*)_Py_atomic_load_relaxed(&gil->last_holder)) {
        _Py_atomic_store_relaxed(&gil->last_holder, (uintptr_t)tstate);
        ++gil->switch_number;
    }
    // Signal the condition variable that a thread forced to switch is waiting on
    COND_SIGNAL(gil->switch_cond);
    MUTEX_UNLOCK(gil->switch_mutex);
    if (_Py_atomic_load_relaxed(&ceval2->gil_drop_request)) {
        RESET_GIL_DROP_REQUEST(interp);
    }
    else {
        COMPUTE_EVAL_BREAKER(interp, ceval, ceval2);
    }
    ...
    // Release the mutex
    MUTEX_UNLOCK(gil->mutex);
}
To ensure atomicity, the function body acquires the mutex gil->mutex at the beginning and releases it at the end. If the GIL is idle, the thread takes it directly. Otherwise it waits on the condition variable gil->cond for interval microseconds (at least 1 microsecond). If the wait times out and the GIL has not changed hands in the meantime, the thread sets gil_drop_request to ask the thread holding the GIL to give it up; otherwise it simply keeps waiting. Once the GIL has been obtained, gil->locked, gil->last_holder and gil->switch_number are updated, the condition variable gil->switch_cond is signaled, and the mutex gil->mutex is released.
The function drop_gil(), which releases the GIL, is simplified as follows:
static void drop_gil(struct _ceval_runtime_state *ceval,
                     struct _ceval_state *ceval2,
                     PyThreadState *tstate)
{
    ...
    if (tstate != NULL) {
        _Py_atomic_store_relaxed(&gil->last_holder, (uintptr_t)tstate);
    }
    MUTEX_LOCK(gil->mutex);
    _Py_ANNOTATE_RWLOCK_RELEASED(&gil->locked, /*is_write=*/1);
    // Release the GIL
    _Py_atomic_store_relaxed(&gil->locked, 0);
    // Wake a thread that is waiting for the GIL
    COND_SIGNAL(gil->cond);
    MUTEX_UNLOCK(gil->mutex);
    if (_Py_atomic_load_relaxed(&ceval2->gil_drop_request) && tstate != NULL) {
        MUTEX_LOCK(gil->switch_mutex);
        // Wait until a thread switch has actually happened, to avoid starvation
        if (((PyThreadState*)_Py_atomic_load_relaxed(&gil->last_holder)) == tstate) {
            assert(is_tstate_valid(tstate));
            RESET_GIL_DROP_REQUEST(tstate->interp);
            COND_WAIT(gil->switch_cond, gil->switch_mutex);
        }
        MUTEX_UNLOCK(gil->switch_mutex);
    }
}
First the GIL is released under the protection of gil->mutex, and a thread waiting for the GIL is woken up. In a multi-CPU environment, the thread that has just released the GIL has a high probability of reacquiring it immediately. To avoid starving the other threads, the current thread is forced to wait on the condition variable gil->switch_cond, and it is woken up only after another thread has acquired the GIL.
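The acquire/release protocol can be imitated in pure Python to make the handshake easier to follow. The sketch below is a toy model, not CPython's implementation: ToyGIL, take, drop and run_demo are hypothetical names, each threading.Condition stands in for a condition variable together with its mutex, a single drop_request flag replaces the per-interpreter one, and the interval is given in seconds.

```python
import threading


class ToyGIL:
    """Toy model of the take/drop protocol (illustrative only)."""

    def __init__(self, interval=0.001):
        self.interval = interval                  # seconds, for Condition.wait()
        self.locked = False
        self.last_holder = None
        self.switch_number = 0
        self.drop_request = False
        self.cond = threading.Condition()         # models gil->cond + gil->mutex
        self.switch_cond = threading.Condition()  # models switch_cond + switch_mutex

    def take(self, me):
        with self.cond:
            while self.locked:
                saved = self.switch_number
                notified = self.cond.wait(timeout=self.interval)
                # Timed out and the lock did not change hands: ask the holder to drop.
                if not notified and self.locked and self.switch_number == saved:
                    self.drop_request = True
            with self.switch_cond:
                self.locked = True
                if self.last_holder is not me:
                    self.last_holder = me
                    self.switch_number += 1
                # Wake a previous holder that is parked in drop().
                self.switch_cond.notify()
            self.drop_request = False

    def drop(self, me):
        with self.cond:
            self.last_holder = me
            self.locked = False
            self.cond.notify()
        if self.drop_request:
            with self.switch_cond:
                # Park until some other thread has taken the lock.
                if self.last_holder is me:
                    self.drop_request = False
                    self.switch_cond.wait()


def run_demo(n_threads=2, n_iter=50):
    gil = ToyGIL()
    state = {"counter": 0}

    def worker():
        me = threading.current_thread()
        for _ in range(n_iter):
            gil.take(me)
            state["counter"] += 1   # only runs while holding the toy lock
            gil.drop(me)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"], gil.switch_number
```

Calling run_demo() returns the final counter and the number of hand-offs. The second number varies from run to run, which reflects how the timed wait and the forced switch interact with thread scheduling.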
Code constrained by the GIL cannot execute in parallel, which lowers overall performance. To minimize the loss, Python actively releases the GIL during I/O operations and during CPU-intensive computations that do not access Python objects, which reduces the granularity of the GIL. Examples include:
Reading and writing files
Network access
Encrypted data/Compressed data
So strictly speaking, even within a single process, multiple Python threads may execute simultaneously. For example, one thread can run normal Python code while another is compressing data.
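The effect is easy to observe with any blocking call that releases the GIL. The sketch below uses time.sleep() as a stand-in for blocking I/O (an assumption made to keep the example self-contained); four threads that each block for 0.2 seconds finish in roughly 0.2 seconds overall instead of 0.8.

```python
import threading
import time


def blocking_call():
    # Stand-in for blocking I/O: the GIL is released while the thread sleeps.
    time.sleep(0.2)


start = time.perf_counter()
threads = [threading.Thread(target=blocking_call) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"4 blocking calls took {elapsed:.2f}s in total")
```

Pure Python loops would not overlap in this way, because they need the GIL for every step.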
The GIL is a lock that exists to keep the Python interpreter's internal variables consistent. It is not responsible for the consistency of user data, although it does protect user data to some extent. For example, in Python 3.10.4 a run of instructions that involves no jumps or function calls executes atomically under the GIL. The consistency of data in business logic, however, must be ensured by the user through explicit locking.
The following code uses two threads to simulate a user collecting fragments and earning rewards:
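One way to see why this atomicity is a property of the interpreter rather than of the statement is to look at the bytecode. The sketch below disassembles a single augmented assignment with the standard dis module; the exact instruction names differ between Python versions, so treat the printed list as version-specific.

```python
import dis

# Compile one statement and list the bytecode instructions it turns into.
code = compile('stat["piece_count"] += 1', "<example>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]

print(opnames)
```

The statement becomes a load, an add and a store, among other steps. It only behaves atomically when the interpreter does not switch threads between those instructions.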
from threading import Thread


def main():
    stat = {"piece_count": 0, "reward_count": 0}
    t1 = Thread(target=process_piece, args=(stat,))
    t2 = Thread(target=process_piece, args=(stat,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print(stat)


def process_piece(stat):
    for i in range(10000000):
        if stat["piece_count"] % 10 == 0:
            reward = True
        else:
            reward = False
        if reward:
            stat["reward_count"] += 1
        stat["piece_count"] += 1


if __name__ == "__main__":
    main()
Suppose the user earns a reward for every 10 fragments collected, and each thread collects 10,000,000 fragments. In total 20,000,000 fragments should be collected. Because the code checks the count before incrementing it, a race-free run awards one reward at each of the counts 0, 10, ..., 19,999,990, for 2,000,000 rewards in total. The first run on my computer, however, produced the following:
{'piece_count': 20000000, 'reward_count': 1999987}
The total number of fragments matches expectations, but 13 rewards are missing. The fragment count is correct because in Python 3.10.4 stat["piece_count"] += 1 executes atomically under the GIL. Since the executing thread may be switched at the end of each loop iteration, thread t1 may bring piece_count to 100 at the end of an iteration, and before it starts the next iteration and tests the count modulo 10, the interpreter switches to thread t2, which increments piece_count to 101. The reward for 100 is then missed.
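A common fix is to make the check and the increment a single critical section with threading.Lock. The following is a minimal sketch of that idea, not the author's code: the lock and iterations parameters and the run() helper are additions for illustration, and the iteration count is reduced so that it finishes quickly.

```python
from threading import Lock, Thread


def process_piece(stat, lock, iterations):
    for _ in range(iterations):
        # Check and update under one lock so that no other thread can
        # change piece_count between the test and the increment.
        with lock:
            if stat["piece_count"] % 10 == 0:
                stat["reward_count"] += 1
            stat["piece_count"] += 1


def run(iterations=100_000):
    stat = {"piece_count": 0, "reward_count": 0}
    lock = Lock()
    threads = [
        Thread(target=process_piece, args=(stat, lock, iterations))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stat


if __name__ == "__main__":
    print(run())
```

With the lock in place every value of piece_count is tested exactly once, so the reward count is always one tenth of the fragment count. The price is that the two threads now run this loop strictly one at a time.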
Appendix: How to avoid being affected by the GIL
Having said all this, without discussing solutions this would be just an explainer with little practical use. The GIL is this troublesome; is there any way around it? Let's look at the available options.
Use multiprocessing instead of threading
The multiprocessing library exists largely to make up for the inefficiency of the threading library caused by the GIL. It replicates the interface provided by threading to ease migration. The key difference is that it uses multiple processes instead of multiple threads. Each process has its own independent GIL, so there is no GIL contention between processes.
Of course multiprocessing is not a panacea. It makes data communication and synchronization within the program harder. Take a counter as an example. If we want several workers to accumulate into the same variable, with threading we declare a global variable and wrap three lines in a threading.Lock context. With multiprocessing, the processes cannot see each other's data, so we have to declare a Queue in the main process and put and get through it, or use shared memory. This extra implementation cost makes writing concurrent programs, which is already painful, even more painful. Where exactly do the difficulties lie? Interested readers can read further in this article
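The shared-memory route for the counter can be sketched with multiprocessing.Value, which places an integer in shared memory and carries its own lock. This is only one possible approach; count_in_processes, _add and the method parameter are hypothetical names introduced for this illustration.

```python
import multiprocessing as mp


def _add(counter, n):
    for _ in range(n):
        # The shared value has its own lock; hold it for each read-modify-write.
        with counter.get_lock():
            counter.value += 1


def count_in_processes(n_procs, n_iter, method=None):
    # method=None uses the platform's default start method.
    ctx = mp.get_context(method)
    counter = ctx.Value("i", 0)   # a C int living in shared memory
    procs = [
        ctx.Process(target=_add, args=(counter, n_iter))
        for _ in range(n_procs)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value
```

When the platform starts processes with spawn or forkserver, the call to count_in_processes() should sit under an if __name__ == "__main__": guard. Compared with the threaded version, the counter has to be created through multiprocessing and passed to each worker explicitly, which is the extra cost the paragraph above describes.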
Use other interpreters
As mentioned before, the GIL is only a product of CPython, so are other interpreters any better? Yes, interpreters such as Jython and IronPython do not need a GIL, thanks to the nature of their implementation languages. However, by implementing the interpreter in Java or C#, they also lost access to the community's many useful C extension modules. As a result these interpreters have always remained relatively niche. After all, when choosing between functionality and performance early on, most people choose functionality. Done is better than perfect.
So it’s hopeless?
Of course, the Python community is also working hard to keep improving the GIL, and has even tried to remove it. Each minor version has brought a number of improvements. Interested readers can read further in this slide deck
Another improvement: Reworking the GIL
– Changed the switching granularity from opcode counting to time-slice counting
– Prevented the thread that most recently released the GIL from being scheduled again immediately
– Added thread priorities (a high-priority thread can force other threads to release the GIL they hold)