Maintain a reference count for each memory object. When a new reference points to an object, increase its count by one; when a reference to the object is destroyed, decrease the count by one. When the count drops to zero, reclaim the memory the object occupies.
Mark-Sweep
It has two steps. The first is marking: distinguishing garbage objects that will never be used again from the rest of the memory objects. The second is sweeping: reclaiming the garbage objects that marking identified. To mark, you first determine the root set of memory objects. All objects in the root set are reachable, and any object referenced by an object in the root set cannot be garbage. Starting from the root set, the collector recursively traverses every object the root set can reach and marks it as live. When the traversal finishes, the objects left unmarked are garbage.
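The two steps above can be sketched on a toy object graph. This is only an illustration of the idea, not how CPython implements it; the names `mark`, `sweep`, `heap`, `edges`, and `roots` are hypothetical, and objects are represented by plain strings.

```python
def mark(roots, edges):
    """Return the set of objects reachable from the root set."""
    marked = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in marked:
            continue  # already visited, so cycles do not loop forever
        marked.add(node)
        stack.extend(edges.get(node, ()))
    return marked


def sweep(heap, marked):
    """Return the objects that were never marked, i.e. the garbage."""
    return heap - marked


# Toy heap: a -> b -> c is reachable from the root; d <-> e is an unreachable cycle.
heap = {"a", "b", "c", "d", "e"}
edges = {"a": ["b"], "b": ["c"], "d": ["e"], "e": ["d"]}
roots = ["a"]

live = mark(roots, edges)
garbage = sweep(heap, live)
print(sorted(live))     # ['a', 'b', 'c']
print(sorted(garbage))  # ['d', 'e']
```

Note that the cycle between d and e is found to be garbage here, which is exactly the case that reference counting alone cannot handle.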
Generational collection
According to an empirical observation, if a memory object is found not to be garbage during a mark pass, it is unlikely to become garbage in the near future. Generational collection moves objects that have survived several garbage collections into another area, the old area, so the objects in that area are older. Because objects in the old area are unlikely to become garbage soon, that area can be collected less often, while objects in the young area are collected frequently. This improves the overall performance of garbage collection.
Memory Management in Python
In CPython, the life cycle of most objects is managed through the object's reference count. Reference counting is the most intuitive and simple garbage collection technique. Compared with other mainstream GC algorithms, its biggest advantage is that it works in real time: memory is reclaimed immediately once no reference points to it.
However, reference counting has two troublesome flaws:
When a circular reference occurs in the program, the reference count cannot detect it, and the memory object referenced by the cycle becomes unrecyclable memory, causing a memory leak. For example:
list1 = []
list1.append(list1)
del list1
list1 references itself. After the second line runs, the reference count of list1 is 2; after the del, it drops to 1 rather than zero. With reference counting alone, the memory of list1 is never released, causing a memory leak.
Maintaining reference counts requires additional operations. The reference count must be modified every time a memory object is referenced or a reference to it is destroyed. The cost of these operations is called the footprint. The footprint of reference counting is very high, which greatly affects the overall performance of the program.
To solve the circular reference problem, CPython provides a dedicated module, the gc module. Its main job is to detect garbage objects that are involved in reference cycles and reclaim them. Its implementation uses the two mainstream garbage collection techniques mentioned earlier: mark-sweep and generational collection.
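A small sketch of the gc module reclaiming a self-referencing object follows. It assumes CPython; the `Node` class and the `probe` weak reference are hypothetical helpers used only to observe whether the object is still alive (a plain list cannot be weakly referenced, so a class instance stands in for list1).

```python
import gc
import weakref


class Node:
    """Hypothetical container object that can hold a reference to itself."""


gc.disable()   # stop automatic cycle collection from interfering with the demo
gc.collect()   # start from a clean state

node = Node()
node.self_ref = node          # the object now references itself
probe = weakref.ref(node)     # weak reference: does not keep the object alive

del node                      # the name is gone, but the cycle keeps the count above zero
alive_before = probe() is not None

gc.collect()                  # the cycle detector finds and frees the unreachable cycle
alive_after = probe() is not None

gc.enable()
print(alive_before)  # True
print(alive_after)   # False
```

The object survives the del because of its own reference to itself, and it is only freed once the cycle collector runs.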
To address the performance cost of reference counting and to make memory allocation and release as efficient as possible, Python also implements a number of memory pool mechanisms.
Python's garbage collection uses reference counting. The principle of reference counting can be found online. The following shows Python's allocator levels
Python GC mainly uses reference counting to track and recycle garbage. On the basis of reference counting, "mark and sweep" is used to solve the problem of circular references that may occur in container objects, and "generational collection" is used to improve garbage collection efficiency by exchanging space for time.
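1 Reference counting
PyObject is the header that every object must have, and its ob_refcnt field serves as the reference count. When an object gains a new reference, its ob_refcnt is incremented; when something that references it is deleted, its ob_refcnt is decremented. When the reference count reaches 0, the life of the object is over.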
Advantages:
Simple
Real time
Disadvantages:
Maintaining reference counts consumes resources
Circular Reference
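The counting behaviour described in this section can be observed from Python with sys.getrefcount. The sketch below compares differences rather than absolute values, because the absolute number includes temporary references made by the call itself and can vary between CPython versions; the names `obj`, `holders`, and the counters are hypothetical.

```python
import sys

obj = object()
base = sys.getrefcount(obj)       # baseline (includes the call's own temporary reference)

holders = [obj, obj]              # two new references to the same object
after_add = sys.getrefcount(obj)

holders.clear()                   # both references are destroyed
after_drop = sys.getrefcount(obj)

print(after_add - base)   # 2
print(after_drop - base)  # 0
```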
2 Mark-sweep mechanism
The basic idea is to allocate on demand. When no free memory remains, the collector starts from the registers and the references on the program stack, traverses the graph whose nodes are objects and whose edges are references, marks every reachable object, and then sweeps the memory space, releasing all unmarked objects.
3 Generational collection
The overall idea of generational collection is to divide all memory blocks in the system into different sets according to how long they have survived. Each set is a "generation". The frequency of garbage collection decreases as the survival time of a generation increases. Survival time is usually measured by the number of garbage collections an object has survived.
Python defines three generations of objects by default. The larger the generation index, the longer its objects have survived.
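The generations can be inspected through the gc module. The sketch below only reads the current settings and runs one collection of the youngest generation; the exact numbers depend on the interpreter version and on what the program has allocated so far, so no specific values are shown.

```python
import gc

thresholds = gc.get_threshold()   # one collection threshold per generation
counts = gc.get_count()           # current counters, one per generation

print(thresholds)
print(counts)

# Collect only the youngest generation (index 0); the return value is the
# number of unreachable objects the collector found.
found = gc.collect(0)
print(found)
```

Passing a larger generation index to gc.collect asks for older generations to be examined as well, which is the more expensive and less frequent case.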
In fact, it's not just reference counting; there are also mark-sweep and generational collection. This blog post summarizes it very well:
http://hbprotoss.github.io/po...