How to use weak references in Python

PHPz | forward | 2023-05-12 23:52:11 | 1237 views

Background

Before discussing how to use weak references (weakref), let's first look at what a weak reference is and what problem it solves.

Suppose we have a multi-threaded program that processes application data concurrently:

# Uses a lot of resources; expensive to create and destroy
class Data:
    def __init__(self, key):
        pass

Application data, Data, is uniquely identified by a key, and the same data may be accessed by multiple threads at the same time. Because Data consumes a lot of system resources, it is expensive to create and destroy. We want the program to keep only one copy of each Data object and not create it again, even when multiple threads access it at the same time.

To this end, we try to design a caching middleware Cacher:

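import threading

# Data cache
class Cacher:
    def __init__(self):
        self.pool = {}                # key -> Data (strong reference)
        self.lock = threading.Lock()  # protects self.pool

    def get(self, key):
        with self.lock:
            data = self.pool.get(key)
            if data is not None:
                return data
            # Not cached yet: create it and remember it
            self.pool[key] = data = Data(key)
            return data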

Cacher uses a dict internally to cache the Data objects it has created, and provides a get method for obtaining application data. When get is called, it first checks the cache dictionary: if the data already exists, it is returned directly; otherwise a new Data object is created and saved in the dictionary. Data therefore enters the cache when it is first created, and any thread that later asks for the same key gets the same copy from the cache.

This looks good, but there is a catch: Cacher can leak resources!

Once a Data object is created, it stays in the cache dictionary and is never released. In other words, the program's resource usage, such as memory, keeps growing and may eventually be exhausted. Ideally, a piece of data would be released automatically once no thread is using it any more.

One option is to maintain a reference count for each piece of data in Cacher: the get method increments the count, and a new remove method releases data by decrementing the count and deleting the data from the cache dictionary when the count drops to zero.

A thread calls get to obtain the data and must call remove to release it when finished. This amounts to Cacher implementing reference counting by itself, which is cumbersome. Python already has a built-in garbage collection mechanism, so why should the application reimplement it?
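The manual reference-counting design described above is not shown in the article; a minimal sketch of what it might look like follows. The class name RefCountingCacher and the [data, count] entry layout are assumptions made for illustration only.

```python
import threading


class Data:
    """Stand-in for the expensive application data."""
    def __init__(self, key):
        self.key = key


class RefCountingCacher:
    """Hypothetical cache that counts users of each entry by hand."""

    def __init__(self):
        self.pool = {}                # key -> [data, number of users]
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            entry = self.pool.get(key)
            if entry is None:
                entry = self.pool[key] = [Data(key), 0]
            entry[1] += 1             # one more user of this data
            return entry[0]

    def remove(self, key):
        with self.lock:
            entry = self.pool[key]
            entry[1] -= 1             # one user fewer
            if entry[1] == 0:
                del self.pool[key]    # last user gone: drop the data


cacher = RefCountingCacher()
a = cacher.get("k")
b = cacher.get("k")
cacher.remove("k")
still_cached = "k" in cacher.pool     # one user is still holding the data
cacher.remove("k")
gone = "k" not in cacher.pool
```

Every get must be paired with exactly one remove, which is easy to forget; that is the burden the article wants to avoid.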

The crux of the problem is Cacher's cache dictionary: as middleware, Cacher does not use the data objects itself, so in principle it should not hold references to them. Is there a technique that lets us find the target object without creating a reference to it? After all, ordinary assignment always creates a reference.

Typical usage

This is where weak references (weakref) come in. A weak reference is a special object that is associated with a target object without creating a (strong) reference to it.

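# Create a piece of data
>>> d = Data('fasionchan.com')
>>> d
<__main__.Data object at 0x1018571f0>

# Create a weak reference to the data
>>> import weakref
>>> r = weakref.ref(d)

# Call the weak reference object to get the object it points to
>>> r()
<__main__.Data object at 0x1018571f0>
>>> r() is d
True

# Delete the temporary variable d; the Data object has no other references and is reclaimed
>>> del d
# Call the weak reference again: the target Data object is gone (returns None)
>>> r()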

With this, we only need to change Cacher's cache dictionary to store weak references, and the problem is solved:

import threading
import weakref

# Data cache
class Cacher:
    def __init__(self):
        self.pool = {}                # key -> weak reference to Data
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            r = self.pool.get(key)
            if r is not None:
                data = r()            # None if the Data object has been reclaimed
                if data is not None:
                    return data
            data = Data(key)
            self.pool[key] = weakref.ref(data)
            return data

Since the cache dictionary only stores weak references to Data objects, Cacher does not affect their reference counts. When all threads have finished using a piece of data, its reference count drops to zero and the object is released.

In fact, caching data objects in a dictionary is so common that the weakref module provides two dictionary classes that hold only weak references:

weakref.WeakKeyDictionary, a mapping class whose keys are held only by weak references (once a key no longer has any strong reference, its key-value entry disappears automatically);
weakref.WeakValueDictionary, a mapping class whose values are held only by weak references (once a value no longer has any strong reference, its key-value entry disappears automatically);

Our data cache dictionary can therefore be implemented with weakref.WeakValueDictionary, whose interface is essentially the same as that of a regular dictionary. We no longer need to maintain weak reference objects ourselves, and the code is more concise and clear:

import threading
import weakref

# Data cache
class Cacher:
    def __init__(self):
        # Values are held weakly; entries vanish when the Data object is reclaimed
        self.pool = weakref.WeakValueDictionary()
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            data = self.pool.get(key)
            if data is not None:
                return data
            self.pool[key] = data = Data(key)
            return data
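To see the effect, here is a small usage sketch of the WeakValueDictionary-based cache. It repeats the Cacher definition so that it runs on its own; the Data class is a trivial stand-in, and the variable names are made up for illustration. The gc.collect() call is an assumption for portability: on CPython the object is reclaimed as soon as its reference count reaches zero.

```python
import gc
import threading
import weakref


class Data:
    """Stand-in for the expensive application data."""
    def __init__(self, key):
        self.key = key


class Cacher:
    def __init__(self):
        self.pool = weakref.WeakValueDictionary()
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            data = self.pool.get(key)
            if data is not None:
                return data
            self.pool[key] = data = Data(key)
            return data


cacher = Cacher()
a = cacher.get("fasionchan.com")
b = cacher.get("fasionchan.com")
same_object = a is b                               # both callers share one Data
cached_while_in_use = "fasionchan.com" in cacher.pool

del a, b          # drop the last strong references
gc.collect()      # not required on CPython, but harmless elsewhere
cached_after_release = "fasionchan.com" in cacher.pool
```

While any caller holds the object, the entry stays in the cache; once the last strong reference goes away, the entry disappears without any explicit remove call.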

The weakref module also has many other useful utility classes and functions. Please refer to the official documentation for details, which are not repeated here.

Working Principle

So what exactly is a weak reference, and how does it work? Let's look under the hood.

>>> d = Data('fasionchan.com')

# weakref.ref is a built-in type object
>>> from weakref import ref
>>> ref


# Calling the weakref.ref type object creates a weak reference instance
>>> r = ref(d)
>>> r

After the previous chapters, we are already familiar with reading the source code of built-in objects. The structure behind a weak reference object is defined in the CPython source as follows:

typedef struct _PyWeakReference PyWeakReference;

/* PyWeakReference is the base struct for the Python ReferenceType, ProxyType,
 * and CallableProxyType.
 */
#ifndef Py_LIMITED_API
struct _PyWeakReference {
    PyObject_HEAD

    /* The object to which this is a weak reference, or Py_None if none.
     * Note that this is a stealth reference:  wr_object's refcount is
     * not incremented to reflect this pointer.
     */
    PyObject *wr_object;

    /* A callable to invoke when wr_object dies, or NULL if none. */
    PyObject *wr_callback;

    /* A cache for wr_object's hash code.  As usual for hashes, this is -1
     * if the hash code isn't known yet.
     */
    Py_hash_t hash;

    /* If wr_object is weakly referenced, wr_object has a doubly-linked NULL-
     * terminated list of weak references to it.  These are the list pointers.
     * If wr_object goes away, wr_object is set to Py_None, and these pointers
     * have no meaning then.
     */
    PyWeakReference *wr_prev;
    PyWeakReference *wr_next;
};
#endif

It can be seen that the PyWeakReference structure is the body of the weak reference object. It is a fixed-length object. In addition to the fixed header, there are 5 fields:

wr_object, an object pointer to the referenced object; the weak reference uses this field to find the referenced object without creating a reference to it;
wr_callback, which points to a callable object that is called when the referenced object is destroyed;
hash, which caches the hash value of the referenced object;
wr_prev and wr_next, the backward and forward pointers used to organize weak reference objects into a doubly linked list;

Combining this with the comments in the code, we know that:

[Figure: a Data instance associated with two weak reference objects; dashed arrows show wr_object, solid arrows show the linked list]

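A weak reference object is associated with the referenced object through its wr_object field, as shown by the dashed arrows in the figure above;
An object can be associated with multiple weak reference objects at the same time; in the figure, the Data instance is associated with two weak reference objects;
All weak references to the same object are organized into a doubly linked list whose head is stored in the referenced object, as shown by the solid arrows in the figure above;
When an object is destroyed, Python traverses its weak reference list and handles each entry in turn:

it sets the wr_object field to None, so that calling the weak reference afterwards returns None and the caller knows the object has been destroyed;
it executes the callback function wr_callback (if any);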

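This shows that the working principle of weak references is essentially the Observer pattern from design patterns: when an object is destroyed, all of its weak reference objects are notified and handled properly.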
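The notification behaviour can be observed from Python by passing a callback as the second argument to weakref.ref. The sketch below is illustrative only: the names on_destroy and notified are made up, and gc.collect() is included on the assumption that the code may run on an interpreter without immediate reference counting.

```python
import gc
import weakref


class Data:
    def __init__(self, key):
        self.key = key


notified = []          # weak references whose target has been destroyed


def on_destroy(ref):
    # The callback receives the weak reference object itself, not the target.
    notified.append(ref)


d = Data("fasionchan.com")
r1 = weakref.ref(d, on_destroy)
r2 = weakref.ref(d, on_destroy)   # a second, independent observer

alive_before = r1() is d

del d             # the only strong reference goes away
gc.collect()      # not required on CPython, but harmless elsewhere

dead_after = r1() is None and r2() is None
```

Both weak references are notified, and by the time the callbacks run, calling either reference already returns None.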

Implementation details

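Understanding the basic principle of weak references is enough to use them well. If you are interested in the source code, you can dig further into some implementation details.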

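As mentioned earlier, all weak references to the same object are organized into a doubly linked list whose head is stored in the object. Because many different types of objects can be weakly referenced, it is difficult to represent them with one fixed structure. Python therefore provides a field in the type object, tp_weaklistoffset, which records the offset of the weak reference list head pointer within the instance object.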

In this way, for any object o, we only need to find its type object t through the ob_type field, and then use the tp_weaklistoffset field in t to locate the weak reference list head of o.

Python provides two macro definitions in the Include/objimpl.h header file:

/* Test if a type supports weak references */
#define PyType_SUPPORTS_WEAKREFS(t) ((t)->tp_weaklistoffset > 0)

#define PyObject_GET_WEAKREFS_LISTPTR(o) \
    ((PyObject **) (((char *) (o)) + Py_TYPE(o)->tp_weaklistoffset))

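PyType_SUPPORTS_WEAKREFS checks whether a type object supports weak references; a type supports them only when tp_weaklistoffset is greater than zero, and built-in objects such as list do not support weak references;
PyObject_GET_WEAKREFS_LISTPTR retrieves the weak reference list head of an object: it first finds the type object t through the Py_TYPE macro, then reads the offset from the tp_weaklistoffset field, and finally adds the offset to the object's address to get the address of the list head field;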
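The same check can be approximated from Python. The sketch below is an illustration under assumptions: the helper supports_weakref and the class WeakList are hypothetical names, and `__weakrefoffset__` is the read-only type attribute that exposes tp_weaklistoffset. Its exact non-zero value depends on the CPython version and platform, so the code only distinguishes zero from non-zero.

```python
import weakref


class Data:
    pass


class WeakList(list):
    """A list subclass defined in Python, which gains weak reference support."""


def supports_weakref(obj):
    # Creating a weak reference raises TypeError for unsupported types.
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


list_ok = supports_weakref([1, 2, 3])              # built-in list: unsupported
int_ok = supports_weakref(42)                      # built-in int: unsupported
data_ok = supports_weakref(Data())                 # ordinary class: supported
sublist_ok = supports_weakref(WeakList([1, 2, 3])) # subclass of list: supported

# tp_weaklistoffset as seen from Python; zero means "no weak reference support"
list_offset = list.__weakrefoffset__
data_offset = Data.__weakrefoffset__
```

Wrapping a built-in container in a trivial subclass, as WeakList does, is a common way to make it weakly referenceable.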

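When creating a weak reference, we call the weak reference type object weakref and pass the referenced object d as an argument. The weak reference type object weakref is the type of all weak reference instances. It is a globally unique type object defined in Objects/weakrefobject.c, namely _PyWeakref_RefType (line 350).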

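From what we learned about the object model, when Python calls an object, it executes the tp_call function of the object's type. So calling the weak reference type object weakref executes the tp_call function of weakref's own type, which is type. That tp_call function in turn calls weakref's tp_new and tp_init functions: tp_new allocates memory for the instance, and tp_init initializes it.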
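This call path can be mirrored at the Python level, where tp_call of type is exposed as `type.__call__`. The following is a small sketch, not a view into the C code itself; the variable names are made up for illustration.

```python
import weakref


class Data:
    pass


d = Data()

# The type of the weak reference type object is type itself.
ref_type_is_type = type(weakref.ref) is type

r1 = weakref.ref(d)                  # the usual spelling
r2 = type.__call__(weakref.ref, d)   # what the call does under the hood

same_target = r1() is d and r2() is d
```

Both spellings produce a weak reference instance pointing at the same Data object.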

Back in the Objects/weakrefobject.c source file, we can see that the tp_new field of _PyWeakref_RefType is initialized to weakref___new__ (line 276). The main logic of this function is as follows:

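Parse the arguments to get the referenced object (line 282);
Call the PyType_SUPPORTS_WEAKREFS macro to check whether the referenced object supports weak references, and raise an exception if it does not (line 286);
Call the GET_WEAKREFS_LISTPTR macro to get the object's weak reference list head field; to make insertion easier, it returns a pointer to a pointer (line 294);
Call get_basic_refs to get the basic weak reference object, that is, the one with an empty callback at the front of the list (if any, line 295);
If the callback is empty and the object already has a basic weak reference with an empty callback, reuse that instance and return it directly (line 296);
If it cannot be reused, call the tp_alloc function to allocate memory, initialize the fields, and insert the new object into the object's weak reference list (line 309);

If the callback is empty, insert it directly at the front of the linked list to make later reuse easy (see the get_basic_refs step above);
If the callback is not empty, insert it after the basic weak reference object (if any), so that the basic weak reference stays at the head of the linked list for easy access;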
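The reuse rule in the steps above is visible from Python. Note that it is a CPython implementation detail rather than a language guarantee, so the sketch below assumes CPython; the callback function and variable names are made up for illustration.

```python
import weakref


class Data:
    pass


def callback(ref):
    pass


d = Data()

basic_1 = weakref.ref(d)
basic_2 = weakref.ref(d)                # no callback: the existing basic ref is reused
with_cb_1 = weakref.ref(d, callback)
with_cb_2 = weakref.ref(d, callback)    # with a callback: always a new object

basic_reused = basic_1 is basic_2
callback_refs_distinct = with_cb_1 is not with_cb_2

# One shared basic reference plus two callback references
count = weakref.getweakrefcount(d)
```

Because callback-free references are shared, creating many plain weak references to the same object does not grow its weak reference list.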

When an object is reclaimed, its tp_dealloc function calls PyObject_ClearWeakRefs to clean up its weak references. This function takes the object's weak reference list and traverses it, clearing the wr_object field of each entry and executing its wr_callback callback (if any). The details are not expanded here; if you are interested, see the source code in Objects/weakrefobject.c, at line 880.

After this section, we have a solid understanding of weak references. A weak reference can track a target object without incrementing its reference count, and is often used in frameworks and middleware. Weak references may look magical, but the underlying design is the simple Observer pattern: once created, a weak reference object is inserted into a linked list maintained by the target object and observes (subscribes to) that object's destruction event.

Statement:

This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn to have it removed.
