PHP-TSRM thread safety manager: source code analysis

When reading PHP source code or developing PHP extensions, you will see a large number of TSRMLS_ macros in function parameter lists. These macros belong to the thread safety mechanism that Zend provides (Zend Thread Safety, ZTS for short). Its purpose is to prevent errors when internal shared resources are accessed while the PHP interpreter is loaded and executed as a module in a multi-threaded environment.

When do you need to use TSRM

TSRM must be enabled whenever the server is multi-threaded and PHP is loaded as a module, for example under Apache's worker mode (multiple processes, each with multiple threads). In that case you must use a thread-safe build of PHP, that is, one with TSRM enabled. On Linux you choose whether to enable TSRM when compiling PHP. On Windows, both thread-safe and non-thread-safe builds of PHP are provided.

How to implement TSRM in PHP

In a typical multi-threaded program, shared resources are protected with mutex locks. PHP does not take that route for its public resources, because locking costs performance. Instead, PHP gives each thread its own copy of all the public resources of the PHP kernel. Each thread points to its own public resource area and works only on its own copy, so threads do not affect each other.

What are public resources

Public resources are the various struct definitions that hold global state.

TSRM data structure

tsrm_tls_entry is the per-thread structure; each thread has its own instance of it.

    typedef struct _tsrm_tls_entry tsrm_tls_entry;

    struct _tsrm_tls_entry {
        void **storage;
        int count;
        THREAD_T thread_id;
        tsrm_tls_entry *next;
    };

    static tsrm_tls_entry **tsrm_tls_table = NULL; // head pointer of the thread pointer table
    static int tsrm_tls_table_size;                // current number of thread structures (table size)


Field description

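void **storage: the resource pointer; it points to this thread's own public resource memory area
int count: the number of resources, that is, how many public resources the PHP kernel and the extension modules have registered in total
THREAD_T thread_id: the thread id
tsrm_tls_entry *next: points to the next thread structure. Every thread structure pointer is stored in a thread pointer table (similar to a hash table), so next can be understood as separate chaining for hash collisions.

tsrm_resource_type is the public resource type structure; there is one such structure for each registered public resource.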


    typedef struct {
        size_t size;
        ts_allocate_ctor ctor;
        ts_allocate_dtor dtor;
        int done;
    } tsrm_resource_type;

    static tsrm_resource_type *resource_types_table = NULL; // head pointer of the public resource type table
    static int resource_types_table_size;                   // current number of public resource types


Field description

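size_t size: the size of the resource
ts_allocate_ctor ctor: constructor pointer; it is called when the resource is created for each thread
ts_allocate_dtor dtor: destructor pointer; it is called when the resource is released
int done: whether the resource has been destroyed (0: normal, 1: destroyed)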


Global resource id

    typedef int ts_rsrc_id;
    static ts_rsrc_id id_count;


What is the global resource id

TSRM will generate a unique ID for each resource when registering public resources. You need to specify the corresponding resource ID when obtaining the resource in the future.

Why do we need global resource id

Because every thread gets a copy of all currently registered public resources, each allocated with malloc() and collected in one large array. The resource id identifies a slot in that array (the source converts an id to an array index with TSRM_UNSHUFFLE_RSRC_ID), so to obtain a resource you must specify its id.

It’s easy to understand:
TSRM makes each thread point to its own pile of public resources (an array), so finding the one you want in that pile requires its resource id. In a non-thread-safe build the public resources are not gathered into such a pile, and each one is accessed directly by its variable name.
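The id-to-slot idea can be sketched in a few lines of C. This is only an illustration, not TSRM's code: every demo_ name is hypothetical, and the offset of one between id and index is an assumption modelled on how TSRM_SHUFFLE_RSRC_ID and TSRM_UNSHUFFLE_RSRC_ID appear to behave in php-5.3.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: public ids start at 1 and array indexes at 0. */
#define DEMO_SHUFFLE_ID(index) ((index) + 1)   /* index -> public id */
#define DEMO_UNSHUFFLE_ID(id)  ((id) - 1)      /* public id -> index */

/* Two hypothetical public resources. */
typedef struct { int request_count; } demo_counters;
typedef struct { char name[16]; } demo_config;

/* One thread's private area: slot i holds that thread's copy of resource i. */
typedef struct {
    void **storage;
    int    count;
} demo_thread_area;

/* Returns this thread's copy of the resource with the given id, or NULL. */
static void *demo_fetch(demo_thread_area *area, int id)
{
    int index = DEMO_UNSHUFFLE_ID(id);

    assert(area != NULL);
    if (index < 0 || index >= area->count) {
        return NULL;                 /* unknown id for this thread */
    }
    return area->storage[index];
}
```

Two threads that call demo_fetch with the same id receive different pointers, because each thread passes its own demo_thread_area. That is why no lock is needed when the resource itself is read or written.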

About execution process

When the kernel initializes, it initializes TSRM, registers the public resources used by the kernel, and registers the public resources used by extensions.

When a thread enters the PHP interpreter's entry function, the public resource data for that thread is initialized.

Whenever a public resource is needed, it is fetched through the corresponding resource id.
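The three steps above can be sketched as a tiny, single-threaded model. It is a simplified illustration under stated assumptions, not the TSRM implementation: the demo_ names are hypothetical, the type table has a fixed capacity, the id is used directly as the array index, and the constructor takes only the resource pointer.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_TYPES 8                 /* fixed capacity keeps the sketch short */

typedef void (*demo_ctor)(void *resource);

typedef struct { size_t size; demo_ctor ctor; } demo_type;

static demo_type demo_types[DEMO_MAX_TYPES];
static int demo_type_count = 0;

/* Step 1: at startup, the core and each extension register their resources. */
static int demo_register(size_t size, demo_ctor ctor)
{
    assert(demo_type_count < DEMO_MAX_TYPES);
    demo_types[demo_type_count].size = size;
    demo_types[demo_type_count].ctor = ctor;
    return demo_type_count++;            /* the id doubles as the index here */
}

/* Step 2: when a thread first enters the interpreter, build its private copies. */
static void **demo_thread_init(void)
{
    void **storage = malloc(sizeof(void *) * DEMO_MAX_TYPES);
    assert(storage != NULL);
    for (int i = 0; i < demo_type_count; i++) {
        storage[i] = malloc(demo_types[i].size);
        assert(storage[i] != NULL);
        if (demo_types[i].ctor) {
            demo_types[i].ctor(storage[i]);
        }
    }
    return storage;
}

/* Step 3: code that needs a resource looks it up by id in the thread's area. */
static void *demo_get(void **storage, int id)
{
    return (id >= 0 && id < demo_type_count) ? storage[id] : NULL;
}

/* Hypothetical resource with a constructor. */
typedef struct { int depth; } demo_executor;

static void demo_executor_ctor(void *resource)
{
    ((demo_executor *)resource)->depth = 0;
}
```

After one call to demo_register(sizeof(demo_executor), demo_executor_ctor), every call to demo_thread_init() returns a fresh area whose slot for that id has been constructed independently, so a change made through one area is not visible through another.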

[Figure: TSRM initialization structure chart]

TSRM source file path

    /php-5.3.27/TSRM/TSRM.c
    /php-5.3.27/TSRM/TSRM.h


Main TSRM functions

Initialize tsrm

tsrm_startup()


Register a public resource and obtain its global resource id

ts_allocate_id()


Fetch the current thread's public resource area. If the thread's entry does not exist yet, initialize it, then return the &storage pointer

#define TSRMLS_FETCH() void ***tsrm_ls = (void ***) ts_resource_ex(0, NULL)


Get the corresponding resource by specifying the resource id

#define ts_resource(id) ts_resource_ex(id, NULL)


Initialize the current thread's entry and create its copies of the registered public resources in the storage array

allocate_new_resource()


Some common TSRM macro definitions

    #ifdef ZTS
    #define TSRMLS_D  void ***tsrm_ls
    #define TSRMLS_DC , TSRMLS_D
    #define TSRMLS_C  tsrm_ls
    #define TSRMLS_CC , TSRMLS_C
    #else
    #define TSRMLS_D  void
    #define TSRMLS_DC
    #define TSRMLS_C
    #define TSRMLS_CC
    #endif


As you can see, when TSRM is enabled ZTS is defined and the first set of definitions applies; otherwise the macros expand to nothing (or to void). In a thread-safe build, the macros that appear so often in the parameter lists of extension functions are replaced by the void ***tsrm_ls pointer. In effect, the calling thread passes the address of its own public resource area (&storage) into the function, so that code inside the function reaches the public resources of the correct thread.
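To make the expansion concrete, here is a small self-contained sketch that mimics the macro set with hypothetical DEMO_ names. It is not the Zend header. DEMO_ZTS stands in for ZTS, and slot 0 of the thread's area is assumed to hold an int purely for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ZTS 1   /* set to 0 to see the non-thread-safe expansion */

#if DEMO_ZTS
# define DEMO_LS_D   void ***demo_ls      /* sole parameter, in a declaration */
# define DEMO_LS_DC  , DEMO_LS_D          /* trailing parameter, in a declaration */
# define DEMO_LS_C   demo_ls              /* sole argument, in a call */
# define DEMO_LS_CC  , DEMO_LS_C          /* trailing argument, in a call */
#else
# define DEMO_LS_D   void
# define DEMO_LS_DC
# define DEMO_LS_C
# define DEMO_LS_CC
#endif

/* Inner function: in the thread-safe build it reads slot 0 of the caller's area. */
static int demo_inner(int b DEMO_LS_DC)
{
#if DEMO_ZTS
    assert(demo_ls != NULL);
    int *counter = (int *)(*demo_ls)[0];
    return b + *counter;
#else
    return b;
#endif
}

/* Outer function: forwards the pointer unchanged to the inner function. */
static int demo_outer(int a DEMO_LS_DC)
{
    return demo_inner(a * 2 DEMO_LS_CC);
}
```

Note the split between the two macro families: the D and DC forms belong in declarations, the C and CC forms in calls. With DEMO_ZTS set to 0 the same source compiles to plain one-argument functions, which is how a single code base serves both build types.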

Rough calling pattern under TSRM

Call

TSRMLS_FETCH() expands to a declaration of void ***tsrm_ls

Execute

-> test(int a TSRMLS_CC) -> test_1(int b TSRMLS_CC)


After macro expansion

-> test(int a, tsrm_ls) -> test_1(int b, tsrm_ls)


How to release TSRM

As mentioned above, Apache's worker mode is multi-process and multi-threaded: each process starts multiple threads that call the PHP interpreter. When a thread finishes, the resource data created for it is not destroyed immediately, because the thread may be reused right away and can then use its existing public resource data without re-initializing it. Only when the process ends are all thread entries traversed and every entry and its resource data released.
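A process-exit cleanup of this kind amounts to walking every bucket of the thread table and every chain inside it. The sketch below shows that traversal with hypothetical demo_ names. It is a simplified model, not the TSRM shutdown code: it frees resources directly, where a real manager would be expected to call each resource's destructor first.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_entry demo_entry;
struct demo_entry {
    void **storage;      /* this thread's resource copies */
    int count;           /* number of slots in storage */
    demo_entry *next;    /* next entry in the same bucket */
};

/* Frees every entry in every bucket; returns how many entries were released. */
static int demo_shutdown(demo_entry **table, int table_size)
{
    int freed = 0;

    for (int i = 0; i < table_size; i++) {
        demo_entry *p = table[i];
        while (p) {
            demo_entry *next = p->next;   /* save the link before freeing p */
            for (int j = 0; j < p->count; j++) {
                free(p->storage[j]);      /* a real manager would run dtors first */
            }
            free(p->storage);
            free(p);
            freed++;
            p = next;
        }
        table[i] = NULL;
    }
    return freed;
}

/* Helper for experiments: builds one entry holding `count` small resources. */
static demo_entry *demo_make_entry(int count, demo_entry *next)
{
    demo_entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->storage = malloc(sizeof(void *) * (count > 0 ? count : 1));
    assert(e->storage != NULL);
    for (int j = 0; j < count; j++) {
        e->storage[j] = malloc(16);
    }
    e->count = count;
    e->next = next;
    return e;
}
```

Deferring the cleanup to one pass at process exit trades some memory held by idle threads for never having to rebuild a thread's resources when it is reused.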

Source code comments

tsrm_startup function description

    TSRM_API int tsrm_startup(int expected_threads, int expected_resources, int debug_level, char *debug_filename)
    {
        // omitted...

        // default number of threads
        tsrm_tls_table_size = expected_threads;

        // create the array of tsrm_tls_entry pointers
        tsrm_tls_table = (tsrm_tls_entry **) calloc(tsrm_tls_table_size, sizeof(tsrm_tls_entry *));

        // omitted...

        // initialize the global unique resource id counter
        id_count = 0;

        // default number of resource types
        resource_types_table_size = expected_resources;

        // omitted...

        // create the array of tsrm_resource_type structures
        resource_types_table = (tsrm_resource_type *) calloc(resource_types_table_size, sizeof(tsrm_resource_type));

        // omitted...

        return 1;
    }


This function is normally called when the PHP kernel initializes. To save memory, the tables start small (by default one thread slot and one resource type). The resource type table is enlarged later with realloc when it runs out of room, and threads that hash to an occupied slot are chained through next.

ts_allocate_id function description

    TSRM_API ts_rsrc_id ts_allocate_id(ts_rsrc_id *rsrc_id, size_t size, ts_allocate_ctor ctor, ts_allocate_dtor dtor)
    {
        int i;

        // omitted...

        // generate the unique id of the current resource
        *rsrc_id = TSRM_SHUFFLE_RSRC_ID(id_count++);
        TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Obtained resource id %d", *rsrc_id));

        // check whether the resource type table is smaller than the current number
        // of resources; if so, enlarge the resource type table
        if (resource_types_table_size < id_count) {
            resource_types_table = (tsrm_resource_type *) realloc(resource_types_table, sizeof(tsrm_resource_type)*id_count);
            // omitted...
            resource_types_table_size = id_count;
        }

        // store the size, constructor and destructor pointers of the public resource
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].size = size;
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].ctor = ctor;
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].dtor = dtor;
        resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].done = 0;

        // traverse all thread structures and create the new resource data
        // in the memory that each storage points to
        for (i=0; i<tsrm_tls_table_size; i++) {
            tsrm_tls_entry *p = tsrm_tls_table[i];

            // enlarge p->storage (resource ids only ever increase), malloc the actual
            // resource memory according to the resource's size, then call ctor
            while (p) {
                if (p->count < id_count) {
                    int j;

                    p->storage = (void *) realloc(p->storage, sizeof(void *)*id_count);
                    for (j=p->count; j<id_count; j++) {
                        p->storage[j] = (void *) malloc(resource_types_table[j].size);
                        if (resource_types_table[j].ctor) {
                            resource_types_table[j].ctor(p->storage[j], &p->storage);
                        }
                    }
                    // id_count grows by 1 on each call; it is the total number of public resources
                    p->count = id_count;
                }
                // move to the next thread structure
                p = p->next;
            }
        }

        // omitted...

        // return the id generated above from id_count++
        return *rsrc_id;
    }


This function must be called whenever a public resource needs to be registered and created, which normally happens only in a multi-threaded build. As the code shows, it traverses every thread structure and repeatedly calls realloc and malloc, so calling it many times also costs performance.
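The cost mentioned above comes from topping up every existing thread each time a new resource type is registered. The following sketch isolates that step. It is a hedged simplification with hypothetical demo_ names: threads sit in one linked list instead of a hash table, there is no mutex, no constructor callback, and the returned value is the plain array index.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_thread demo_thread;
struct demo_thread {
    void **storage;      /* this thread's resource copies */
    int count;           /* how many slots are already populated */
    demo_thread *next;
};

static size_t *demo_sizes = NULL;   /* size of each registered resource type */
static int demo_id_count = 0;       /* number of registered resource types */

/* Registers a resource type, then tops up every thread in `threads`. */
static int demo_allocate_id(size_t size, demo_thread *threads)
{
    /* Grow the type table by one slot. */
    size_t *grown = realloc(demo_sizes, sizeof(size_t) * (demo_id_count + 1));
    assert(grown != NULL);
    demo_sizes = grown;
    demo_sizes[demo_id_count] = size;
    demo_id_count++;

    /* Every thread that is behind gets its storage array enlarged and filled. */
    for (demo_thread *p = threads; p; p = p->next) {
        if (p->count < demo_id_count) {
            void **slots = realloc(p->storage, sizeof(void *) * demo_id_count);
            assert(slots != NULL);
            p->storage = slots;
            for (int j = p->count; j < demo_id_count; j++) {
                p->storage[j] = calloc(1, demo_sizes[j]);   /* zero-filled copy */
                assert(p->storage[j] != NULL);
            }
            p->count = demo_id_count;
        }
    }
    return demo_id_count - 1;
}
```

Each registration costs one realloc plus one allocation per live thread, which is why registering resources once at startup, before many threads exist, is the cheaper pattern.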

TSRMLS_FETCH() -> ts_resource_ex function description

    TSRM_API void *ts_resource_ex(ts_rsrc_id id, THREAD_T *th_id)
    {
        THREAD_T thread_id;
        int hash_value;
        tsrm_tls_entry *thread_resources;

        // omitted...

        if(tsrm_tls_table) {
            // get the current thread id
            if (!th_id) {
                // omitted...
                thread_id = tsrm_thread_id();
            } else {
                thread_id = *th_id;
            }

            TSRM_ERROR((TSRM_ERROR_LEVEL_INFO, "Fetching resource id %d for thread %ld", id, (long) thread_id));
            tsrm_mutex_lock(tsmm_mutex);

    #define THREAD_HASH_OF(thr,ts) (unsigned long)thr%(unsigned long)ts

            // take the thread id modulo the initial thread table size to find this
            // thread's slot. All thread pointers are stored in tsrm_tls_table; if the
            // slot is already occupied, entries are linked through next, which is
            // separate chaining for hash collisions.
            hash_value = THREAD_HASH_OF(thread_id, tsrm_tls_table_size);
            thread_resources = tsrm_tls_table[hash_value];

            // if no entry exists, create one for the current thread and copy in all
            // the public resources previously registered through ts_allocate_id
            if (!thread_resources) {
                allocate_new_resource(&tsrm_tls_table[hash_value], thread_id);
                return ts_resource_ex(id, &thread_id);
            } else {
                do {
                    // compare thread ids
                    if (thread_resources->thread_id == thread_id) {
                        break;
                    }
                    // not equal: move to the next entry
                    if (thread_resources->next) {
                        thread_resources = thread_resources->next;
                    } else {
                        // no next entry: initialize and create the current thread's entry
                        allocate_new_resource(&thread_resources->next, thread_id);
                        return ts_resource_ex(id, &thread_id);
                    }
                } while (thread_resources);
            }

            // after finding or creating the thread entry, return the pointer to its
            // public resource area (&storage); if a resource id was given,
            // return the storage[id] pointer
            TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, thread_resources->count);
        } // end of if(tsrm_tls_table)
    }
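The lookup strategy in ts_resource_ex (thread id modulo table size, then a walk along next) can be tried out in isolation with the sketch below. It uses hypothetical demo_ names, runs single-threaded without the mutex, and replaces the recursive retry of the original with a pointer-to-pointer walk, so it illustrates the hashing scheme and is not a copy of the TSRM code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_entry demo_entry;
struct demo_entry {
    unsigned long thread_id;
    demo_entry *next;            /* collision chain within one bucket */
};

#define DEMO_TABLE_SIZE 4

static demo_entry *demo_table[DEMO_TABLE_SIZE];
static int demo_created = 0;     /* counts how many entries were allocated */

/* Returns the entry for thread_id, creating and linking it on first use. */
static demo_entry *demo_lookup(unsigned long thread_id)
{
    demo_entry **slot = &demo_table[thread_id % DEMO_TABLE_SIZE];

    /* Walk the chain; stop at the matching entry or at the NULL tail link. */
    while (*slot && (*slot)->thread_id != thread_id) {
        slot = &(*slot)->next;
    }
    if (*slot == NULL) {
        demo_entry *e = malloc(sizeof(*e));
        assert(e != NULL);
        e->thread_id = thread_id;
        e->next = NULL;
        *slot = e;               /* append to the end of the bucket's chain */
        demo_created++;
    }
    return *slot;
}
```

With a table size of 4, thread ids 5 and 9 land in the same bucket, so the second one is appended to the first one's chain. Because the table itself is not resized, a small table with many threads degrades into long chains, which is one reason the expected thread count is passed in at startup.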


allocate_new_resource function description

    static void allocate_new_resource(tsrm_tls_entry **thread_resources_ptr, THREAD_T thread_id)
    {
        int i;

        // thread_resources_ptr may be &tsrm_tls_table[hash_value],
        // or &tsrm_tls_table[hash_value]->next when there was a hash collision
        (*thread_resources_ptr) = (tsrm_tls_entry *) malloc(sizeof(tsrm_tls_entry));
        (*thread_resources_ptr)->storage = (void **) malloc(sizeof(void *)*id_count);
        (*thread_resources_ptr)->count = id_count;
        (*thread_resources_ptr)->thread_id = thread_id;
        (*thread_resources_ptr)->next = NULL;

        /* Set thread local storage to this new thread resources structure */
        tsrm_tls_set(*thread_resources_ptr);

        if (tsrm_new_thread_begin_handler) {
            tsrm_new_thread_begin_handler(thread_id, &((*thread_resources_ptr)->storage));
        }

        // this loop takes every resource type from resource_types_table, allocates
        // memory of the given size and stores it in the current thread's storage.
        // When ts_allocate_id was called, this thread's entry may not have existed
        // yet, so only the global resource type data was created at that point,
        // not the actual resource data.
        for (i=0; i<id_count; i++) {
            if (resource_types_table[i].done) {
                (*thread_resources_ptr)->storage[i] = NULL;
            } else {
                (*thread_resources_ptr)->storage[i] = (void *) malloc(resource_types_table[i].size);
                if (resource_types_table[i].ctor) {
                    resource_types_table[i].ctor((*thread_resources_ptr)->storage[i], &(*thread_resources_ptr)->storage);
                }
            }
        }

        // call this handler to copy the configuration and invoke the callbacks of
        // configuration entries that have one, filling the current thread's storage area
        if (tsrm_new_thread_end_handler) {
            tsrm_new_thread_end_handler(thread_id, &((*thread_resources_ptr)->storage));
        }
    }


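Using TSRM in extensions

When developing an extension, we should also write it to work with the thread-safe build, using the ZTS macro to check whether the current PHP is a thread-safe build.

Defining public resources in an extension:

    // define the public resource data; after expansion this is a struct
    // named zend_<module name>_globals
    ZEND_BEGIN_MODULE_GLOBALS(module_name)
        int id;
        char name;
    ZEND_END_MODULE_GLOBALS(module_name)

    // the corresponding macro definitions
    #define ZEND_BEGIN_MODULE_GLOBALS(module_name) \
        typedef struct _zend_##module_name##_globals {
    #define ZEND_END_MODULE_GLOBALS(module_name) \
        } zend_##module_name##_globals;

    // after expansion
    typedef struct _zend_module_name_globals {
        int id;
        char name;
    } zend_module_name_globals;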


Defining the resource id in an extension

    #ifdef ZTS
    #define ZEND_DECLARE_MODULE_GLOBALS(module_name) \
        ts_rsrc_id module_name##_globals_id;
    #else
    #define ZEND_DECLARE_MODULE_GLOBALS(module_name) \
        zend_##module_name##_globals module_name##_globals;
    #endif


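(1) Thread-safe build: the macro declares the globally unique resource id, because every thread uses this id to fetch the resource data from the memory area that its storage points to.
(2) Non-thread-safe build: the macro declares the struct variable itself, and the resource is accessed through the variable name each time, because no other threads compete for it.

Accessing public resource data in an extension

    #ifdef ZTS
    #define MODULE_G(v) TSRMG(xx_globals_id, zend_xx_globals *, v)
    #else
    #define MODULE_G(v) (xx_globals.v)
    #endif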


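As shown above, every access to the resource goes through the self-defined MODULE_G() macro. In a thread-safe build it fetches the current thread's data for the given resource id through the TSRM manager; otherwise it simply accesses the resource through the variable name.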
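The accessor pattern can be reproduced outside PHP with the sketch below. All names are hypothetical (DEMO_G, demo_module_globals, demo_next_id) and DEMO_ZTS stands in for ZTS. Like TSRMG, the thread-safe DEMO_G is assumed to rely on a variable named demo_ls being in scope, and the id is used directly as the slot index to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ZTS 1   /* set to 0 for the non-thread-safe variant */

typedef struct {
    int id;
    char name;
} demo_module_globals;

#if DEMO_ZTS
/* Thread-safe build: only an id is declared; the data lives in each thread's area. */
static int demo_globals_id = 0;
# define DEMO_G(v) (((demo_module_globals *)(*demo_ls)[demo_globals_id])->v)
#else
/* Non-thread-safe build: one plain global struct, accessed by name. */
static demo_module_globals demo_globals;
# define DEMO_G(v) (demo_globals.v)
#endif

/* Example use of the accessor: increments this thread's id field and returns it. */
static int demo_next_id(void ***demo_ls)
{
    assert(demo_ls != NULL && *demo_ls != NULL);
    return ++DEMO_G(id);
}
```

Extension code written against DEMO_G(id) never mentions which variant is active, so switching DEMO_ZTS changes where the data lives without touching the call sites.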

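Initializing public resources in an extension

    // public resource data is normally initialized in the extension's MINIT function;
    // in a ZTS build this is done by calling ts_allocate_id
    PHP_MINIT_FUNCTION(myextension)
    {
    #ifdef ZTS
        ts_allocate_id(&xx_globals_id, sizeof(zend_module_name_globals), ctor, dtor);
    #endif
        return SUCCESS;
    }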


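Conclusion

The above describes how the PHP-TSRM thread safety manager is implemented. Understanding TSRM helps a great deal both when reading the kernel source and when developing PHP extensions, because the kernel and extensions are full of TSRM macro definitions.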