It is always uncomfortable not to know what is going on, so I studied the relevant mechanisms by reading the source code and consulting the limited material available. This article summarizes that research. It first explains the concept of thread safety and its background in PHP, then examines in detail PHP's thread-safety mechanism ZTS (Zend Thread Safety) and its concrete implementation, TSRM, covering the related data structures, implementation details and operating mechanism. Finally, it looks at how Zend compiles selectively for single-threaded and multi-threaded environments.
Thread safety
The thread-safety problem, in a nutshell, is how to safely access shared resources in a multi-threaded environment. Each thread has its own private stack but shares the heap of the process it belongs to. In C, a variable declared outside any function is a global variable and is allocated in the process's shared storage. All threads refer to the same address, so if one thread modifies the variable, every other thread sees the change. This looks like a convenient way for threads to share data, but PHP typically handles one request per thread, so each thread should have its own copy of the global variables and requests should not interfere with one another. Early PHP usually ran in single-threaded environments, where each process started only one thread, so thread safety was not an issue. Later PHP came to be used in multi-threaded environments as well, and Zend introduced the Zend Thread Safety (ZTS) mechanism to ensure thread safety.
Basic principles and implementation of ZTS
Basic idea
The basic idea of ZTS is quite intuitive: if every thread needs its own copy of each global variable, then provide a mechanism that does exactly that. In a multi-threaded environment, requesting a global variable is no longer a simple variable declaration. Instead, the process allocates a region of memory on the heap to serve as a "thread global variable pool", which is initialized when the process starts. Whenever a thread needs a global variable, it calls TSRM (Thread Safe Resource Manager, the concrete implementation of ZTS) through the corresponding function and passes the necessary parameters (such as the variable's size). TSRM allocates the corresponding memory block in the pool and returns a reference ID for it. The next time the thread needs to read or write the variable, it passes this unique ID to TSRM, which takes care of the actual access. This is how thread-safe global variables are achieved.
[Figure: schematic diagram of the ZTS principle]
In the figure, Thread1 and Thread2 belong to the same process, and each needs a global variable Global Var. TSRM allocates one for each of them in the thread global memory pool (the yellow area) and identifies it by a unique ID, so the two threads can access their own variables through TSRM without interfering with each other. Let's look at how Zend implements this mechanism through specific code snippets. I am using the source code of PHP 5.3.8 here; the TSRM implementation lives in the "TSRM" directory of the PHP source tree.
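The allocate-an-ID, then look-up-by-ID idea described above can be sketched in a few lines of C. This is only a toy, single-threaded model under stated assumptions: the names toy_allocate_id and toy_get, the fixed limits, and the use of a plain integer as the thread identifier are all hypothetical, and the locking a real implementation would need is left out. It is not the TSRM code.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_THREADS   4   /* hypothetical fixed limits, for illustration only */
#define TOY_MAX_RESOURCES 8

static size_t toy_resource_size[TOY_MAX_RESOURCES];        /* size of each registered variable */
static void  *toy_pool[TOY_MAX_THREADS][TOY_MAX_RESOURCES]; /* one slot per (thread, variable) */
static int    toy_next_id = 0;                              /* next ID to hand out */

/* Register a "global variable" of the given size; returns its ID, or -1 on failure. */
int toy_allocate_id(size_t size)
{
    if (size == 0 || toy_next_id >= TOY_MAX_RESOURCES) return -1;
    toy_resource_size[toy_next_id] = size;
    return toy_next_id++;
}

/* Return the given thread's private copy of variable `id`,
 * creating a zero-filled copy on first use. Returns NULL on bad arguments. */
void *toy_get(int thread, int id)
{
    if (thread < 0 || thread >= TOY_MAX_THREADS) return NULL;
    if (id < 0 || id >= toy_next_id) return NULL;
    assert(toy_resource_size[id] > 0);   /* guaranteed by toy_allocate_id */
    if (toy_pool[thread][id] == NULL)
        toy_pool[thread][id] = calloc(1, toy_resource_size[id]);
    return toy_pool[thread][id];
}
```

With this model, two "threads" that ask for the same ID receive different memory blocks, so a write through one pointer is not visible through the other.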
Data structure
There are two important data structures in TSRM: tsrm_tls_entry and tsrm_resource_type. Let’s look at tsrm_tls_entry first. tsrm_tls_entry is defined in TSRM/TSRM.c:
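typedef struct _tsrm_tls_entry tsrm_tls_entry;
struct _tsrm_tls_entry {
    void **storage;        /* array of pointers to this thread's global variables */
    int count;             /* number of global variables */
    THREAD_T thread_id;    /* ID of the thread this node belongs to */
    tsrm_tls_entry *next;  /* next node */
};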
Each tsrm_tls_entry structure represents all the global variable resources of one thread: thread_id stores the thread ID, count records the number of global variables, and next points to the next node. storage can be viewed as an array of pointers in which each element points to one global variable of the thread that the node represents. The tsrm_tls_entry nodes of the threads are chained together through next, and the heads of these chains are held in the static global variable tsrm_tls_table (the tsrm_startup excerpt later in this article shows it being allocated as an array of tsrm_tls_entry pointers). Note that because tsrm_tls_table is a real global variable, all threads share it, which keeps memory management consistent across threads.
[Figure: schematic diagram of the tsrm_tls_entry and tsrm_tls_table structures]
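In code, finding a thread's node in such a structure can be sketched as follows. This is a minimal sketch, not the TSRM source: toy_tls_entry mirrors the fields of tsrm_tls_entry, while toy_thread_t, toy_table, toy_find_or_create and the choice of bucket by thread ID modulo the table size are assumptions made for illustration. Locking is again omitted.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long toy_thread_t;   /* stand-in for THREAD_T (assumption) */

typedef struct toy_tls_entry {
    void **storage;                /* storage[i] would point to this thread's variable i */
    int count;                     /* number of slots in storage */
    toy_thread_t thread_id;        /* owner of this node */
    struct toy_tls_entry *next;    /* next node in the same chain */
} toy_tls_entry;

#define TOY_TABLE_SIZE 2           /* deliberately tiny so that chains form */
static toy_tls_entry *toy_table[TOY_TABLE_SIZE];   /* shared by all threads */

/* Find the node for thread_id; create and link a new one if none exists.
 * Returns NULL only if memory allocation fails. */
toy_tls_entry *toy_find_or_create(toy_thread_t thread_id)
{
    toy_tls_entry **slot = &toy_table[thread_id % TOY_TABLE_SIZE];

    while (*slot != NULL) {
        if ((*slot)->thread_id == thread_id) return *slot;
        slot = &(*slot)->next;     /* walk along the chain */
    }
    assert(*slot == NULL);         /* we are at the end of the chain */

    toy_tls_entry *entry = calloc(1, sizeof(*entry));
    if (entry == NULL) return NULL;
    entry->thread_id = thread_id;
    *slot = entry;                 /* append the new node */
    return entry;
}
```

Because the table itself is shared, every thread walks the same chains; only the node it ends up with, and the storage array behind it, is private to that thread.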
The internal structure of tsrm_resource_type is relatively simple:
typedef struct {
    size_t size;            /* size of the resource */
    ts_allocate_ctor ctor;  /* constructor */
    ts_allocate_dtor dtor;  /* destructor */
    int done;
} tsrm_resource_type;
As mentioned above, tsrm_tls_entry is organized per thread (one node per thread), whereas tsrm_resource_type is organized per resource (that is, per global variable). Every time a new resource is allocated, a tsrm_resource_type is created. All tsrm_resource_type entries are stored in an array (a linear table), tsrm_resource_table, and an entry's subscript is the ID of the resource (in the tsrm_startup excerpt below, this array appears under the name resource_types_table). Each tsrm_resource_type stores the resource's size and pointers to its constructor and destructor. To some extent, tsrm_resource_table can be regarded as a hash table whose key is the resource ID and whose value is the tsrm_resource_type structure.
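To make the role of size, ctor and dtor more concrete, here is a small sketch of a per-resource table. It assumes simplified one-argument constructor and destructor signatures and treats done as a "type has been retired" flag; those choices, and all toy_ names, are hypothetical and not taken from the TSRM source.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_ctor)(void *resource);   /* simplified signatures (assumption) */
typedef void (*toy_dtor)(void *resource);

typedef struct {
    size_t size;     /* bytes needed for one copy of the resource */
    toy_ctor ctor;   /* called on each new copy, may be NULL */
    toy_dtor dtor;   /* called before a copy is freed, may be NULL */
    int done;        /* nonzero once the type has been retired (assumption) */
} toy_resource_type;

#define TOY_MAX_TYPES 4
static toy_resource_type toy_types[TOY_MAX_TYPES];
static int toy_type_count = 0;

/* Record a new resource type; its array index doubles as its ID. */
int toy_register_type(size_t size, toy_ctor ctor, toy_dtor dtor)
{
    if (size == 0 || toy_type_count >= TOY_MAX_TYPES) return -1;
    toy_types[toy_type_count].size = size;
    toy_types[toy_type_count].ctor = ctor;
    toy_types[toy_type_count].dtor = dtor;
    toy_types[toy_type_count].done = 0;
    return toy_type_count++;
}

/* Build one zero-filled copy of resource `id` and run its constructor. */
void *toy_instantiate(int id)
{
    if (id < 0 || id >= toy_type_count || toy_types[id].done) return NULL;
    void *copy = calloc(1, toy_types[id].size);
    if (copy != NULL && toy_types[id].ctor != NULL) toy_types[id].ctor(copy);
    return copy;
}

/* Run the destructor of resource `id` on `copy`, then free it. */
void toy_release(int id, void *copy)
{
    assert(id >= 0 && id < toy_type_count);
    if (copy == NULL) return;
    if (toy_types[id].dtor != NULL) toy_types[id].dtor(copy);
    free(copy);
}

/* Example resource: a tiny block of "module globals" with a constructor. */
typedef struct { int request_count; } toy_globals;
void toy_globals_ctor(void *resource)
{
    ((toy_globals *)resource)->request_count = 1;
}
```

Each thread would call something like toy_instantiate once per resource ID and keep the result in its own storage array, so the table describes how to build a copy while the per-thread node holds the copies themselves.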
Implementation details
This section analyzes the implementation details of some of TSRM's algorithms. Because TSRM as a whole involves a lot of code, two representative functions are selected for analysis. The first one worth noting is tsrm_startup, which the SAPI calls at the beginning of the process to initialize the TSRM environment. Since tsrm_startup is fairly long, only the parts I consider noteworthy are excerpted here:
/* Startup TSRM (call once for the entire process) */
TSRM_API int tsrm_startup(int expected_threads, int expected_resources, int debug_level, char *debug_filename)
{
    /* code... */
    tsrm_tls_table_size = expected_threads;
    tsrm_tls_table = (tsrm_tls_entry **) calloc(tsrm_tls_table_size, sizeof(tsrm_tls_entry *));
    if (!tsrm_tls_table) {
        TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate TLS table"));
        return 0;
    }
    id_count=0;
    resource_types_table_size = expected_resources;
    resource_types_table = (tsrm_resource_type *) calloc(resource_types_table_size, sizeof(tsrm_resource_type));
    if (!resource_types_table) {
        TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate resource types table"));
        free(tsrm_tls_table);
        tsrm_tls_table = NULL;
        return 0;
    }
    /* code... */
    return 1;
}
In fact, the main task of tsrm_startup is to initialize the two data structures described above. The first interesting point is its first two parameters, expected_threads and expected_resources. Both are passed in by the SAPI and indicate the expected number of threads and resources. As the excerpt shows, tsrm_startup pre-allocates space (through calloc) according to these two parameters, so TSRM starts out with room for expected_threads threads and expected_resources resources. To see what each SAPI passes in by default, look at the source code of the individual SAPIs (in the sapi directory). I took a brief look at them.
The more commonly used SAPIs, such as mod_php5, php-fpm and cgi, all pre-allocate one thread and one resource, because they do not want to waste memory and in most cases PHP still runs in a single-threaded environment. The excerpt also contains an id_count variable. It is a static global variable whose job is to generate resource IDs by auto-increment, and it is initialized to 0 here. So the way TSRM generates resource IDs is very simple: it increments an integer variable. The second function that needs careful analysis is ts_allocate_id. Anyone who has written a PHP extension will be familiar with this function. This function...
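As a minimal sketch of the two points above (pre-allocating from an expected count, and generating IDs by incrementing an integer), consider the following. It is a single-threaded toy, not the TSRM implementation: the toy_ names are hypothetical, and doubling the table when it fills up is simply one growth strategy chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

static size_t *toy_sizes = NULL;   /* size recorded for each resource ID */
static int toy_capacity = 0;       /* number of entries currently allocated */
static int toy_id_count = 0;       /* next ID to hand out, starts at 0 */

/* Pre-allocate room for `expected_resources` entries.
 * Returns 1 on success and 0 on failure. */
int toy_startup(int expected_resources)
{
    if (expected_resources < 1) return 0;
    toy_sizes = calloc((size_t)expected_resources, sizeof(*toy_sizes));
    if (toy_sizes == NULL) return 0;
    toy_capacity = expected_resources;
    toy_id_count = 0;
    return 1;
}

/* Hand out the next ID, growing the table when the estimate was too small.
 * Returns the new ID, or -1 if memory could not be allocated. */
int toy_next_id(size_t size)
{
    assert(toy_sizes != NULL);     /* toy_startup must have succeeded first */
    if (toy_id_count == toy_capacity) {
        int new_capacity = toy_capacity * 2;   /* growth policy is an assumption */
        size_t *grown = realloc(toy_sizes, (size_t)new_capacity * sizeof(*grown));
        if (grown == NULL) return -1;
        toy_sizes = grown;
        toy_capacity = new_capacity;
    }
    toy_sizes[toy_id_count] = size;
    return toy_id_count++;         /* IDs are simply 0, 1, 2, ... */
}

/* Release the table and reset all counters. */
void toy_shutdown(void)
{
    free(toy_sizes);
    toy_sizes = NULL;
    toy_capacity = 0;
    toy_id_count = 0;
}
```

Starting with a capacity of one, as the common SAPIs are described as doing, costs almost nothing in the single-threaded case, and the table only grows once more resources are actually registered.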