PHP is said to be simple, but it is not easy to master it. In addition to being able to use it, we also need to know its underlying working principle.
PHP is a dynamic language suitable for web development. To be more specific, it is a software framework that uses C language to implement a large number of components. Looking at it in a more narrow sense, it can be considered as a powerful UI framework.
What is the purpose of understanding the underlying implementation of PHP? To use dynamic language well, we must first understand it. Memory management and framework models are worthy of our reference. Through extended development, we can achieve more powerful functions and optimize the performance of our programs.
1. The design concept and characteristics of PHP
Multi-process model: Since PHP is a multi-process model, different requests do not interfere with each other, which ensures that the failure of one request will not affect the entire service. Of course, with the times, Development, PHP has already supported the multi-threading model.
Weakly typed language: Unlike C/C++, Java, C# and other languages, PHP is a weakly typed language. The type of a variable is not determined at the beginning. It is determined during operation and implicit or explicit type conversion may occur. The flexibility of this mechanism is very convenient and efficient in web development. The details will be discussed in PHP later. Variables are detailed in.
The engine (Zend) + component (ext) mode reduces internal coupling.
The middle layer (sapi) isolates the web server and PHP.
The syntax is simple and flexible, without too many specifications. Shortcomings lead to mixed styles, but no matter how bad a programmer is, he will not write a program that is too outrageous and endangers the overall situation.
2. PHP’s four-layer system
The core architecture of PHP is as shown below:
As can be seen from the picture, PHP is a 4-layer system from bottom to top:
Zend engine: Zend is used as a whole Pure C implementation is the core part of PHP. It translates PHP code (a series of compilation processes such as lexical and syntax analysis) into executable opcode processing and implements corresponding processing methods, and implements basic data structures (such as hashtable, oo ), memory allocation and management, and provides corresponding api methods for external calls. It is the core of everything. All peripheral functions are implemented around Zend.
Extensions: Around the Zend engine, extensions provide various basic services in a component-based manner. Our common various built-in functions (such as array series), standard libraries, etc. are all implemented through extensions, and users can also implement them as needed. Use your own extension to achieve function expansion, performance optimization and other purposes (for example, the PHP middle layer and rich text parsing currently used by Tieba are typical applications of extension).
Sapi: The full name of Sapi is Server Application Programming Interface, which is the server application programming interface. Sapi enables PHP to interact with peripheral data through a series of hook functions. This is a very elegant and successful design of PHP. It is successful through sapi By decoupling and isolating PHP itself from upper-layer applications, PHP can no longer consider how to be compatible with different applications, and the applications themselves can also implement different processing methods based on their own characteristics.
Upper-layer application: This is the PHP program we usually write. We can obtain various application modes through different sapi methods, such as implementing web applications through webserver, running them in script mode on the command line, etc.
If PHP is a car, then the frame of the car is PHP itself, Zend is the engine of the car, the various components under Ext are the wheels of the car, Sapi can be regarded as a road, and the car can run on different types of on the road, and the execution of a PHP program is the car running on the road. Therefore, we need: a high-performance engine + the right wheels + the right track.
3. Sapi
As mentioned before, Sapi allows external applications to exchange data with PHP through a series of interfaces and implement specific processing methods according to different application characteristics. Some of our common sapis are:
apache2handler : This is the processing method when using apache as the webserver and running in mod_PHP mode. It is also the most widely used one now.
cgi: This is another direct interaction method between webserver and PHP, which is the famous fastcgi protocol. In recent years, fastcgi+PHP has been used more and more, and it is also the only method supported by asynchronous webserver.
cli: Application mode for command line calls
4. PHP execution process & opcode
Let’s first take a look at the process of executing PHP code.
As you can see from the picture, PHP implements a typical dynamic language execution process: after getting a piece of code, after lexical analysis, syntax analysis and other stages, the source program will be translated into instructions one by one. (opcodes), and then the ZEND virtual machine executes these instructions in sequence to complete the operation. PHP itself is implemented in C, so the functions ultimately called are all C functions. In fact, we can regard PHP as a software developed in C.
The core of PHP execution is the translated instructions, which are opcodes.
Opcode is the most basic unit of PHP program execution. An opcode consists of two parameters (op1, op2), return value and processing function. The PHP program is ultimately translated into the sequential execution of a set of opcode processing functions.
Several common processing functions:
ZEND_ASSIGN_SPEC_CV_CV_HANDLER : 变量分配 ($a=$b) ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER:函数调用 ZEND_CONCAT_SPEC_CV_CV_HANDLER:字符串拼接 $a.$b ZEND_ADD_SPEC_CV_CONST_HANDLER: 加法运算 $a+2 ZEND_IS_EQUAL_SPEC_CV_CONST:判断相等 $a==1 ZEND_IS_IDENTICAL_SPEC_CV_CONST:判断相等 $a===1
5. HashTable - core data structure
HashTable is the core data structure of zend. It is used to implement almost all common functions in PHP. The PHP array we know is its typical application , In addition, within zend, such as function symbol table, global variables, etc. are also implemented based on hash table.
PHP’s hash table has the following features:
Supports typical key->value query
Can be used as an array
Add and delete nodes are O(1) complexity
key supports mixed types: there are associated numbers at the same time Combined index array
Value supports mixed types: array ("string", 2332)
supports linear traversal: such as foreach
Zend hash table implements a typical hash table hash structure, and at the same time provides a doubly linked list by appending it The function of traversing arrays in forward and reverse directions. Its structure is as shown below:
As you can see, there are both hash structures in the form of key->value and doubly linked list mode in the hash table, making it very convenient to support fast search and linear traversal.
Hash structure: Zend’s hash structure is a typical hash table model, which resolves conflicts through a linked list. It should be noted that zend's hash table is a self-growing data structure. When the hash table is full, it will dynamically expand by 2 times and reposition elements. The initial size is 8. In addition, when performing key->value fast search, zend itself has also made some optimizations to speed up the process by exchanging space for time. For example, a variable nKeyLength is used in each element to identify the length of the key for quick determination.
Double linked list: Zend hash table implements linear traversal of elements through a linked list structure. In theory, it is enough to use a one-way linked list for traversal. The main purpose of using a two-way linked list is to quickly delete and avoid traversal. Zend hash table is a composite structure. When used as an array, it supports common associative arrays and can also be used as sequential index numbers, and even allows a mixture of the two.
PHP associative array: Associative array is a typical hash_table application. A query process goes through the following steps (as can be seen from the code, this is a common hash query process and adds some quick judgments to speed up the search.):
getKeyHashValue h; index = n & nTableMask; Bucket *p = arBucket[index]; while (p) { if ((p->h == h) & (p->nKeyLength == nKeyLength)) { RETURN p->data; } p=p->next; }
PHP index array: The index array is our common array, through subscripts access. For example, $arr[0], Zend HashTable is internally normalized, and the hash value and nKeyLength (0) are also assigned to the index type key. The internal member variable nNextFreeElement is the currently assigned maximum id, which is automatically increased by one after each push. It is this normalization process that allows PHP to achieve a mixture of associative and non-associative data. Due to the particularity of the push operation, the order of the index keys in the PHP array is not determined by the size of the subscript, but by the order of the push. For example, $arr[1] = 2; $arr[2] = 3; for double type keys, Zend HashTable will treat them as index keys
6. PHP variables
PHP is a weakly typed language and does not Strictly distinguish the types of variables. PHP does not need to specify the type when declaring variables. PHP may perform implicit conversions of variable types during program execution. Like other strongly typed languages, explicit type conversion can also be performed in the program. PHP variables can be divided into simple types (int, string, bool), collection types (array resource object) and constants (const). All the above variables have the same structure zval under the hood.
Zval is another very important data structure in zend, used to identify and implement PHP variables. Its data structure is as follows:
Zval mainly consists of three parts:
type: specifies the variable described Type (integer, string, array, etc.)
refcount&is_ref: used to implement reference counting (detailed introduction later)
value: the core part, which stores the actual data of the variable
Zvalue is used to save the actual data of a variable . Because multiple types need to be stored, zvalue is a union, thus implementing weak typing.
The corresponding relationship between PHP variable types and their actual storage is as follows:
IS_LONG -> lvalue IS_DOUBLE -> dvalue IS_ARRAY -> ht IS_STRING -> str IS_RESOURCE -> lvalue
Reference counting is widely used in memory recycling, string operations, etc. Variables in PHP are a typical application of reference counting. Zval's reference counting is implemented through the member variables is_ref and ref_count. Through reference counting, multiple variables can share the same data. Avoid the huge consumption caused by frequent copying.
When performing an assignment operation, zend points the variable to the same zval and ref_count++, and during the unset operation, the corresponding ref_count-1. The destruction operation will only be performed when ref_count is reduced to 0. If it is a reference assignment, zend will modify is_ref to 1.
PHP变量通过引用计数实现变量共享数据,那如果改变其中一个变量值呢?当试图写入一个变量时,Zend若发现该变量指向的zval被多个变量共享,则为其复制一份ref_count为1的zval,并递减原zval的refcount,这个过程称为“zval分离”。可见,只有在有写操作发生时zend才进行拷贝操作,因此也叫copy-on-write(写时拷贝)
对于引用型变量,其要求和非引用型相反,引用赋值的变量间必须是捆绑的,修改一个变量就修改了所有捆绑变量。
整数、浮点数是PHP中的基础类型之一,也是一个简单型变量。对于整数和浮点数,在zvalue中直接存储对应的值。其类型分别是long和double。
从zvalue结构中可以看出,对于整数类型,和c等强类型语言不同,PHP是不区分int、unsigned int、long、long long等类型的,对它来说,整数只有一种类型也就是long。由此,可以看出,在PHP里面,整数的取值范围是由编译器位数来决定而不是固定不变的。
对于浮点数,类似整数,它也不区分float和double而是统一只有double一种类型。
在PHP中,如果整数范围越界了怎么办?这种情况下会自动转换为double类型,这个一定要小心,很多trick都是由此产生。
和整数一样,字符变量也是PHP中的基础类型和简单型变量。通过zvalue结构可以看出,在PHP中,字符串是由由指向实际数据的指针和长度结构体组成,这点和c++中的string比较类似。由于通过一个实际变量表示长度,和c不同,它的字符串可以是2进制数据(包含),同时在PHP中,求字符串长度strlen是O(1)操作。
在新增、修改、追加字符串操作时,PHP都会重新分配内存生成新的字符串。最后,出于安全考虑,PHP在生成一个字符串时末尾仍然会添加
常见的字符串拼接方式及速度比较:
假设有如下4个变量:$strA=‘123’; $strB = ‘456’; $intA=123; intB=456;
现在对如下的几种字符串拼接方式做一个比较和说明:
$res = $strA.$strB和$res = “$strA$strB” 这种情况下,zend会重新malloc一块内存并进行相应处理,其速度一般 $strA = $strA.$strB 这种是速度最快的,zend会在当前strA基础上直接relloc,避免重复拷贝 $res = $intA.$intB 这种速度较慢,因为需要做隐式的格式转换,实际编写程序中也应该注意尽量避免 $strA = sprintf (“%s%s”,$strA.$strB);
这会是最慢的一种方式,因为sprintf在PHP中并不是一个语言结构,本身对于格式识别和处理就需要耗费比较多时间,另外本身机制也是malloc。不过sprintf的方式最具可读性,实际中可以根据具体情况灵活选择。
PHP的数组通过Zend HashTable来天然实现。
foreach操作如何实现?对一个数组的foreach就是通过遍历hashtable中的双向链表完成。对于索引数组,通过foreach遍历效率比for高很多,省去了key->value的查找。count操作直接调用HashTable->NumOfElements,O(1)操作。对于’123’这样的字符串,zend会转换为其整数形式。$arr[‘123’]和$arr[123]是等价的
资源类型变量是PHP中最复杂的一种变量,也是一种复合型结构。
PHP的zval可以表示广泛的数据类型,但是对于自定义的数据类型却很难充分描述。由于没有有效的方式描绘这些复合结构,因此也没有办法对它们使用传统的操作符。要解决这个问题,只需要通过一个本质上任意的标识符(label)引用指针,这种方式被称为资源。
在zval中,对于resource,lval作为指针来使用,直接指向资源所在的地址。Resource可以是任意的复合结构,我们熟悉的mysqli、fsock、memcached等都是资源。
如何使用资源:
注册:对于一个自定义的数据类型,要想将它作为资源。首先需要进行注册,zend会为它分配全局唯一标示。
获取一个资源变量:对于资源,zend维护了一个id->实际数据的hash_tale。对于一个resource,在zval中只记录了它的id。fetch的时候通过id在hash_table中找到具体的值返回。
资源销毁:资源的数据类型是多种多样的。Zend本身没有办法销毁它。因此需要用户在注册资源的时候提供销毁函数。当unset资源时,zend调用相应的函数完成析构。同时从全局资源表中删除它。
资源可以长期驻留,不只是在所有引用它的变量超出作用域之后,甚至是在一个请求结束了并且新的请求产生之后。这些资源称为持久资源,因为它们贯通SAPI的整个生命周期持续存在,除非特意销毁。很多情况下,持久化资源可以在一定程度上提高性能。比如我们常见的mysql_pconnect ,持久化资源通过pemalloc分配内存,这样在请求结束的时候不会释放。
对zend来说,对两者本身并不区分。
How are local variables and global variables implemented in PHP? For a request, PHP can see two symbol tables (symbol_table and active_symbol_table) at any time, with the former used to maintain global variables. The latter is a pointer pointing to the currently active variable symbol table. When the program enters a function, zend will allocate a symbol table x to it and point active_symbol_table to a. In this way, the distinction between global and local variables is achieved.
Get variable values: PHP's symbol table is implemented through hash_table. Each variable is assigned a unique identifier. When obtaining, the corresponding zval is found from the table and returned according to the identifier.
Use global variables in functions: In functions, we can use global variables by explicitly declaring global. Create a reference to the variable with the same name in symbol_table in active_symbol_table. If there is no variable with the same name in symbol_table, it will be created first.