作者:nowamagic
链接:http://www.nowamagic.net/librarys/veda/detail/102
PHP is said to be simple, but it is not easy to master it. In addition to being able to use it, we also need to know its underlying working principle.
PHP is a dynamic language suitable for web development. To be more specific, it is a software framework that uses C language to implement a large number of components. Looking at it in a more narrow sense, it can be considered as a powerful UI framework.
What is the purpose of understanding the underlying implementation of PHP? To use dynamic language well, we must first understand it. Memory management and framework models are worthy of our reference. Through extended development, we can achieve more powerful functions and optimize the performance of our programs.
1. PHP design concept and features
Multi-process model: Since PHP is a multi-process model, different requests do not interfere with each other. This ensures that the failure of one request will not affect the entire service. Of course, with the development of the times, PHP has already supported the multi-thread model. .
Weakly typed language: Unlike C/C, Java, C# and other languages, PHP is a weakly typed language. The type of a variable is not determined at the beginning. It is determined during operation and implicit or explicit type conversion may occur. The flexibility of this mechanism is very convenient and efficient in web development. The details will be discussed in PHP later. Variables are detailed in.
The engine (Zend) component (ext) mode reduces internal coupling.
The middle layer (sapi) isolates the web server and PHP.
The syntax is simple and flexible, without too many specifications. Shortcomings lead to mixed styles, but no matter how bad a programmer is, he will not write a program that is too outrageous and endangers the overall situation.
2. PHP’s four-layer system
The core architecture of PHP is as shown below:
As can be seen from the picture, PHP is a 4-layer system from bottom to top:
Zend engine: Zend is implemented entirely in pure C and is the core part of PHP. It translates PHP code (a series of compilation processes such as lexical and syntax analysis) into executable opcode processing and implements corresponding processing methods. Basic data structures (such as hashtable, oo), memory allocation and management, and providing corresponding api methods for external calls are the core of everything. All peripheral functions are implemented around Zend.
Extensions: Centering around the Zend engine, extensions provide various basic services in a component-based manner. Our common various built-in functions (such as array series), standard libraries, etc. are all implemented through extensions. Users can also use extensions as needed. Implement your own extension to achieve function expansion, performance optimization and other purposes (for example, the PHP middle layer and rich text parsing currently used by Tieba are typical applications of extension).
Sapi: The full name of Sapi is Server Application Programming Interface, which is the server application programming interface. Sapi uses a series of hook functions to enable PHP to interact with peripheral data. This is a very elegant and successful design of PHP. It is successful through sapi By decoupling and isolating PHP itself from upper-layer applications, PHP can no longer consider how to be compatible with different applications, and the applications themselves can also implement different processing methods based on their own characteristics.
Upper-layer application: This is the PHP program we usually write. We can obtain various application modes through different sapi methods, such as implementing web applications through webserver, running them in script mode on the command line, etc.
If PHP is a car, then the frame of the car is PHP itself, Zend is the engine of the car, and the various components under Ext are the wheels of the car. Sapi can be regarded as a road, and the car can run on different roads. type of highway, and the execution of a PHP program means the car is running on the highway. So we need: a good engine, the right wheels, the right track.
3. Sapi
As mentioned before, Sapi allows external applications to exchange data with PHP through a series of interfaces and implement specific processing methods according to different application characteristics. Some of our common sapis are:
apache2handler: This is the processing method when using apache as the webserver and running in mod_PHP mode. It is also the most widely used one now.
cgi: This is another direct interaction method between webserver and PHP, which is the famous fastcgi protocol. In recent years, fastcgi PHP has been used more and more, and it is also the only method supported by asynchronous webserver.
cli: application mode called by command line
4. PHP execution process&opcode
Let’s first take a look at the process of executing PHP code.
从图上可以看到,PHP实现了一个典型的动态语言执行过程:拿到一段代码后,经过词法解析、语法解析等阶段后,源程序会被翻译成一个个指令(opcodes),然后ZEND虚拟机顺次执行这些指令完成操作。PHP本身是用C实现的,因此最终调用的也都是C的函数,实际上,我们可以把PHP看做是一个C开发的软件。
PHP的执行的核心是翻译出来的一条一条指令,也即opcode。
Opcode是PHP程序执行的最基本单位。一个opcode由两个参数(op1,op2)、返回值和处理函数组成。PHP程序最终被翻译为一组opcode处理函数的顺序执行。
常见的几个处理函数:
<p>ZEND_ASSIGN_SPEC_CV_CV_HANDLER : 变量分配 ($a=$b)</p> <p>ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER:函数调用</p> <p>ZEND_CONCAT_SPEC_CV_CV_HANDLER:字符串拼接 $a.$b</p> <p>ZEND_ADD_SPEC_CV_CONST_HANDLER: 加法运算 $a+2</p> <p>ZEND_IS_EQUAL_SPEC_CV_CONST:判断相等 $a==1</p> <p>ZEND_IS_IDENTICAL_SPEC_CV_CONST:判断相等 $a===1</p>
5. HashTable — 核心数据结构
HashTable是zend的核心数据结构,在PHP里面几乎并用来实现所有常见功能,我们知道的PHP数组即是其典型应用,此外,在zend内部,如函数符号表、全局变量等也都是基于hash table来实现。
PHP的hash table具有如下特点:
支持典型的key->value查询
可以当做数组使用
添加、删除节点是O(1)复杂度
key支持混合类型:同时存在关联数组合索引数组
Value支持混合类型:array (“string”,2332)
支持线性遍历:如foreach
Zend hash table实现了典型的hash表散列结构,同时通过附加一个双向链表,提供了正向、反向遍历数组的功能。其结构如下图:
可以看到,在hash table中既有key->value形式的散列结构,也有双向链表模式,使得它能够非常方便的支持快速查找和线性遍历。
散列结构:Zend的散列结构是典型的hash表模型,通过链表的方式来解决冲突。需要注意的是zend的hash table是一个自增长的数据结构,当hash表数目满了之后,其本身会动态以2倍的方式扩容并重新元素位置。初始大小均为8。另外,在进行key->value快速查找时候,zend本身还做了一些优化,通过空间换时间的方式加快速度。比如在每个元素中都会用一个变量nKeyLength标识key的长度以作快速判定。
双向链表:Zend hash table通过一个链表结构,实现了元素的线性遍历。理论上,做遍历使用单向链表就够了,之所以使用双向链表,主要目的是为了快速删除,避免遍历。Zend hash table是一种复合型的结构,作为数组使用时,即支持常见的关联数组也能够作为顺序索引数字来使用,甚至允许2者的混合。
PHP关联数组:关联数组是典型的hash_table应用。一次查询过程经过如下几步(从代码可以看出,这是一个常见的hash查询过程并增加一些快速判定加速查找。):
<p>getKeyHashValue h;</p> <p>index = n & nTableMask;</p> <p>Bucket *p = arBucket[index];</p> <p>while (p) {</p> <p> if ((p->h == h) & (p->nKeyLength == nKeyLength)) {</p> <p> RETURN p->data; </p> <p> }</p> <p> p=p->next;</p> <p>}</p>
PHP索引数组:索引数组就是我们常见的数组,通过下标访问。例如 $arr[0],Zend HashTable内部进行了归一化处理,对于index类型key同样分配了hash值和nKeyLength(为0)。内部成员变量nNextFreeElement就是当前分配到的最大id,每次push后自动加一。正是这种归一化处理,PHP才能够实现关联和非关联的混合。由于push操作的特殊性,索引key在PHP数组中先后顺序并不是通过下标大小来决定,而是由push的先后决定。例如 $arr[1] = 2; $arr[2] = 3;对于double类型的key,Zend HashTable会将他当做索引key处理
6. PHP变量
PHP是一门弱类型语言,本身不严格区分变量的类型。PHP在变量申明的时候不需要指定类型。PHP在程序运行期间可能进行变量类型的隐示转换。和其他强类型语言一样,程序中也可以进行显示的类型转换。PHP变量可以分为简单类型(int、string、bool)、集合类型(array resource object)和常量(const)。以上所有的变量在底层都是同一种结构 zval。
Zval是zend中另一个非常重要的数据结构,用来标识并实现PHP变量,其数据结构如下:
Zval主要由三部分组成:
type:指定了变量所述的类型(整数、字符串、数组等)
refcount&is_ref:用来实现引用计数(后面具体介绍)
value:核心部分,存储了变量的实际数据
Zvalue是用来保存一个变量的实际数据。因为要存储多种类型,所以zvalue是一个union,也由此实现了弱类型。
PHP变量类型和其实际存储对应关系如下:
<p>IS_LONG -> lvalue</p> <p>IS_DOUBLE -> dvalue</p> <p>IS_ARRAY -> ht</p> <p>IS_STRING -> str</p> <p>IS_RESOURCE -> lvalue</p>
引用计数在内存回收、字符串操作等地方使用非常广泛。PHP中的变量就是引用计数的典型应用。Zval的引用计数通过成员变量is_ref和ref_count实现,通过引用计数,多个变量可以共享同一份数据。避免频繁拷贝带来的大量消耗。
在进行赋值操作时,zend将变量指向相同的zval同时ref_count++,在unset操作时,对应的ref_count-1。只有ref_count减为0时才会真正执行销毁操作。如果是引用赋值,则zend会修改is_ref为1。
PHP变量通过引用计数实现变量共享数据,那如果改变其中一个变量值呢?当试图写入一个变量时,Zend若发现该变量指向的zval被多个变量共享,则为其复制一份ref_count为1的zval,并递减原zval的refcount,这个过程称为“zval分离”。可见,只有在有写操作发生时zend才进行拷贝操作,因此也叫copy-on-write(写时拷贝)
对于引用型变量,其要求和非引用型相反,引用赋值的变量间必须是捆绑的,修改一个变量就修改了所有捆绑变量。
整数、浮点数是PHP中的基础类型之一,也是一个简单型变量。对于整数和浮点数,在zvalue中直接存储对应的值。其类型分别是long和double。
从zvalue结构中可以看出,对于整数类型,和c等强类型语言不同,PHP是不区分int、unsigned int、long、long long等类型的,对它来说,整数只有一种类型也就是long。由此,可以看出,在PHP里面,整数的取值范围是由编译器位数来决定而不是固定不变的。
对于浮点数,类似整数,它也不区分float和double而是统一只有double一种类型。
在PHP中,如果整数范围越界了怎么办?这种情况下会自动转换为double类型,这个一定要小心,很多trick都是由此产生。
和整数一样,字符变量也是PHP中的基础类型和简单型变量。通过zvalue结构可以看出,在PHP中,字符串是由由指向实际数据的指针和长度结构体组成,这点和c++中的string比较类似。由于通过一个实际变量表示长度,和c不同,它的字符串可以是2进制数据(包含),同时在PHP中,求字符串长度strlen是O(1)操作。
在新增、修改、追加字符串操作时,PHP都会重新分配内存生成新的字符串。最后,出于安全考虑,PHP在生成一个字符串时末尾仍然会添加
常见的字符串拼接方式及速度比较:
假设有如下4个变量:$strA=‘123’; $strB = ‘456’; $intA=123; intB=456;
现在对如下的几种字符串拼接方式做一个比较和说明:
<p>$res = $strA.$strB和$res = “$strA$strB”</p> <p>这种情况下,zend会重新malloc一块内存并进行相应处理,其速度一般</p> <p>$strA = $strA.$strB</p> <p>这种是速度最快的,zend会在当前strA基础上直接relloc,避免重复拷贝</p> <p>$res = $intA.$intB</p> <p>这种速度较慢,因为需要做隐式的格式转换,实际编写程序中也应该注意尽量避免</p> <p>$strA = sprintf (“%s%s”,$strA.$strB);</p>
这会是最慢的一种方式,因为sprintf在PHP中并不是一个语言结构,本身对于格式识别和处理就需要耗费比较多时间,另外本身机制也是malloc。不过sprintf的方式最具可读性,实际中可以根据具体情况灵活选择。
PHP的数组通过Zend HashTable来天然实现。
foreach操作如何实现?对一个数组的foreach就是通过遍历hashtable中的双向链表完成。对于索引数组,通过foreach遍历效率比for高很多,省去了key->value的查找。count操作直接调用HashTable->NumOfElements,O(1)操作。对于’123’这样的字符串,zend会转换为其整数形式。$arr[‘123’]和$arr[123]是等价的
资源类型变量是PHP中最复杂的一种变量,也是一种复合型结构。
PHP的zval可以表示广泛的数据类型,但是对于自定义的数据类型却很难充分描述。由于没有有效的方式描绘这些复合结构,因此也没有办法对它们使用传统的操作符。要解决这个问题,只需要通过一个本质上任意的标识符(label)引用指针,这种方式被称为资源。
在zval中,对于resource,lval作为指针来使用,直接指向资源所在的地址。Resource可以是任意的复合结构,我们熟悉的mysqli、fsock、memcached等都是资源。
如何使用资源:
注册:对于一个自定义的数据类型,要想将它作为资源。首先需要进行注册,zend会为它分配全局唯一标示。
获取一个资源变量:对于资源,zend维护了一个id->实际数据的hash_tale。对于一个resource,在zval中只记录了它的id。fetch的时候通过id在hash_table中找到具体的值返回。
Resource destruction: The data types of resources are diverse. Zend itself has no way to destroy it. Therefore, users need to provide a destruction function when registering resources. When unset resources, zend calls the corresponding function to complete the destruction. Also delete it from the global resource table.
Resources can persist long-term, not just after all variables referencing it go out of scope, but even after a request ends and a new request is made. These resources are called persistent resources because they persist throughout the life cycle of the SAPI unless specifically destroyed. In many cases, persistent resources can improve performance to a certain extent. For example, in our common mysql_pconnect, persistent resources allocate memory through pemalloc so that they will not be released when the request ends.
For zend, there is no distinction between the two.
How are local variables and global variables implemented in PHP? For a request, PHP can see two symbol tables (symbol_table and active_symbol_table) at any time, with the former used to maintain global variables. The latter is a pointer pointing to the currently active variable symbol table. When the program enters a function, zend will allocate a symbol table x to it and point active_symbol_table to a. In this way, the distinction between global and local variables is achieved.
Get variable values: PHP's symbol table is implemented through hash_table. Each variable is assigned a unique identifier. When obtaining, the corresponding zval is found from the table according to the identifier and returned.
Using global variables in functions: In functions, we can use global variables by explicitly declaring global. Create a reference to the variable with the same name in symbol_table in active_symbol_table. If there is no variable with the same name in symbol_table, it will be created first.