Detailed graphic explanation of the PHP kernel's variable storage mechanism (separation/change)

黄舟
Release: 2023-03-06 13:06:02

Foreword:

Most programmers who read blogs may not like articles with a lot of text, and this one is mostly prose. If you read it patiently, though, most readers will find it rewarding!

Maybe you know this already, maybe you don't: PHP is a weakly typed, dynamic scripting language. Weak typing means that PHP does not strictly check variable types (strictly speaking, PHP sits somewhere between weak and strong typing), so when you declare a variable you do not need to state the type of data it holds. For example, $a = 1; holds an integer, while $a = "1"; holds a string.

I have always used PHP, but what exactly is it, and how is it implemented under the hood to make it such a convenient and fast weakly typed language?

I have recently read a number of books and blog posts and learned a lot about how the PHP kernel works.
Put simply, PHP can be understood as a library written in C. If you download its source code from php.net, you will find that the core of PHP is the Zend engine, a C library that handles low-level function management, memory management, class management and variable management. On top of this kernel, many extensions have been written, most of them independent of one another. To use an operating-system metaphor, the Zend engine is the operating system and officially ships many "applications"; these "applications" are not media players but extensions such as mysql, libxml and dom. You can, of course, also develop your own extensions against the Zend engine's API.


Let's start with how PHP variables are stored in the kernel.

PHP is a weakly typed language, which means that a PHP variable can hold data of any type. But PHP is written in C, and C is a statically typed language: every variable has a fixed type (you can convert between types with casts, but that can cause problems). So how does the Zend engine manage to store data of any type in a single variable? Let's look at its storage structure.

Open the Zend/zend.h header file in the PHP 5 source and you will find the following zval structure:


## 1. The zval structure

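    typedef struct _zval_struct zval;

    struct _zval_struct {
        /* Variable information */
        zvalue_value value;      /* value */
        zend_uint refcount__gc;
        zend_uchar type;         /* active type */
        zend_uchar is_ref__gc;
    };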

    typedef union _zvalue_value {
        long lval;                /* long value */
        double dval;              /* double value */
        struct {
            char *val;            /* pointer: 4 bytes on a 32-bit build */
            int len;              /* 4 bytes */
        } str;
        HashTable *ht;            /* hash table value */
        zend_object_value obj;
    } zvalue_value;


## 2. zend_uchar type

Variables in PHP come in four scalar types (bool, int, float, string), two compound types (array, object) and two special types (resource and NULL). Inside Zend, these types correspond to the macros IS_BOOL, IS_LONG, IS_DOUBLE, IS_STRING, IS_ARRAY, IS_OBJECT, IS_RESOURCE and IS_NULL (defined in phpsrc/Zend/zend.h), and Zend decides which member of value to read based on the type field.
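To make the tag-plus-union idea concrete, here is a minimal C sketch. It is not Zend's code. The names mini_zval, mz_value, mz_type, mz_long, mz_string, mz_describe and the MZ_* tags are hypothetical stand-ins, loosely modeled on zval, zvalue_value and the IS_* macros, and the sketch covers only a few types.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type tags, modeled on (not copied from) Zend's IS_* macros. */
typedef enum { MZ_NULL, MZ_BOOL, MZ_LONG, MZ_DOUBLE, MZ_STRING } mz_type;

/* Only one member is meaningful at a time; the tag says which. */
typedef union {
    long lval;                                 /* MZ_LONG and MZ_BOOL */
    double dval;                               /* MZ_DOUBLE */
    struct { const char *val; int len; } str;  /* MZ_STRING */
} mz_value;

typedef struct {
    mz_value value;
    mz_type type;  /* decides which member of value is read */
} mini_zval;

static mini_zval mz_long(long l) {
    mini_zval z;
    z.type = MZ_LONG;
    z.value.lval = l;
    return z;
}

static mini_zval mz_string(const char *s) {
    assert(s != NULL);
    mini_zval z;
    z.type = MZ_STRING;
    z.value.str.val = s;
    z.value.str.len = (int)strlen(s);
    return z;
}

/* Formats the value by switching on the tag first, then reading the
   matching union member. */
static const char *mz_describe(const mini_zval *z, char *buf, size_t n) {
    switch (z->type) {
    case MZ_NULL:   snprintf(buf, n, "NULL"); break;
    case MZ_BOOL:   snprintf(buf, n, "bool(%s)", z->value.lval ? "true" : "false"); break;
    case MZ_LONG:   snprintf(buf, n, "int(%ld)", z->value.lval); break;
    case MZ_DOUBLE: snprintf(buf, n, "float(%g)", z->value.dval); break;
    case MZ_STRING: snprintf(buf, n, "string(%d) \"%s\"", z->value.str.len, z->value.str.val); break;
    }
    return buf;
}
```

In this model, running $a = 1; and then $a = "1"; overwrites both the tag and the union member. Calling mz_describe() on mz_long(1) produces int(1), and calling it on mz_string("1") produces string(1) "1". Because the tag is checked before any member is read, one fixed-size struct can hold values of different types.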


## 3. zend_uint refcount__gc

This field is a counter that records how many variables (or symbols; all symbols live in a symbol table, and different scopes use different symbol tables, which we discuss later) point to the zval. When a variable is created, its refcount is 1. A typical assignment such as $a = $b increases the zval's refcount by 1, and unset decreases it by 1. Before PHP 5.3, garbage collection was implemented purely by reference counting: when a zval's refcount dropped to 0, the Zend engine concluded that no variable pointed to it any more and freed the memory it occupied. Things are not always that simple, though. As we will see later, plain reference counting cannot collect zvals that are part of a reference cycle (see Example 3 below for details), even after every variable pointing to them has been unset, and the result is a memory leak.
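The counting rules above can be sketched in C as follows. This is a simplified, hypothetical model: rc_zval, rc_new, rc_share, rc_release and the rc_frees counter are illustrative names, not Zend functions. It tracks only the count and frees the value when the count reaches 0, which is how pre-5.3 PHP behaved.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted value, loosely modeled on refcount__gc. */
typedef struct {
    long lval;
    unsigned refcount;
} rc_zval;

static int rc_frees = 0;  /* how many values have been freed, for the demo */

static rc_zval *rc_new(long v) {
    rc_zval *z = malloc(sizeof *z);
    if (!z) abort();
    z->lval = v;
    z->refcount = 1;      /* a freshly created variable: refcount = 1 */
    return z;
}

/* "$b = $a": the new symbol shares the value, so just bump the count. */
static rc_zval *rc_share(rc_zval *z) {
    z->refcount++;
    return z;
}

/* "unset($x)": drop one reference; free the value when none is left. */
static void rc_release(rc_zval *z) {
    assert(z->refcount > 0);
    if (--z->refcount == 0) {
        free(z);
        rc_frees++;
    }
}
```

For $a = 1; $b = $a; unset($a); the sequence is rc_new(1), rc_share(), rc_release(). The count goes 1, 2, 1, and nothing is freed until $b is released too.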

## 4. zend_uchar is_ref__gc

This field marks whether the variable is a reference: it is 0 for ordinary variables and 1 for reference variables. It affects how zvals are shared and separated, which we discuss later.

As the names suggest, refcount__gc and is_ref__gc are two key fields used by PHP's garbage collector. Their values can be inspected with debugging tools such as Xdebug.

Now let's focus on the zval and look at how PHP variables are actually stored.

I covered installing Xdebug in an earlier post on debugging with PhpStorm and Xdebug, so I won't go into details here. See: phpstorm+Xdebug breakpoint debugging PHP

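Once Xdebug is installed, you can print a variable's zval information from your script with xdebug_debug_zval(), which takes the variable's name as a string. Usage:

    $var = 1;
    xdebug_debug_zval('var');
    $var_dup = $var;
    xdebug_debug_zval('var');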

Example 1:

    $a = 1;
    $b = $a;
    $c = $b;
    $d = &$c; // a reference assignment inserted into a chain of by-value assignments

[Figure: the zval changes throughout Example 1]

---------------------------------------------------------

Example 2:

    $a = 1;
    $b = &$a;
    $c = &$b;
    $d = $c; // a by-value assignment inserted into a chain of reference assignments

[Figure: the zval changes throughout Example 2]



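Examples 1 and 2 demonstrate PHP's copy-on-write (separation on write) mechanism and its change-on-write mechanism.

How it works:

Before modifying a variable, PHP first checks the refcount of its zval. If the refcount is greater than 1, PHP runs a separation routine.

For Example 1, when execution reaches the fourth line, PHP finds that the zval $c points to has a refcount greater than 1. It therefore copies a new zval, decrements the original zval's refcount by 1, and updates the symbol table so that $c is separated from $a and $b. This mechanism is called copy on write (also "separation on write"). The new zval, which $d now points to as well, gets is_ref = 1; this mechanism is called change on write.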
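The following C sketch replays Examples 1 and 2 under the rules just described. It is a hypothetical model, not Zend's implementation: zv, zv_new, assign_value and assign_ref are illustrative names. Each destination is assumed to be a fresh, unbound variable, and memory is never freed, to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mini zval: just a value plus the two GC fields. */
typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} zv;

static zv *zv_new(long v) {
    zv *z = malloc(sizeof *z);
    if (!z) abort();
    z->lval = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* $dst = $src;  (dst is assumed to be a fresh variable) */
static void assign_value(zv **dst, zv **src) {
    assert(*src != NULL);
    if ((*src)->is_ref) {
        /* a reference zval cannot also be shared by value: copy it now */
        *dst = zv_new((*src)->lval);
    } else {
        /* plain zval: share it and count one more owner */
        (*src)->refcount++;
        *dst = *src;
    }
}

/* $dst = &$src;  (dst is assumed to be a fresh variable) */
static void assign_ref(zv **dst, zv **src) {
    assert(*src != NULL);
    if (!(*src)->is_ref && (*src)->refcount > 1) {
        /* separation: $src leaves the shared zval and gets its own copy */
        (*src)->refcount--;
        *src = zv_new((*src)->lval);
    }
    /* change on write: the zval $src now uses is marked as a reference */
    (*src)->is_ref = 1;
    (*src)->refcount++;
    *dst = *src;
}
```

Replaying Example 1 (assign_value for $b and $c, then assign_ref for $d) leaves $a and $b sharing a zval with refcount=2, is_ref=0, while $c and $d share a new zval with refcount=2, is_ref=1. Replaying Example 2 leaves $a, $b and $c on one zval with refcount=3, is_ref=1, and gives $d its own copy with refcount=1, is_ref=0.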

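Conclusion:

Separation means giving two variables their own zvals so that they no longer point to the same memory. (How does PHP decide whether to separate, and on what basis? See below.)

Change means that on a reference (&) assignment, the is_ref of the newly allocated zval is set to 1.


The condition for separation: if is_ref == 1 or refcount == 1, no separation takes place.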

    if ((*val)->is_ref__gc || (*val)->refcount__gc < 2) {
        /* no separation: the zval can be modified in place */
        /* ... process ... */
    }
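Here is the same check as a small C sketch, applied to a write such as $b++. It is a hypothetical simplification: zv, zv_new, separate_if_needed and increment are illustrative names, and the real engine performs this check inside its own separation macros rather than through these functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mini zval with the two fields the condition looks at. */
typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} zv;

static zv *zv_new(long v) {
    zv *z = malloc(sizeof *z);
    if (!z) abort();
    z->lval = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* Same test as the text: a reference, or a zval with a single owner,
   is modified in place; a shared non-reference zval is copied first. */
static void separate_if_needed(zv **var) {
    assert((*var)->refcount > 0);
    if ((*var)->is_ref || (*var)->refcount < 2)
        return;                      /* no separation */
    (*var)->refcount--;              /* this variable stops sharing the zval */
    *var = zv_new((*var)->lval);     /* and gets a private copy */
}

/* Models "$var++" on an existing variable: separate, then write. */
static void increment(zv **var) {
    separate_if_needed(var);
    (*var)->lval++;
}
```

After $b = $a; $b++; the two variables end up on different zvals (values 1 and 2, each with refcount 1). Incrementing a variable whose zval has is_ref=1 instead changes the shared zval in place, so every reference sees the new value.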

---------------------------------------------------------------------------------------------------

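Example 3: How memory leaks

An array variable produces a zval that is very similar to that of an ordinary variable, but there are also important differences.

For example: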


    $a = array('one');
    $a[] = &$a;
    xdebug_debug_zval('a');


xdebug_debug_zval prints the following zval structure:

    a: (refcount=2, is_ref=1)=array (
        0 => (refcount=1, is_ref=0)='one',
        1 => (refcount=2, is_ref=1)=...
    )


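In the output above, the ... indicates that element 1 points back to the original array, so this is a circular reference.

[Figure: the array zval, whose element 1 points back to the array itself]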




Now we unset $a. This deletes the corresponding symbol from the symbol table and decrements the zval's refcount by 1 (it was 2), so the zval should now look like this:

    unset($a);

    (refcount=1, is_ref=1)=array (
        0 => (refcount=1, is_ref=0)='one',
        1 => (refcount=1, is_ref=1)=...
    )


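(The refcount should now be 1.)

(unset really just breaks the pointer mapping between $a in the symbol table and the zval.)

At this point, something unfortunate happens!

After unset, no variable points to the zval any more, yet the zval cannot be cleaned up by the GC (meaning the pure reference-counting GC used before PHP 5.3). $a itself is released, but the element $a[1] inside the array still points to the zval and is never released, so the refcounts of these zvals all stay above 0. These zvals therefore remain in memory until the request ends (see the SAPI lifecycle). Until then, the memory they occupy cannot be reused and is simply wasted. In other words, memory that cannot be freed is a memory leak.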
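A few lines of C reproduce the leak under plain reference counting. This is a hypothetical model: arr, arr_new, arr_release and the freed counter are illustrative names. An array zval here holds at most one element pointer, which is enough for the self-reference created by $a[] = &$a.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical array zval with room for a single element that may
   point to another array zval (or to itself). */
typedef struct arr {
    unsigned refcount;
    struct arr *elem;   /* the element added by $a[] = &$a, or NULL */
} arr;

static int freed = 0;   /* how many arrays have been freed, for the demo */

static arr *arr_new(void) {
    arr *a = malloc(sizeof *a);
    if (!a) abort();
    a->refcount = 1;
    a->elem = NULL;
    return a;
}

/* Pure reference counting: at zero, release what the array holds and
   free it. There is no cycle detection. */
static void arr_release(arr *a) {
    assert(a->refcount > 0);
    if (--a->refcount == 0) {
        if (a->elem) arr_release(a->elem);
        free(a);
        freed++;
    }
}
```

After the self-reference is created the count is 2. Releasing $a brings it to 1, and because the only remaining reference comes from inside the array itself, arr_release() never reaches 0 and the block is never freed. An array without the self-reference is freed on its first release.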

A leak like this that happens once or a few times is not serious, but thousands of them are a real problem, especially in long-running scripts such as daemons that run continuously in the background. Because the memory is never reclaimed, the process will eventually run out of available memory, so this pattern must be avoided.

Garbage collection mechanism:

1. PHP originally reclaimed memory through reference counting: several PHP variables may refer to the same memory, and unsetting one of them does not free that memory.
For example: $a = 1; $b = $a; unset($a); // the zval created for $a is not freed, because $b still refers to it
2. When execution leaves a variable's scope, the memory the variable occupies is cleaned up automatically (this does not apply to static variables, which are created when the script is loaded and released when the script ends).

For example, unsetting local variables inside a function or method does not reduce memory usage as measured outside the function, since those variables are freed when the function returns anyway.

3. Reference counting has a flaw: when a circular reference occurs, the counter can never drop to 0, so the memory stays occupied until the request (page access) ends.

To address this problem, PHP 5.3 added a garbage collection mechanism that can collect reference cycles. For details, see the documentation: //m.sbmmt.com/
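The core idea behind collecting cycles, trial deletion, can be sketched in C as below. First subtract the references that candidate nodes hold to each other. Whatever is left with no outside reference, and is not reachable from a node that has one, is garbage. This is a simplified illustration, not PHP's actual collector, which buffers possible roots and runs several marking passes. The names node, mark_live and collect_cycles are hypothetical, and the sketch only identifies garbage without freeing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical heap node: a refcount plus outgoing references. */
typedef struct node {
    unsigned refcount;                 /* all references, inside or outside */
    struct node *children[MAX_CHILDREN];
    size_t nchildren;
    int trial;                         /* scratch count used during collection */
    bool live;
} node;

static void mark_live(node *n) {
    if (n->live) return;               /* already visited: stops at cycles */
    n->live = true;
    for (size_t i = 0; i < n->nchildren; i++)
        mark_live(n->children[i]);
}

/* Trial deletion over a candidate set; assumes every child is also in
   the set. Returns how many nodes are garbage (live == false). */
static size_t collect_cycles(node **set, size_t n) {
    for (size_t i = 0; i < n; i++) {
        set[i]->trial = (int)set[i]->refcount;
        set[i]->live = false;
    }
    /* 1. subtract references that come from inside the set */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < set[i]->nchildren; j++)
            set[i]->children[j]->trial--;
    /* 2. anything still referenced from outside, plus what it reaches, is live */
    for (size_t i = 0; i < n; i++)
        if (set[i]->trial > 0)
            mark_live(set[i]);
    /* 3. the rest is unreachable, even though its refcount is above 0 */
    size_t garbage = 0;
    for (size_t i = 0; i < n; i++)
        if (!set[i]->live) garbage++;
    return garbage;
}
```

Consider Example 3 after unset($a), modeled as a single node with refcount=1 whose only reference is from itself. The sketch reports that node as garbage. The same node with refcount=2, still bound to $a, stays live.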

Garbage collection was first introduced in Lisp.


source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn