Several difficult points in memory management of PHP principles

PHP’s memory management is divided into two parts. The first part is PHP’s own memory management. The main content of this part is reference counting, copy-on-write, and other application-oriented management. The second part is what I want to introduce today. The memory management of PHP itself described in zend_alloc, including how it manages available memory, how to allocate memory, etc.

In addition, why should I write this, because it has not been done before There is no information to introduce the strategies, data structures, or algorithms used in PHP memory management. But when we usually develop extensions and fix PHP bugs, we need to have a good understanding of this part of the knowledge. Within the PHP development team Many of my friends are not very clear about this, so I think it is necessary to write about it.

I won’t go into details about some basic concepts, because it is easy to understand by looking at the code, so I will focus here Let me introduce a few points that are not easy to understand when looking at the code. Why do I say this? Haha, before writing the article, I searched for existing information to avoid duplication of work. I saw the description of this part of the TIPI project. I found a lot of errors in it. So, I think this part is not easy to understand even if you look at the code

Currently, the English version of the introduction is also being written: Zend MM

Zend Memory Manager, hereafter referred to as Zend MM, is the logic of memory management in PHP. There is a key data structure: zend_mm_heap:

Zend MM divides memory into two types: small memory and large memory, and treats them differently. For small memory, This part is the most commonly used, so high performance is pursued. For large blocks of memory, stability is pursued and memory waste is avoided as much as possible.

Therefore, for small blocks of memory, PHP also introduces a cache mechanism:

Zend MM Hope Try to do it through cache, and the allocation can be found in one location.

One point that is not easy to understand is the declaration of free_buckets:

Q: Why is the length of the free_buckets array ZEND_MM_NUMBER_BUCKET?

A: This is because PHP uses a trick here, using a fixed-length array to store ZEND_MM_NUMBER_BUCKET zend_mm_free_blocks, as shown in the red box in the picture above. For an element of free_buckets that is not used, The only useful data structures are next_free_block and prev_free_block. Therefore, in order to save memory, PHP does not allocate ZEND_MM_NUMBER_BUCKET * sizeof(zend_mm_free_block) size of memory, but only uses ZEND_MM_NUMBER_BUCKET * (sizeof(*next_free_block) + sizeof(*prev_free_block)) size. memory..

Let’s look at the definition of the ZEND_MM_SMALL_FREE_BUCKET macro:

<ol class="dp-xml">
<li class="alt"><span><span>#define ZEND_MM_SMALL_FREE_BUCKET(heap, index)   </span></span></li>
<li>
<span>    (zend_mm_free_block*) ((char*)&heap-</span><span class="tag">></span><span>free_buckets[index * 2] +   </span>
</li>
<li class="alt"><span>        sizeof(zend_mm_free_block*) * 2 -   </span></li>
<li><span>        sizeof(zend_mm_small_free_block)) </span></li>
</ol>

Copy after login

After that, Zend MM guarantees that it will only use the prev and next pointers, so it will not cause memory read errors. .

Then, the second point that is not easy to understand is PHP’s management of large_free_buckets. Let’s first introduce the allocation (the TIPI project team’s description of this part is somewhat vague):

<ol class="dp-xml"><li class="alt"><span><span>static zend_mm_free_block *zend_mm_search_large_block(zend_mm_heap *heap, size_t true_size) </span></span></li></ol>

Copy after login

large_free_buckets can be said to be a combination of tree building and two-way list:

large_free_buckets uses a macro to determine what index a memory of a certain size falls on:

<ol class="dp-c"><li class="alt"><span><span>#define ZEND_MM_LARGE_BUCKET_INDEX(S) zend_mm_high_bit(S) </span></span></li></ol>

Copy after login

zend_mm_high_bit gets the serial number of the highest 1 in true_size (zend_mm_high_bit ), the corresponding assembly instruction is bsr (here, the error description of the TIPI project is: "This hash function is used to calculate the number of digits in size, and the return value is the number of 1's in the binary code of size - 1").

In other words, each element in large_free_buckets maintains a pointer to a memory block with a size of 1 at the corresponding index. Oh, that’s a bit convoluted, for example:

For example, large_free_buckets[2] will only save memory with a size of 0b1000 to 0b1111. Another example: large_free_buckets[6] will save pointers of memory with a size of 0b10000000 to 0b11111111.

In this way, when re-allocating memory, Zend MM can quickly locate the most suitable area for search. Improve performance.

And, each element is also a two-way list, maintaining the same size memory block, and the left and right children (child[0] and child[1]) represent the key values 0 and 1 respectively. What does this key value refer to?

Let’s give an example. For example, I asked PHP applied for a memory with true_size of 0b11010. After some steps, no suitable memory was found. PHP entered the zend_mm_search_large_block logic to find suitable memory in large_free_buckets:

1. First, calculate the corresponding true_size index, the calculation method is as described before ZEND_MM_LARGE_BUCKET_INDEX

2. 然后在一个位图结构中, 判断是否存在一个大于true_size的可用内存已经存在于large_free_buckets, 如果不存在就返回:

<ol class="dp-c">
<li class="alt"><span><span>size_t bitmap = heap->large_free_bitmap >> index;  </span></span></li>
<li>
<span class="keyword">if</span><span> (bitmap == 0) {  </span>
</li>
<li class="alt">
<span>   </span><span class="keyword">return</span><span> NULL;  </span>
</li>
<li><span>} </span></li>
</ol>

Copy after login

3. 判断, free_buckets[index]是否存在可用的内存:

<ol class="dp-c"><li class="alt"><span><span class="keyword">if</span><span> (UNEXPECTED((bitmap & 1) != 0)) </span></span></li></ol>

Copy after login

4. 如果存在, 则从free_buckets[index]开始, 寻找最合适的内存, 步骤如下:

4.1. 从free_buckets[index]开始, 如果free_buckets[index]当前的内存大小和true_size相等, 则寻找结束, 成功返回.

4.2. 查看true_size对应index后(true_size << (ZEND_MM_NUM_BUCKETS - index))的当前最高位, 如果为1. 则在free_buckets[index]->child[1]下面继续寻找, 如果free_buckets[index]->child[1]不存在, 则跳出. 如果true_size的当前最高位为0, 则在free_buckets[index]->child[0]下面继续寻找, 如果free_buckets[index]->child[0]不存在, 则在free_buckets[index]->child[1]下面寻找最小内存(因为此时可以保证, 在free_buckets[index]->child[1]下面的内存都是大于true_size的)

4.3. 出发点变更为2中所述的child, 左移一位ture_size.

5. 如果上述逻辑并没有找到合适的内存, 则寻找最小的”大块内存”:

<ol class="dp-c">
<li class="alt"><span><span class="comment">/* Search for smallest "large" free block */</span><span> </span></span></li>
<li><span>  best_fit = p = heap->large_free_buckets[index + zend_mm_low_bit(bitmap)];  </span></li>
<li class="alt">
<span>  </span><span class="keyword">while</span><span> ((p = p->child[p->child[0] != NULL])) {  </span>
</li>
<li>
<span>      </span><span class="keyword">if</span><span> (ZEND_MM_FREE_BLOCK_SIZE(p) < ZEND_MM_FREE_BLOCK_SIZE(best_fit)) {  </span></li><li class="alt"><span>          best_fit = p;  </span></li><li><span>      }  </span></li><li class="alt"><span>  } </span></li></ol>

Copy after login

注意上面的逻辑, (p = p->child[p->child[0] != NULL]), PHP在尽量寻找最小的内存.

为什么说, large_free_buckets是个键树呢, 从上面的逻辑我们可以看出, PHP把一个size, 按照二进制的0,1做键, 把内存大小信息反应到了键树上, 方便了快速查找.

另外, 还有一个rest_buckets, 这个结构是个双向列表, 用来保存一些PHP分配后剩下的内存, 避免无意义的把剩余内存插入free_buckets带来的性能问题(此处, TIPI项目错误的描述为: “这是一个只有两个元素的数组。而我们常用的插入和查找操作是针对第一个元素，即heap->rest_buckets[0]“).

作者: Laruence( ) 原文地址:

Several difficult points in memory management of PHP principles_PHP tutorial