
About the garbage collection mechanism of PHP 5.3

WBOY
2016-07-25 09:10:16
    struct _zval_struct {
        /* Variable information */
        zvalue_value value;        /* value */
        zend_uint    refcount__gc;
        zend_uchar   type;         /* active type */
        zend_uchar   is_ref__gc;
    };

Compared with versions before PHP 5.3, the reference count field refcount and the reference flag is_ref have the suffix __gc appended, to support the new garbage collection mechanism. Heavy use of macros is a distinctive feature of the PHP source style. These macros act as an interface layer that hides the underlying implementation; ALLOC_ZVAL is one example. Before PHP 5.3, this macro called emalloc, the allocation function of PHP's memory manager, directly, and the allocated size was determined by the type and size of the variable. With the garbage collection mechanism in place, ALLOC_ZVAL allocates the new garbage collection unit structure instead. Every allocation has the same size, namely the size of the zval_gc_info structure, and once the memory is allocated the garbage collection information in that structure is initialized.

    /* The following macroses override macroses from zend_alloc.h */
    #undef  ALLOC_ZVAL
    #define ALLOC_ZVAL(z)                                   \
        do {                                                \
            (z) = (zval*)emalloc(sizeof(zval_gc_info));     \
            GC_ZVAL_INIT(z);                                \
        } while (0)

The zend_gc.h file is included at line 749 of zend.h (#include "zend_gc.h"). This replaces ALLOC_ZVAL and the other macros from zend_alloc.h, which is included earlier, at line 237. The key change in the new macros is the size and content of the allocation: garbage collection information is added to what used to be a plain memory allocation. All of it is contained in the zval_gc_info structure:

    typedef struct _zval_gc_info {
        zval z;
        union {
            gc_root_buffer       *buffered;
            struct _zval_gc_info *next;
        } u;
    } zval_gc_info;

For every variable stored in a zval container, a zval_gc_info structure is allocated. Its first member is the zval, so the zval starts at the beginning of the allocated memory, and a zval_gc_info pointer can be cast to a zval pointer and used as a zval. After the zval member comes a union, u, with two members: buffered, a pointer to a gc_root_buffer, and next, a pointer to a zval_gc_info. The first refers to the root node that the garbage collector has buffered for this zval; the second refers to the next node in a zval_gc_info list. Whether the zval is tracked as a buffered root or as a list node, that state is recorded here.

After allocating memory, ALLOC_ZVAL calls GC_ZVAL_INIT to initialize the zval_gc_info that takes the place of the zval. It sets the buffered member of u to NULL. This field only has a value while the zval is in the garbage collection buffer; otherwise it stays NULL. Since all variables in PHP exist as zvals, replacing zval with zval_gc_info integrates the garbage collection mechanism into the existing system.

Garbage collection is enabled by default in PHP 5.3, but it can be disabled through the configuration file. The corresponding directive is zend.enable_gc. The php.ini file does not contain this directive by default; to disable the feature, add zend.enable_gc=0 or zend.enable_gc=off to php.ini. Besides the php.ini setting, garbage collection can also be turned on and off at run time by calling gc_enable() and gc_disable(). Calling these functions has the same effect as changing the configuration directive. In addition to these two functions, PHP provides gc_collect_cycles(), which forces cycle collection even when the root buffer is not full.
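The layout trick described above, with the zval as the first member of the larger structure, can be tried in isolation. The following is a minimal sketch under stated assumptions: the demo_ types and functions are hypothetical stand-ins, not the real Zend definitions, and plain malloc is used in place of emalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for illustration only, not the real Zend types. */
typedef struct demo_zval {
    long          value;
    unsigned int  refcount__gc;
    unsigned char type;
    unsigned char is_ref__gc;
} demo_zval;

typedef struct demo_root_buffer demo_root_buffer;   /* opaque in this sketch */

typedef struct demo_zval_gc_info {
    demo_zval z;                          /* first member, so it sits at offset 0 */
    union {
        demo_root_buffer         *buffered;
        struct demo_zval_gc_info *next;
    } u;
} demo_zval_gc_info;

/* Allocate the larger structure but hand out a plain demo_zval pointer. */
static demo_zval *demo_alloc_zval(void)
{
    demo_zval_gc_info *info = malloc(sizeof(*info));
    if (info == NULL) {
        return NULL;
    }
    /* Field values below are arbitrary demo defaults. */
    info->z.value = 0;
    info->z.refcount__gc = 1;
    info->z.type = 0;
    info->z.is_ref__gc = 0;
    info->u.buffered = NULL;              /* not in the root buffer yet */
    return &info->z;
}

/* Recover the GC bookkeeping from a demo_zval pointer by casting back. */
static demo_root_buffer *demo_get_buffered(demo_zval *z)
{
    assert(z != NULL);
    return ((demo_zval_gc_info *)z)->u.buffered;
}
```

Because the zval member is at offset 0, code that only knows about the zval keeps working unchanged, while the collector can reach its extra field through a cast. The pointer returned by demo_alloc_zval can be passed to free directly for the same reason.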
The PHP source contains code and fields related to whether garbage collection is enabled. The zend.c file contains the following code:

    static ZEND_INI_MH(OnUpdateGCEnabled) /* {{{ */
    {
        OnUpdateBool(entry, new_value, new_value_length, mh_arg1, mh_arg2, mh_arg3, stage TSRMLS_CC);

        if (GC_G(gc_enabled)) {
            gc_init(TSRMLS_C);
        }

        return SUCCESS;
    }
    /* }}} */

    ZEND_INI_BEGIN()
        ZEND_INI_ENTRY("error_reporting", NULL, ZEND_INI_ALL, OnUpdateErrorReporting)
        STD_ZEND_INI_BOOLEAN("zend.enable_gc", "1", ZEND_INI_ALL, OnUpdateGCEnabled, gc_enabled, zend_gc_globals, gc_globals)
    #ifdef ZEND_MULTIBYTE
        STD_ZEND_INI_BOOLEAN("detect_unicode", "1", ZEND_INI_ALL, OnUpdateBool, detect_unicode, zend_compiler_globals, compiler_globals)
    #endif
    ZEND_INI_END()

The handler for zend.enable_gc is ZEND_INI_MH(OnUpdateGCEnabled). If garbage collection is enabled, that is, GC_G(gc_enabled) is true, gc_init is called to initialize the garbage collection mechanism. gc_init is defined at line 121 of Zend/zend_gc.c. It checks whether garbage collection is enabled and, if so, initializes the whole mechanism: it calls malloc directly to allocate 10,000 gc_root_buffer entries for the root buffer. The value 10000 is hard-coded as the macro GC_ROOT_BUFFER_MAX_ENTRIES; changing it requires modifying the source and recompiling PHP.

After preallocating the memory, gc_init calls gc_reset to reset the global variables used by the mechanism. For example, it sets the number of GC runs (gc_runs) and the number of collected garbage items (collected) to 0, and makes the prev and next pointers of the head node of the doubly linked list point to the head node itself. Besides these, the garbage collection mechanism uses other global variables, some of which are explained below:

    typedef struct _zend_gc_globals {
        zend_bool        gc_enabled;    /* whether garbage collection is enabled */
        zend_bool        gc_active;     /* whether a collection is in progress */
        gc_root_buffer  *buf;           /* preallocated array of buffers (10000 entries by default) */
        gc_root_buffer   roots;         /* list of possible roots of cycles */
        gc_root_buffer  *unused;        /* list of unused buffers */
        gc_root_buffer  *first_unused;  /* pointer to first unused buffer */
        gc_root_buffer  *last_unused;   /* pointer to last unused buffer */
        zval_gc_info    *zval_to_free;  /* temporary list of zvals to free */
        zval_gc_info    *free_list;     /* temporary variable: start of the list to be freed */
        zval_gc_info    *next_to_free;  /* temporary variable: next zval to be freed */
        zend_uint        gc_runs;       /* number of GC runs */
        zend_uint        collected;     /* number of garbage items collected */
        /* Omitted... */
    } zend_gc_globals;
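The initialization performed by gc_init and gc_reset, as described before the structure, can be approximated with a small stand-alone sketch. Everything prefixed with demo_ is a hypothetical simplification: the globals are passed explicitly instead of going through GC_G, the root node carries only its list pointers, and the return codes are an invention of this example.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_ENTRIES 10000   /* mirrors the value the text gives for GC_ROOT_BUFFER_MAX_ENTRIES */

typedef struct demo_root {
    struct demo_root *prev;
    struct demo_root *next;
} demo_root;

typedef struct demo_gc_globals {
    int        gc_enabled;
    demo_root *buf;            /* preallocated array */
    demo_root  roots;          /* head of the circular doubly linked list */
    demo_root *unused;         /* released slots */
    demo_root *first_unused;   /* next never-used slot */
    demo_root *last_unused;    /* one past the end of buf */
    unsigned   gc_runs;
    unsigned   collected;
} demo_gc_globals;

/* Reset counters and make the list head point to itself (empty list). */
static void demo_gc_reset(demo_gc_globals *g)
{
    g->gc_runs = 0;
    g->collected = 0;
    g->roots.next = &g->roots;
    g->roots.prev = &g->roots;
    g->unused = NULL;
    g->first_unused = g->buf;
}

/* Returns 1 if the buffer was allocated, 0 if nothing was done, -1 on failure. */
static int demo_gc_init(demo_gc_globals *g)
{
    assert(g != NULL);
    if (!g->gc_enabled || g->buf != NULL) {
        return 0;
    }
    g->buf = malloc(sizeof(demo_root) * DEMO_MAX_ENTRIES);
    if (g->buf == NULL) {
        return -1;
    }
    g->last_unused = g->buf + DEMO_MAX_ENTRIES;
    demo_gc_reset(g);
    return 1;
}
```

A head node whose prev and next point to itself is a common way to represent an empty circular list: insertion and removal then need no special case for the first or last element.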

When unset is used to release a variable (which may do no more than decrement its reference count by one), the entry for the variable name is deleted from the hash table of the current symbol table. After that, a destructor is called on the deleted entry: temporary variables use zval_dtor, and ordinary variables use zval_ptr_dtor. unset cannot be found among PHP's functions because it is a language construct. Its opcode is ZEND_UNSET, and the implementation can be found in Zend/zend_vm_execute.h. zval_ptr_dtor is not a function either, just a macro that looks like one. In Zend/zend_variables.h this macro points to the function _zval_ptr_dtor. Its code, at line 424 of Zend/zend_execute_API.c, is as follows:

    ZEND_API void _zval_ptr_dtor(zval **zval_ptr ZEND_FILE_LINE_DC) /* {{{ */
    {
    #if DEBUG_ZEND>=2
        printf("Reducing refcount for %x (%x): %d->%d\n", *zval_ptr, zval_ptr, Z_REFCOUNT_PP(zval_ptr), Z_REFCOUNT_PP(zval_ptr) - 1);
    #endif
        Z_DELREF_PP(zval_ptr);
        if (Z_REFCOUNT_PP(zval_ptr) == 0) {
            TSRMLS_FETCH();

            if (*zval_ptr != &EG(uninitialized_zval)) {
                GC_REMOVE_ZVAL_FROM_BUFFER(*zval_ptr);
                zval_dtor(*zval_ptr);
                efree_rel(*zval_ptr);
            }
        } else {
            TSRMLS_FETCH();

            if (Z_REFCOUNT_PP(zval_ptr) == 1) {
                Z_UNSET_ISREF_PP(zval_ptr);
            }

            GC_ZVAL_CHECK_POSSIBLE_ROOT(*zval_ptr);
        }
    }
    /* }}} */

The code shows the destruction process of a zval clearly. After the reference count is decremented, one of two things happens:

- If the reference count was 1, that is, it is 0 after the decrement, the variable is destroyed directly. If the variable is in the garbage collection buffer, it is removed from the buffer first.
- If the reference count was greater than 1, that is, it is still greater than 0 after the decrement, the variable is put into the garbage list as a possible root. If the count after the decrement is exactly 1, the variable's reference flag (is_ref) is cleared.

Putting a variable into the garbage list is done by GC_ZVAL_CHECK_POSSIBLE_ROOT. This is also a macro; it corresponds to the function gc_zval_check_possible_root, which only considers arrays and objects for garbage collection. For array and object variables, it calls the gc_zval_possible_root function.

    ZEND_API void gc_zval_possible_root(zval *zv TSRMLS_DC)
    {
        if (UNEXPECTED(GC_G(free_list) != NULL &&
                       GC_ZVAL_ADDRESS(zv) != NULL &&
                       GC_ZVAL_GET_COLOR(zv) == GC_BLACK) &&
                       (GC_ZVAL_ADDRESS(zv) < GC_G(buf) ||
                        GC_ZVAL_ADDRESS(zv) >= GC_G(last_unused))) {
            /* The given zval is a garbage that is going to be deleted by
             * currently running GC */
            return;
        }

        if (zv->type == IS_OBJECT) {
            GC_ZOBJ_CHECK_POSSIBLE_ROOT(zv);
            return;
        }

        GC_BENCH_INC(zval_possible_root);

        if (GC_ZVAL_GET_COLOR(zv) != GC_PURPLE) {
            GC_ZVAL_SET_PURPLE(zv);

            if (!GC_ZVAL_ADDRESS(zv)) {
                gc_root_buffer *newRoot = GC_G(unused);

                if (newRoot) {
                    /* reuse a previously released slot */
                    GC_G(unused) = newRoot->prev;
                } else if (GC_G(first_unused) != GC_G(last_unused)) {
                    /* take the next never-used slot of the preallocated array */
                    newRoot = GC_G(first_unused);
                    GC_G(first_unused)++;
                } else {
                    /* the buffer is full */
                    if (!GC_G(gc_enabled)) {
                        GC_ZVAL_SET_BLACK(zv);
                        return;
                    }
                    zv->refcount__gc++;
                    gc_collect_cycles(TSRMLS_C);
                    zv->refcount__gc--;
                    newRoot = GC_G(unused);
                    if (!newRoot) {
                        return;
                    }
                    GC_ZVAL_SET_PURPLE(zv);
                    GC_G(unused) = newRoot->prev;
                }

                /* link the new root in right after the list head */
                newRoot->next = GC_G(roots).next;
                newRoot->prev = &GC_G(roots);
                GC_G(roots).next->prev = newRoot;
                GC_G(roots).next = newRoot;

                GC_ZVAL_SET_ADDRESS(zv, newRoot);

                newRoot->handle = 0;
                newRoot->u.pz = zv;

                GC_BENCH_INC(zval_buffered);
                GC_BENCH_INC(root_buf_length);
                GC_BENCH_PEAK(root_buf_peak, root_buf_length);
            }
        }
    }
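The slot selection and list insertion in the function above can be isolated in a small sketch. It rests on assumptions: the demo_ names are hypothetical, the buffer has only four slots to keep it easy to follow, a full buffer simply yields NULL where the real function would run a collection, and demo_remove_root is an illustrative helper that chains released slots through prev, as the use of newRoot->prev in the listing suggests.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOTS 4

typedef struct demo_root {
    struct demo_root *prev;
    struct demo_root *next;
    void             *pz;              /* the tracked value */
} demo_root;

typedef struct demo_gc {
    demo_root  buf[DEMO_SLOTS];
    demo_root  roots;                  /* head of the circular list */
    demo_root *unused;                 /* released slots, chained through prev */
    demo_root *first_unused;           /* next never-used slot */
    demo_root *last_unused;            /* one past the end of buf */
} demo_gc;

static void demo_gc_init(demo_gc *g)
{
    g->roots.next = g->roots.prev = &g->roots;
    g->roots.pz = NULL;
    g->unused = NULL;
    g->first_unused = g->buf;
    g->last_unused = g->buf + DEMO_SLOTS;
}

/* Returns the slot used, or NULL when the buffer is full. */
static demo_root *demo_add_root(demo_gc *g, void *pz)
{
    demo_root *slot = g->unused;

    if (slot) {
        g->unused = slot->prev;                    /* reuse a released slot */
    } else if (g->first_unused != g->last_unused) {
        slot = g->first_unused++;                  /* take a fresh slot */
    } else {
        return NULL;                               /* full: collection would run here */
    }
    slot->next = g->roots.next;                    /* insert right after the head */
    slot->prev = &g->roots;
    g->roots.next->prev = slot;
    g->roots.next = slot;
    slot->pz = pz;
    return slot;
}

/* Unlink a slot and push it onto the list of released slots. */
static void demo_remove_root(demo_gc *g, demo_root *slot)
{
    assert(slot != &g->roots);
    slot->next->prev = slot->prev;
    slot->prev->next = slot->next;
    slot->prev = g->unused;
    g->unused = slot;
}
```

Released slots are preferred over fresh ones, so the never-used region of the array only shrinks when no released slot is available.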

As mentioned earlier, the gc_zval_check_possible_root function only considers arrays and objects for garbage collection. Inside gc_zval_possible_root, however, variables of object type are handed to the GC_ZOBJ_CHECK_POSSIBLE_ROOT macro. For the other variable types that take part in garbage collection, the process is as follows:

1. Check whether the zval's node information has already been put into the node buffer. If it has, return directly, which improves performance. Object nodes are then processed and the function returns directly without performing the later steps.
2. Check whether the node is already marked purple. If it is purple, it is not added to the node buffer again. This ensures that a node is added to the buffer only once.
3. Mark the node purple, indicating that it has been added to the buffer and does not need to be added next time.
4. Find a slot for the new node, and perform garbage collection if the buffer is full.
5. Add the new node to the doubly linked list that the buffer belongs to.

When the buffer is full, gc_zval_possible_root calls gc_collect_cycles to perform garbage collection. During execution the collector marks the state of each node with one of four colors. The most critical steps are:

- Line 628 is step B of the algorithm in the official documentation. The algorithm runs a depth-first search from every possible root and decrements the reference count of each variable container it reaches by 1. To make sure the same variable container is not decremented twice, containers that have already been decremented are marked gray.
- Line 629 is step C of the algorithm. The algorithm again runs a depth-first search from each root node and checks the reference count of each variable container. If the reference count is 0, the variable container is marked white. If the reference count is greater than 0, the decrement applied during the earlier depth-first search is reverted from this point on (that is, the reference count is increased by 1 again), and the containers are marked black again.
- Line 630 is step D, the last step of the algorithm. The algorithm walks the root buffer and removes the variable container roots (zval roots) from it, and at the same time checks for variable containers that were marked white in the previous step. Every white variable container is freed.

In [gc_collect_cycles() -> gc_collect_roots() -> zval_collect_white()] we can see that nodes marked white are added to the list in the global variable zval_to_free. This list is used later.
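Steps B, C and D can be followed on a toy object graph. The sketch below is a simplified rendering of that mark, scan and collect sequence under stated assumptions: the demo_ names are hypothetical, each node has at most two outgoing references, the purple check on roots is left out, and "freeing" only sets a flag so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>

enum demo_color { DEMO_BLACK, DEMO_WHITE, DEMO_GREY, DEMO_PURPLE };

#define DEMO_MAX_CHILDREN 2

typedef struct demo_node {
    int               refcount;
    enum demo_color   color;
    int               freed;                     /* set instead of really freeing */
    struct demo_node *child[DEMO_MAX_CHILDREN];
} demo_node;

/* Step B: mark grey and subtract the references held inside the subgraph. */
static void demo_mark_grey(demo_node *n)
{
    int i;
    if (n->color == DEMO_GREY) return;           /* already decremented once */
    n->color = DEMO_GREY;
    for (i = 0; i < DEMO_MAX_CHILDREN; i++) {
        if (n->child[i]) {
            n->child[i]->refcount--;
            demo_mark_grey(n->child[i]);
        }
    }
}

/* Undo step B for a subgraph that is still referenced from outside. */
static void demo_scan_black(demo_node *n)
{
    int i;
    n->color = DEMO_BLACK;
    for (i = 0; i < DEMO_MAX_CHILDREN; i++) {
        if (n->child[i]) {
            n->child[i]->refcount++;
            if (n->child[i]->color != DEMO_BLACK) demo_scan_black(n->child[i]);
        }
    }
}

/* Step C: a count of 0 means garbage (white), otherwise restore (black). */
static void demo_scan(demo_node *n)
{
    int i;
    if (n->color != DEMO_GREY) return;
    if (n->refcount > 0) {
        demo_scan_black(n);
        return;
    }
    n->color = DEMO_WHITE;
    for (i = 0; i < DEMO_MAX_CHILDREN; i++) {
        if (n->child[i]) demo_scan(n->child[i]);
    }
}

/* Step D: release every white node reachable from n, return how many. */
static int demo_collect_white(demo_node *n)
{
    int i, count = 1;
    if (n->color != DEMO_WHITE) return 0;
    n->color = DEMO_BLACK;
    n->freed = 1;
    for (i = 0; i < DEMO_MAX_CHILDREN; i++) {
        if (n->child[i]) count += demo_collect_white(n->child[i]);
    }
    return count;
}

static int demo_collect_cycles(demo_node **roots, int nroots)
{
    int i, freed = 0;
    for (i = 0; i < nroots; i++) demo_mark_grey(roots[i]);
    for (i = 0; i < nroots; i++) demo_scan(roots[i]);
    for (i = 0; i < nroots; i++) freed += demo_collect_white(roots[i]);
    return freed;
}
```

Two nodes that reference only each other end up white and are released, while the same pair with one extra outside reference gets its counts restored and survives.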
The colors have the following meanings:

- GC_WHITE (white): garbage
- GC_PURPLE (purple): has been put into the buffer
- GC_GREY (gray): the refcount has been decremented by one
- GC_BLACK (black): the default color, normal

The relevant flags and macros are as follows:

    #define GC_COLOR  0x03

    #define GC_BLACK  0x00
    #define GC_WHITE  0x01
    #define GC_GREY   0x02
    #define GC_PURPLE 0x03

    #define GC_ADDRESS(v) \
        ((gc_root_buffer*)(((zend_uintptr_t)(v)) & ~GC_COLOR))
    #define GC_SET_ADDRESS(v, a) \
        (v) = ((gc_root_buffer*)((((zend_uintptr_t)(v)) & GC_COLOR) | ((zend_uintptr_t)(a))))
    #define GC_GET_COLOR(v) \
        (((zend_uintptr_t)(v)) & GC_COLOR)
    #define GC_SET_COLOR(v, c) \
        (v) = ((gc_root_buffer*)((((zend_uintptr_t)(v)) & ~GC_COLOR) | (c)))
    #define GC_SET_BLACK(v) \
        (v) = ((gc_root_buffer*)(((zend_uintptr_t)(v)) & ~GC_COLOR))
    #define GC_SET_PURPLE(v) \
        (v) = ((gc_root_buffer*)(((zend_uintptr_t)(v)) | GC_PURPLE))

Marking state with bits in this way is common in the PHP source code; memory management, for example, uses it as well. It is an efficient and economical approach. When designing database fields, however, this technique is usually not appropriate, and a more intuitive and readable representation is preferable.
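The bit-marking technique can be tried outside the engine. The sketch below stores a two-bit color in the low bits of a pointer, which works because the pointed-to structure is aligned to at least four bytes on common platforms. The demo_ names are hypothetical, functions are used instead of macros for readability, and uintptr_t from stdint.h stands in for zend_uintptr_t.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_COLOR  0x03   /* mask for the two low bits */
#define DEMO_BLACK  0x00
#define DEMO_WHITE  0x01
#define DEMO_GREY   0x02
#define DEMO_PURPLE 0x03

typedef struct demo_root {
    struct demo_root *prev;
    struct demo_root *next;
} demo_root;

/* Strip the color bits to get the real address back. */
static demo_root *demo_address(demo_root *tagged)
{
    return (demo_root *)((uintptr_t)tagged & ~(uintptr_t)DEMO_COLOR);
}

/* Read only the color bits. */
static unsigned demo_get_color(demo_root *tagged)
{
    return (unsigned)((uintptr_t)tagged & DEMO_COLOR);
}

/* Keep the address, replace the color. */
static demo_root *demo_set_color(demo_root *tagged, unsigned color)
{
    assert((color & ~(unsigned)DEMO_COLOR) == 0);
    return (demo_root *)(((uintptr_t)tagged & ~(uintptr_t)DEMO_COLOR) | color);
}

/* Keep the color, replace the address (which must have its low bits clear). */
static demo_root *demo_set_address(demo_root *tagged, demo_root *addr)
{
    assert(((uintptr_t)addr & DEMO_COLOR) == 0);
    return (demo_root *)(((uintptr_t)tagged & DEMO_COLOR) | (uintptr_t)addr);
}
```

A tagged pointer must always go through demo_address before it is dereferenced; that is the price of storing the state without any extra memory.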


