Learn the innovation and performance optimization of PHP7

coldplay.xixi
2020-06-24 17:26:10

PHP has a history of about 20 years. Compared with the earlier PHP5 series, PHP7 is a large-scale overhaul, especially in performance, which has improved by leaps and bounds. PHP is a web development language used widely around the world, and the changes in PHP7 will certainly have a profound effect on the web services built with it.

Here is a chart quoted from Brother Niao's PPT (82% of websites use PHP as a development language):

(Note: A website can use more than one language as its development language)
(Note: This article contains many screenshots from Brother Niao's PPT, and the copyright of the pictures belongs to Brother Niao)

Let's first take a look at two exciting performance test results.


PHP7 performance test results: in the stress test, the time taken dropped from 2.991 to 1.186, a reduction of about 60%.

WordPress QPS stress test (picture from PPT):

In the WordPress project, PHP7 increased QPS by 2.77 times compared with PHP5.6.

After reading the exciting comparison of performance test results, let’s get to the point. There are many new features in PHP7, but we will focus more on the major changes.

1. New features and changes

1. Scalar Type Declarations & Return Type Declarations

A very important feature of the PHP language is "weak typing", which makes PHP programs very easy to write and lets newcomers get started quickly. However, it has also attracted some controversy. Support for declaring variable types is an innovative change: PHP now supports type declarations in an optional way. In addition, a switch directive, declare(strict_types=1);, is introduced. Once it is turned on, the code in the current file is forced to follow strict types for function arguments and return values.

For example, an add function with type declarations can be written as function add(int $a, int $b): int, with both parameters and the return value declared as int.

Combined with the strict typing directive, the same file simply begins with declare(strict_types=1); before the function definition.

If strict_types is not turned on, PHP will try to convert the arguments to the required type. Once it is turned on, PHP no longer performs the conversion, and a type mismatch throws an error. This is great news for developers who like "strongly typed" languages.

More detailed introduction: PHP7 Scalar Type Declaration RFC [Translation]

2. More Errors become catchable Exceptions

PHP7 implements a global Throwable interface. The original Exception class and some Errors implement this interface, so the inheritance structure of exceptions is defined in terms of an interface. As a result, more Errors in PHP7 become catchable and are handed to the developer: if they are not caught they remain Errors, and if they are caught they can be handled inside the program like Exceptions. These catchable Errors are usually ones that do no fatal harm to the program, such as a call to a function that does not exist. By default an Error interrupts the program directly; PHP7 makes it possible to catch and handle it so that the program can continue executing, which gives developers more flexibility and greater control over the program.

For example, to call a function that may or may not exist, the PHP5-compatible approach is to check with function_exists before the call, while PHP7 also supports catching the Exception instead.

As shown in the figure below (the screenshot is from the PPT):

3. AST (Abstract Syntax Tree)

The AST serves as an intermediate layer in the PHP compilation process. It replaces the original approach of emitting opcodes directly from the parser and decouples the parser from the compiler, which removes some hack code and makes the implementation easier to understand and maintain.
PHP5:

PHP7:

More AST information: https://wiki.php.net/rfc/abstract_syntax_tree

4. Native TLS (native thread-local storage)

PHP needs to solve the problem of thread safety (TS, Thread Safe) in multi-threaded mode (for example, the worker and event modes of the Apache web server, which are multi-threaded). Because threads share the memory space of the process, each thread needs some way to build a private space for its own data so that it does not interfere with other threads. The approach taken by PHP5 is to maintain a large global array and allocate an independent storage area to each thread; threads access this global array through their own key values.

In PHP5, this unique key value has to be passed to every function that needs to use global variables. The PHP7 developers considered this way of passing the key unfriendly and problematic, so PHP7 tries to store the key value in a global thread-specific variable instead.
More on native TLS:
https://wiki.php.net/rfc/native-tls
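
The difference between the two approaches can be sketched in C. This is only a minimal illustration, not the Zend implementation: sketch_globals, sketch_bind_globals and the two counter functions are hypothetical names, and the C11 _Thread_local storage class is assumed as the thread-local mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread state; a real engine keeps far more here. */
typedef struct {
    int request_count;
} sketch_globals;

/* PHP5-like style: the caller hands the per-thread handle to every
 * function that touches global state. */
void count_request_explicit(sketch_globals *g)
{
    assert(g != NULL);
    g->request_count++;
}

/* Native-TLS-like style: each thread gets its own copy of this pointer,
 * so functions no longer need the extra parameter. */
static _Thread_local sketch_globals *sketch_tls = NULL;

/* Called once per thread to bind that thread's state. */
void sketch_bind_globals(sketch_globals *g)
{
    sketch_tls = g;
}

void count_request_tls(void)
{
    assert(sketch_tls != NULL);   /* the thread must have bound its state */
    sketch_tls->request_count++;
}
```

In this sketch a thread would call sketch_bind_globals once at start-up, after which count_request_tls needs no handle argument at all.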

5. Other new features

PHP7 has quite a few other new features and changes, and we won't go into detail on all of them here.
(1) Int64 support, unifying the integer length across platforms and supporting strings and file uploads larger than 2GB.
(2) Uniform variable syntax.
(3) Consistent foreach behavior.
(4) New operators <=>, ??
(5) Unicode character format support (\u{xxxxx})
(6) Anonymous class support.
… …

2. Leap-forward performance breakthrough: full speed forward

1. JIT and performance

Just In Time (just-in-time compilation) is a software optimization technique that compiles bytecode into machine code at run time. Intuitively, machine code can be recognized and executed directly by the computer, so it should be more efficient than having the Zend engine read and execute opcodes one by one. HHVM (HipHop Virtual Machine, Facebook's open source PHP virtual machine) uses JIT; it improved their PHP performance tests by an order of magnitude, and the striking results they published also led us to think of JIT as a powerful technique that turns stone into gold.
In fact, in 2013 Brother Niao and Dmitry (one of the core developers of the PHP language) made a JIT attempt on PHP5.5 (it was not released). The original execution flow of PHP5.5 is to compile PHP code into opcode bytecode through lexical and syntactic analysis (the format is somewhat similar to assembly). The Zend engine then reads these opcode instructions and interprets them one by one.

They added type inference (TypeInf) after the opcode stage, then generated ByteCodes through the JIT, and then executed those.

The result on the benchmark (test program) was exciting: with JIT, performance improved by 8 times compared with PHP5.5. However, when they applied this optimization to a real project, WordPress (an open source blogging project), they saw almost no performance improvement, which was a puzzling result.
So they used profiling tools on Linux to analyze where the CPU time went during execution.
Distribution of CPU consumption when executing WordPress 100 times (screenshot from PPT):

Note:
21% of CPU time is spent on memory management.
12% of CPU time is spent on hash table operations, mainly adding, deleting, modifying and looking up PHP array elements.
30% of CPU time is spent in built-in functions, such as strlen.
25% of CPU time is spent in VM (Zend Engine).

After analysis, two conclusions were drawn:

(1) If the ByteCodes generated by JIT are too large, the CPU cache hit rate drops (CPU cache misses)

PHP5.5 code has no explicit type declarations, so the JIT can only rely on type inference: determine as many variable types as can be inferred, then use them to remove the branch code for other types and generate directly executable machine code. However, type inference cannot infer every type. In WordPress, less than 30% of the type information could be inferred, so the amount of branch code that could be removed was limited. As a result, the JIT generates machine code directly, and the generated ByteCodes are too large, which ultimately causes a significant drop in the CPU cache hit rate (CPU cache misses).
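
The effect of type information on branch code can be sketched in C. This is a hedged illustration rather than actual JIT output: sk_value, sk_add_generic and sk_add_long are hypothetical names, only two types are modelled, and integer overflow handling is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SK_LONG, SK_DOUBLE } sk_type;

/* Hypothetical dynamically typed value: a tag plus a payload. */
typedef struct {
    sk_type type;
    union { int64_t l; double d; } v;
} sk_value;

/* Generic add: the types are unknown, so every call has to branch
 * on the runtime tags of both operands. */
sk_value sk_add_generic(sk_value a, sk_value b)
{
    /* Only the two sketch types are supported. */
    assert(a.type == SK_LONG || a.type == SK_DOUBLE);
    assert(b.type == SK_LONG || b.type == SK_DOUBLE);

    sk_value r;
    if (a.type == SK_LONG && b.type == SK_LONG) {
        r.type = SK_LONG;
        r.v.l = a.v.l + b.v.l;
    } else {
        double x = (a.type == SK_LONG) ? (double)a.v.l : a.v.d;
        double y = (b.type == SK_LONG) ? (double)b.v.l : b.v.d;
        r.type = SK_DOUBLE;
        r.v.d = x + y;
    }
    return r;
}

/* Specialized add: usable only where inference has shown that both
 * operands are integers; the tag checks and the double path are gone. */
int64_t sk_add_long(int64_t a, int64_t b)
{
    return a + b;
}
```

When a type cannot be inferred, generated code has to keep every branch of the generic form, which is why a low inference rate leads to large output.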

A CPU cache miss means that when the CPU reads and executes instructions, the data it needs cannot be found in the first-level cache (L1), so it has to keep looking further down, in the second-level cache (L2) and the third-level cache (L3), and finally in main memory. The difference in read time between main memory and the CPU cache can reach 100 times. Therefore, if the ByteCodes are too large and too many instructions are executed, the multi-level caches cannot hold all of them, and some instructions have to stay in main memory.

The sizes of the CPU caches at each level are also limited. The following picture shows the configuration of an Intel i7 920:

Therefore, a lower CPU cache hit rate seriously increases the time taken, and this offsets the performance gain brought by JIT.

JIT can reduce the overhead of the VM, and instruction optimization can also indirectly reduce the overhead of memory management, because the number of memory allocations goes down. For a real WordPress project, however, only 25% of the CPU time is spent in the VM, so the main problem and bottleneck is not actually the VM. For this reason the JIT optimization was not included in the features of this PHP7 release. It is likely to be implemented in a later version, which is worth looking forward to.

(2) The improvement effect of JIT performance depends on the actual bottleneck of the project

JIT brings a large improvement in the benchmark because the amount of code is relatively small, the generated ByteCodes are also relatively small, and the main overhead is in the VM. The real WordPress project shows no obvious improvement because its code base is far larger than the benchmark: although JIT reduces the overhead of the VM, the oversized ByteCodes lower the CPU cache hit rate and add memory overhead, so in the end there is no gain.
Different types of projects have different CPU overhead profiles and will get different results. Performance tests that are detached from real projects are not very representative.

2. Changes in Zval

In fact, variables of every type in PHP are actually stored in a Zval, which is notable for being able to hold anything. Essentially it is a structure (struct) implemented in C. For those who write PHP, you can roughly think of it as something similar to an array.
PHP5's Zval occupies 24 bytes of memory (screenshot from PPT):

PHP7's Zval occupies 16 bytes of memory (screenshot from PPT):

The Zval dropped from 24 bytes to 16 bytes. Why? A little C background helps readers who are not familiar with the language. There is a difference between a struct and a union: each member of a struct occupies its own memory space, while the members of a union share one memory space (that is, if one member is modified, the shared space is overwritten and the previous contents of the other members are lost). Therefore, although there appear to be many more member variables, the memory actually occupied has decreased.
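
The struct and union difference described above can be checked with a small C sketch. The two types below are hypothetical and much simpler than the real Zval; they only show that the same members take less space when they share storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* struct: every member gets its own storage. */
struct sk_separate {
    int64_t lval;   /* integer payload */
    double  dval;   /* floating-point payload */
    void   *ptr;    /* pointer payload */
};

/* union: all members overlap in the same storage, so only one of
 * them holds a meaningful value at any time. */
union sk_shared {
    int64_t lval;
    double  dval;
    void   *ptr;
};

/* Checked at compile time: sharing storage makes the type smaller. */
static_assert(sizeof(union sk_shared) < sizeof(struct sk_separate),
              "a union is smaller than a struct with the same members");

size_t sk_separate_size(void) { return sizeof(struct sk_separate); }
size_t sk_shared_size(void)   { return sizeof(union sk_shared); }
```

The exact sizes depend on the platform, which is why the sketch compares them instead of hard-coding byte counts.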

Another significant change is that some simple types no longer use references.

Zval structure diagram (from PPT):

The Zval in the diagram consists of two 64-bit words (1 byte = 8 bits). If the variable type is long or boolean, whose length does not exceed 64 bits, the value is stored directly in value and there is no reference behind it. When the variable type is array, object, string, or another type that exceeds 64 bits, the stored value is a pointer to the address of the real storage structure.

For simple variable types, Zval storage becomes very simple and efficient.

Types that do not require references: NULL, Boolean, Long, Double
Types that require references: String, Array, Object, Resource, Reference

3. Internal type zend_string

zend_string is the structure that actually stores strings. The content itself is stored in val (of type char), and val is declared as a char array of length 1 so that it serves as a placeholder member.

The last member of the structure is a char array rather than a char*. This is a small optimization trick that reduces CPU cache misses.
With a char array, when malloc allocates memory for the structure, the header and the characters come from the same region, usually of length sizeof(_zend_string) plus the actual character storage. With char*, however, this position stores only a pointer, and the actual characters live in a separate memory region.

Comparison of memory allocation using char[1] and char*:

In terms of logic there is not much difference between the two, and the effect is much the same. When these memory blocks are loaded into the CPU, however, they behave very differently. The former is one contiguous allocation, so the CPU can usually fetch it all together (it will be in the same cache level). The latter consists of two separate blocks: when the CPU reads the first one, the second is quite likely not in the same cache level, so the CPU has to look in L2 (the second-level cache) and below, or even in main memory, to find it. This causes a CPU cache miss, and the difference in time between the two cases can be up to 100 times.
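
The two layouts compared above can be sketched in C as follows. The names are hypothetical, and the sketch uses a C99 flexible array member (char val[]) in place of the char[1] placeholder the text describes; the single-allocation idea is the same.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Layout 1: header and characters share one allocation. */
typedef struct {
    size_t len;
    char   val[];       /* C99 flexible array member */
} sk_inline_str;

/* Layout 2: the header holds only a pointer to a second allocation. */
typedef struct {
    size_t len;
    char  *val;
} sk_pointer_str;

sk_inline_str *sk_inline_str_new(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    /* One malloc: header size plus the characters plus the NUL byte. */
    sk_inline_str *s = malloc(sizeof(*s) + len + 1);
    if (s == NULL) return NULL;
    s->len = len;
    memcpy(s->val, src, len + 1);
    return s;
}

sk_pointer_str *sk_pointer_str_new(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    sk_pointer_str *s = malloc(sizeof(*s));     /* allocation 1: header */
    if (s == NULL) return NULL;
    s->val = malloc(len + 1);                   /* allocation 2: characters */
    if (s->val == NULL) { free(s); return NULL; }
    s->len = len;
    memcpy(s->val, src, len + 1);
    return s;
}

void sk_pointer_str_free(sk_pointer_str *s)
{
    if (s != NULL) { free(s->val); free(s); }
}
```

An sk_inline_str is released with a single free, while the pointer layout needs two, mirroring the two separate memory blocks the CPU has to fetch.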

In addition, strings are assigned by reference when they are copied, so zend_string can avoid copying the memory.
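
One common way to share a string instead of copying it is a reference count. The following C sketch assumes that approach; sk_rc_str and its functions are hypothetical names, not the engine's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t refcount;  /* number of owners sharing this buffer */
    size_t   len;
    char     val[];     /* characters follow the header */
} sk_rc_str;

sk_rc_str *sk_rc_str_new(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    sk_rc_str *s = malloc(sizeof(*s) + len + 1);
    if (s == NULL) return NULL;
    s->refcount = 1;
    s->len = len;
    memcpy(s->val, src, len + 1);
    return s;
}

/* "Copying" shares the same buffer and bumps the counter;
 * no characters are duplicated. */
sk_rc_str *sk_rc_str_copy(sk_rc_str *s)
{
    assert(s != NULL);
    s->refcount++;
    return s;
}

/* Drop one owner; the memory is freed when the last owner is gone. */
void sk_rc_str_release(sk_rc_str *s)
{
    assert(s != NULL && s->refcount > 0);
    if (--s->refcount == 0) {
        free(s);
    }
}
```

In this sketch a copy costs one increment regardless of the string length.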

4. Changes in PHP arrays (HashTable and Zend Array)

Arrays are the most frequently used type when writing PHP programs, and PHP5 arrays are implemented with a HashTable. Roughly speaking, it is a HashTable combined with a doubly linked list: it supports hash lookup of elements by array key, and foreach traverses the elements by walking the doubly linked list.
PHP5's HashTable (screenshot from PPT):

The picture looks very complicated, with pointers jumping in every direction. When we access an element by its key, it sometimes takes three pointer jumps to reach the content. Most importantly, the array elements are stored in scattered memory regions. As before, when the CPU reads them they are likely not in the same cache level, so the CPU has to look in lower-level caches or even main memory, which lowers the cache hit rate and increases the time taken.

Zend Array of PHP7 (screenshot from PPT):

The new array structure is very simple and easy on the eyes. Its biggest feature is that all the array elements and the hash mapping table are joined together and allocated in one block of memory. Traversing an array of simple integers is very fast, because the array elements (Buckets) are allocated contiguously in the same memory and the zval of each element stores the integer inside itself. There are no pointers leading elsewhere, and all the data sits in the current memory region. Most importantly, this avoids CPU cache misses (a drop in the CPU cache hit rate).

Changes in Zend Array:
(1) The value of the array defaults to zval.
(2) The size of HashTable is reduced from 72 to 56 bytes, a reduction of 22%.
(3) The size of Buckets is reduced from 72 to 32 bytes, a reduction of about 56%.
(4) The memory space of the buckets of array elements is allocated together.
(5) The key of the array element (Bucket.key) points to zend_string.
(6) The value of the array element is embedded in the Bucket.
(7) CPU cache misses are reduced.
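
The contrast between scattered elements and buckets allocated together can be sketched in C. Both types are hypothetical simplifications (no hashing, integer values only) and are meant only to show the difference in memory layout during traversal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scattered layout: each element is a separate node reached through
 * a pointer, and its value lives in yet another location. */
typedef struct sk_node {
    int64_t        *value;
    struct sk_node *next;
} sk_node;

/* Packed layout: the value is embedded in the bucket and the buckets
 * sit back to back in one block of memory. */
typedef struct {
    int64_t  value;
    uint64_t key;
} sk_bucket;

int64_t sk_sum_nodes(const sk_node *head)
{
    int64_t total = 0;
    for (const sk_node *n = head; n != NULL; n = n->next) {
        total += *n->value;     /* follow a pointer to reach the value */
    }
    return total;
}

int64_t sk_sum_buckets(const sk_bucket *buckets, size_t count)
{
    assert(buckets != NULL || count == 0);
    int64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += buckets[i].value;  /* sequential reads from one block */
    }
    return total;
}
```

Both functions compute the same sum; the packed version touches memory in order, which is the access pattern CPU caches handle best.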

5. Function Calling Convention

PHP7 improves the function calling mechanism. By optimizing the parameter passing process, it removes some instructions and improves execution efficiency.

PHP5’s function calling mechanism (screenshot from PPT):

In the figure, the parameters handled by the send_val and recv instructions on the VM stack are the same. PHP7 removes this duplication and thereby optimizes the function calling mechanism at a low level.

PHP7’s function calling mechanism (screenshot from PPT):

6. Let the compiler complete part of the work in advance through macro definitions and inline functions (inline)

C macro definitions are expanded in the preprocessing stage (part of compilation), so part of the work is done in advance and no memory needs to be allocated for them when the program runs. A macro can provide function-like behavior without the stack push and pop overhead of a function call, so it is relatively efficient. Inline functions are similar: at compile time, calls in the program are replaced with the function body, so when execution reaches that point there is no function call overhead.
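
A minimal C sketch of the two mechanisms, using hypothetical names: a macro expanded by the preprocessor, an inline function whose body the compiler may substitute at the call site, and an ordinary function for comparison.

```c
#include <assert.h>

/* Macro: the preprocessor pastes the expression text at each use,
 * so no call happens at run time. */
#define SK_SQUARE(x) ((x) * (x))

/* With a constant argument the whole expression is evaluated by the
 * compiler, which this compile-time check relies on. */
static_assert(SK_SQUARE(3) == 9, "macro result is known at compile time");

/* Inline function: the compiler may replace a call with the body,
 * while still type-checking the argument and evaluating it once. */
static inline int sk_square(int x)
{
    return x * x;
}

/* Ordinary function: a call normally sets up and tears down a
 * stack frame. */
int sk_square_call(int x)
{
    return x * x;
}
```

Note that a macro argument with side effects, such as SK_SQUARE(i++), would be evaluated twice; an inline function avoids that problem.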

PHP7 has made many optimizations in this area, moving a lot of work that used to be done at run time into the compilation phase. Parameter type checking (parameter parsing) is one example: everything involved is a fixed character constant, so it can be resolved at compile time, which improves subsequent execution efficiency.

For example, the way the types of passed parameters are handled was optimized from the form on the left to the macro form on the right.
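
The screenshot is not reproduced here, but the general idea can be sketched in C under some assumptions. Everything below is hypothetical (ska_arg, ska_check_spec, SKA_EXPECT and the spec characters are illustrative, not PHP's real API): one version decodes a spec string on every call, the other spells out the same checks with a macro so that nothing has to be decoded at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKA_LONG, SKA_STRING } ska_type;

/* Hypothetical argument: only its type tag matters for this sketch. */
typedef struct { ska_type type; } ska_arg;

/* Run-time style: a spec such as "ls" (long, string) is interpreted
 * character by character every time the function is called. */
bool ska_check_spec(const char *spec, const ska_arg *args, size_t count)
{
    assert(spec != NULL);
    size_t i = 0;
    for (; spec[i] != '\0'; i++) {
        if (i >= count) return false;
        switch (spec[i]) {
        case 'l': if (args[i].type != SKA_LONG) return false; break;
        case 's': if (args[i].type != SKA_STRING) return false; break;
        default:  return false;     /* unknown spec character */
        }
    }
    return i == count;
}

/* Macro style: the expected types are fixed in the source, so the
 * compiler sees plain comparisons and no string is parsed. */
#define SKA_EXPECT(args, index, wanted) ((args)[(index)].type == (wanted))

bool ska_check_long_string(const ska_arg *args, size_t count)
{
    return count == 2
        && SKA_EXPECT(args, 0, SKA_LONG)
        && SKA_EXPECT(args, 1, SKA_STRING);
}
```

Both functions accept the same argument lists; the macro form simply moves the decision about which checks to run from run time to compile time.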

3. Summary

Brother Niao's PPT gives one set of comparative data: executing WordPress 100 times generates 7 billion CPU instructions in PHP5.6, while PHP7 needs only 2.5 billion, a reduction of 64.2%. This is a striking figure.

Of everything Brother Niao shared, the point that impressed me most is attention to detail: many small optimizations, accumulated bit by bit, eventually add up to a stunning result. A mountain is not raised in a single day, and I think that is the lesson here.

There is no doubt that PHP7 achieves a leap in performance. If these results can be applied to PHP web systems, we may need fewer machines to support a higher volume of requests. The official release of PHP7 is eagerly awaited.


Statement:
This article is reproduced from csdn.net. If there is any infringement, please contact admin@php.cn to have it removed.