<p>Article source: http://www.aintnot.com/2016/02/12/phps-source-code-for-php-developers-part3-variables-ch</p> <p>Original: http://blog.ircmaxell.com/2012/03/phps-source-code-for-php-developers_21.html</p>
In this third article of the "PHP Source Code for PHP Developers" series, we build on the previous articles to help you understand how PHP works internally. The first article showed how to browse the PHP source code, how the code is structured, and some C basics, such as pointers, aimed at PHP developers. The second article covered functions. This time, we're going to dive into one of PHP's most useful structures: variables.
In PHP's core code, variables are called ZVALs. There's a reason this structure is so important, and it's not just that PHP is weakly typed while C is strictly typed. So how does the ZVAL bridge that gap? To answer that, we need to look carefully at the definition of the zval type. Try searching for zval in the definition search box on the lxr page. At first glance, nothing useful turns up. But there is a typedef line in zend.h (typedef is how C gives a new name to a data type). That may be what we are looking for, so let's keep going. At first this looks irrelevant too, since the line itself contains nothing of interest. But just to be sure, let's click through to _zval_struct.
<pre>
struct _zval_struct {
    /* Variable information */
    zvalue_value value;     /* value */
    zend_uint refcount__gc;
    zend_uchar type;        /* active type */
    zend_uchar is_ref__gc;
};
</pre>
And there we have the foundation of PHP, the zval. Looks simple, right? It is, but there is also some clever design here that makes a lot of sense. Note that this is a struct, or structure. You can think of it as roughly a PHP class that has only public properties. Here we have four members: value, refcount__gc, type and is_ref__gc. Let's go through them one by one (though not in declaration order).
The first member to discuss is value, whose type is zvalue_value. I don't know about you, but I had never heard of zvalue_value before. So let's find out what it is. As elsewhere on the site, you can click a type name to see its definition. Once you do, you'll see this definition:
<pre>
typedef union _zvalue_value {
    long lval;              /* long value */
    double dval;            /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;          /* hash table value */
    zend_object_value obj;
} zvalue_value;
</pre>
Now, here's a neat trick. See the word union in that definition? It means this is not really a struct but a single value. Yet it lists several members of different types. If there are several types in there, how can it be a single value? I'm glad you asked. To understand it, we first need to recall what we said about C types in the first article.
In C, a variable is just a label for a memory address, and a type is just a way of saying how a piece of memory will be interpreted. Nothing in the memory itself distinguishes a 4-byte string from a 4-byte integer; both are just a block of memory. The compiler tries to enforce types by treating memory segments as variables of specific types, but it doesn't always succeed (incidentally, when a write runs past the memory segment a variable was given, the result can be a segmentation fault).
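To make the "same bytes, different label" idea concrete, here is a minimal sketch. The function names are hypothetical and exist only for illustration; the point is that the four bytes are copied unchanged and only the type used to look at them differs.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret four bytes of text as a 32-bit integer.
 * The numeric result depends on the machine's byte order. */
uint32_t demo_bytes_as_int(const char bytes[4])
{
    uint32_t n;
    memcpy(&n, bytes, sizeof n);   /* same bits, now labelled as an integer */
    return n;
}

/* Reinterpret the integer as four bytes of text again. */
void demo_int_as_bytes(uint32_t n, char out[4])
{
    memcpy(out, &n, sizeof n);     /* same bits, labelled as characters */
}
```

Passing the bytes 'P', 'H', 'P', '!' through both functions gives back the same four characters, because nothing in memory ever recorded whether they were "text" or "a number".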
As we now know, a union is a single value that is interpreted differently depending on which member you access. This lets us define one value that supports multiple types. The key point is that all members share the same memory. In this example, on a typical 64-bit build, long and double each occupy 64 bits. The string struct holds 96 bits of data (a 64-bit character pointer and a 32-bit integer length). The HashTable pointer occupies 64 bits, and zend_object_value holds 96 bits of data (a 32-bit handle and a 64-bit pointer). The union is as large as its largest member, so it needs at least 96 bits here; alignment padding usually rounds that up to 128 bits.
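The layout claims above can be checked with a small sketch. The types below are hypothetical stand-ins that only imitate the shape of zvalue_value (they are not the real Zend headers), so the exact sizes on your machine may differ.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the PHP 5 types, for illustration only. */
typedef struct demo_hashtable demo_hashtable;   /* opaque, used via pointer */

typedef struct {
    unsigned int handle;      /* stand-in for the object handle */
    const void  *handlers;    /* stand-in for the handlers pointer */
} demo_object_value;

typedef union {
    long   lval;
    double dval;
    struct {
        char *val;
        int   len;
    } str;
    demo_hashtable   *ht;
    demo_object_value obj;
} demo_value;

/* Every member of a union starts at offset 0, so all members overlap. */
size_t demo_offset_of_lval(void) { return offsetof(demo_value, lval); }
size_t demo_offset_of_str(void)  { return offsetof(demo_value, str); }

/* The union is at least as large as each of its members. */
int demo_union_fits_all(void)
{
    return sizeof(demo_value) >= sizeof(long)
        && sizeof(demo_value) >= sizeof(double)
        && sizeof(demo_value) >= sizeof(char *) + sizeof(int)
        && sizeof(demo_value) >= sizeof(demo_object_value);
}
```

Printing sizeof(demo_value) on your own platform is an easy way to see how much padding the compiler adds.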
Looking at the union again, we can see only five PHP data types (long == int, double == float, str == string, HashTable == array, zend_object_value == object). So where are the remaining types? It turns out this structure is enough to store them as well. A bool is stored in the long, NULL needs no data at all, and a resource is also stored in the long.
Because the value union does not record which member is in use, we need another way to track the variable's type, so that we know how to read the value. That is the job of the type byte (zend_uchar is an unsigned char, a single byte of memory). It holds one of the Zend type constants. The usage is simple: setting zval.type = IS_LONG marks the data as an integer. Together, this field and the value field are enough to know the type and value of a PHP variable.
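This pairing of a union with a type byte is commonly called a tagged union. Below is a minimal sketch of the idea; the DEMO_ constants and demo_ functions are hypothetical and much simpler than the real IS_* constants and zval helpers.

```c
/* Hypothetical type tags; the real constants are IS_LONG, IS_DOUBLE, ... */
enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE, DEMO_BOOL };

typedef struct {
    union {
        long   lval;
        double dval;
    } value;
    unsigned char type;   /* says which member of value is valid */
} demo_zval;

demo_zval demo_make_null(void)
{
    demo_zval z;
    z.value.lval = 0;     /* NULL carries no data; zeroed for tidiness */
    z.type = DEMO_NULL;
    return z;
}

demo_zval demo_make_long(long n)
{
    demo_zval z;
    z.value.lval = n;
    z.type = DEMO_LONG;
    return z;
}

demo_zval demo_make_double(double d)
{
    demo_zval z;
    z.value.dval = d;
    z.type = DEMO_DOUBLE;
    return z;
}

demo_zval demo_make_bool(int b)
{
    demo_zval z;
    z.value.lval = b ? 1 : 0;   /* booleans reuse the long slot */
    z.type = DEMO_BOOL;
    return z;
}

/* Read the value as a double, consulting the tag before touching the union. */
double demo_as_double(const demo_zval *z)
{
    switch (z->type) {
    case DEMO_LONG:
    case DEMO_BOOL:
        return (double) z->value.lval;
    case DEMO_DOUBLE:
        return z->value.dval;
    default:
        return 0.0;             /* DEMO_NULL has no data to read */
    }
}
```

The reader never touches value without checking type first, which is exactly the discipline the zval relies on.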
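The is_ref__gc member marks whether the variable is a reference, that is, whether something like $foo = &$bar has been executed on it. If it is 0, the variable is not a reference; if it is 1, it is. On its own it doesn't do much. Before we finish with _zval_struct, let's look at its fourth member.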
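The refcount__gc member counts how many PHP variables point to this zval container. A refcount of 1 means one PHP variable uses the container; a refcount of 2 means two PHP variables point to the same container. On its own the refcount doesn't tell you much, but combined with is_ref it forms the basis of the garbage collector and of copy-on-write. It lets one zval container back one or more PHP variables. The exact semantics of refcount are beyond the scope of this article; if you want to dig deeper, I recommend reading this document.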
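As a rough illustration of how a reference count and an is_ref flag can drive copy-on-write, consider the sketch below. It is a deliberately simplified model with hypothetical names (demo_box and friends), not the engine's actual logic, and it ignores the garbage collector entirely.

```c
#include <stdlib.h>

typedef struct {
    long          value;
    unsigned int  refcount;   /* how many variables point at this container */
    unsigned char is_ref;     /* 1 if the variables are PHP references */
} demo_box;

/* Create a container owned by one variable. Returns NULL if malloc fails. */
demo_box *demo_box_new(long value)
{
    demo_box *box = malloc(sizeof *box);
    if (box != NULL) {
        box->value = value;
        box->refcount = 1;
        box->is_ref = 0;
    }
    return box;
}

/* Model of "$b = $a": both variables share one container. */
demo_box *demo_box_share(demo_box *box)
{
    box->refcount++;
    return box;
}

/* Call before writing through a variable. A shared, non-reference
 * container is copied first so the other variables keep the old value. */
demo_box *demo_box_separate(demo_box *box)
{
    demo_box *copy;

    if (box->refcount == 1 || box->is_ref) {
        return box;               /* safe to modify in place */
    }
    copy = demo_box_new(box->value);
    if (copy != NULL) {
        box->refcount--;          /* this variable leaves the old container */
    }
    return copy;
}

/* Drop one variable's claim; free the container when nobody is left. */
void demo_box_release(demo_box *box)
{
    if (--box->refcount == 0) {
        free(box);
    }
}
```

In this model, assigning $b = $a costs only an increment, and the copy is paid for only if one of the two variables is later written to.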
That's all there is to the ZVAL.
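Inside PHP, zvals are passed to functions like any other C variable: as a memory segment, as a pointer to one (or as a pointer to a pointer, and so on). Once we have a variable, we want to get at the data inside it. How do we do that? We use the macros defined in zend_operators.h, which make working with zvals easier. An important detail is that each macro comes in several variants that differ only in their suffix. For example, to get the type of a zval there is the Z_TYPE(zval) macro, which yields an integer identifying the type of the zval passed in. There is also a Z_TYPE_P(zval_p) macro, which does the same thing as Z_TYPE(zval) but takes a pointer to a zval, and a Z_TYPE_PP(zval_pp) macro, which takes a pointer to a pointer. Apart from the kind of argument they accept, these macros are identical. We could simply write Z_TYPE(*zval_p), but the _P and _PP variants make things easier.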
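To get the value of a zval we use the VAL family of macros. Call Z_LVAL(zval) to get the integer value (used for integers and resources, for example). Call Z_DVAL(zval) to get the floating-point value. There are plenty of others, but that's enough for now. The key point is that to read a zval's value in C, you need to (or at least should) use a macro. So when we see a function using these macros, we know it is extracting a value from a zval.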
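The plain, _P and _PP pattern is easy to reproduce. The sketch below defines hypothetical DEMO_ macros modelled on that naming scheme for a simplified struct; the real definitions live in zend_operators.h and cover many more members.

```c
/* Simplified, hypothetical container used only for this sketch. */
enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE };

typedef struct {
    union {
        long   lval;
        double dval;
    } value;
    unsigned char type;
} demo_zval;

/* Base macros operate on a zval itself. */
#define DEMO_TYPE(z)        ((z).type)
#define DEMO_LVAL(z)        ((z).value.lval)
#define DEMO_DVAL(z)        ((z).value.dval)

/* _P variants take a pointer and simply dereference it once. */
#define DEMO_TYPE_P(z_p)    DEMO_TYPE(*(z_p))
#define DEMO_LVAL_P(z_p)    DEMO_LVAL(*(z_p))
#define DEMO_DVAL_P(z_p)    DEMO_DVAL(*(z_p))

/* _PP variants take a pointer to a pointer and dereference twice. */
#define DEMO_TYPE_PP(z_pp)  DEMO_TYPE(**(z_pp))
#define DEMO_LVAL_PP(z_pp)  DEMO_LVAL(**(z_pp))
#define DEMO_DVAL_PP(z_pp)  DEMO_DVAL(**(z_pp))

/* Example reader: returns the integer behind a demo_zval **, or 0 if the
 * container does not currently hold a long. */
long demo_read_long(demo_zval **z_pp)
{
    if (DEMO_TYPE_PP(z_pp) != DEMO_LONG) {
        return 0;
    }
    return DEMO_LVAL_PP(z_pp);
}
```

Because the macros expand to plain member accesses, they can be used on either side of an assignment, for example DEMO_LVAL_P(z_p) = 7.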
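So far we have only talked about the type and value of a zval. As we all know, PHP does type juggling for us, so we can treat a string as an integer if we like. Internally this is done by the convert_to_type family of functions. To convert a zval to a string, call convert_to_string. It changes the type of the very zval passed to it. So if you see a function calling one of these, you know it is converting the type of its argument.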
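The important property of these functions is that they modify the zval in place. Here is a minimal sketch of that behaviour for a conversion to long; demo_convert_to_long and the DEMO_ tags are hypothetical, and the conversion rules are the C library's, which are not guaranteed to match PHP's.

```c
#include <stdlib.h>

enum demo_type { DEMO_LONG, DEMO_DOUBLE, DEMO_STRING };

typedef struct {
    union {
        long        lval;
        double      dval;
        const char *str;   /* borrowed, NUL-terminated; never freed here */
    } value;
    unsigned char type;
} demo_zval;

/* Convert the container to a long in place: both the stored value and
 * the type tag change, so the caller's variable is altered. */
void demo_convert_to_long(demo_zval *z)
{
    switch (z->type) {
    case DEMO_DOUBLE:
        /* Truncates toward zero; assumes the value fits in a long. */
        z->value.lval = (long) z->value.dval;
        break;
    case DEMO_STRING:
        /* Uses the leading digits; non-numeric text yields 0. */
        z->value.lval = strtol(z->value.str, NULL, 10);
        break;
    default:
        break;             /* already a long: nothing to convert */
    }
    z->type = DEMO_LONG;
}
```

After the call the original double or string is gone from the container, which is why callers that must preserve the original need to work on a copy.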
The previous article introduced the zend_parse_parameters function. Now that we know how PHP variables are represented in C, let's take a closer look at it.
<pre>
ZEND_API int zend_parse_parameters(int num_args TSRMLS_DC, const char *type_spec, ...)
{
    va_list va;
    int retval;

    RETURN_IF_ZERO_ARGS(num_args, type_spec, 0);

    va_start(va, type_spec);
    retval = zend_parse_va_args(num_args, type_spec, &va, 0 TSRMLS_CC);
    va_end(va);

    return retval;
}
</pre>
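At first glance this looks confusing. The key is that va_list is just the way C accesses the variable arguments declared with '...'. It is therefore similar in spirit to PHP's func_get_args() function. With that in mind, we can see that zend_parse_parameters immediately calls zend_parse_va_args. Let's continue into that function...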
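Before moving on, here is a minimal sketch of the same wrapper pattern in isolation: a variadic entry point that only starts and ends the va_list and hands a pointer to it to a worker. The demo_ names and the idea of copying from an input array are assumptions made for illustration; the real worker parses zvals according to type_spec.

```c
#include <stdarg.h>

/* Worker: receives a pointer to an already started va_list. Each variadic
 * argument is expected to be a long * that the caller wants filled in. */
static int demo_parse_va(const long *args, int num_args, va_list *va)
{
    int i;

    for (i = 0; i < num_args; i++) {
        long *dest = va_arg(*va, long *);   /* next destination pointer */
        *dest = args[i];
    }
    return 0;
}

/* Public entry point: sets up the va_list, delegates, then cleans up. */
int demo_parse_longs(const long *args, int num_args, ...)
{
    va_list va;
    int retval;

    va_start(va, num_args);
    retval = demo_parse_va(args, num_args, &va);
    va_end(va);

    return retval;
}
```

A caller would write something like demo_parse_longs(values, 2, &a, &b), passing the addresses of the variables that should receive the results.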
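This function looks interesting. At first glance it seems to do a lot, but look more closely. First there is a for loop, which walks over the type_spec string passed in from zend_parse_parameters. Inside the loop, all it does is count the number of arguments the function expects to receive. How exactly it does so is left as an exercise for the reader.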
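For readers who want a starting point, here is a heavily simplified sketch of such a counting loop. It assumes a spec made only of one-letter type codes plus the '|' marker that separates required from optional parameters; demo_count_args is a hypothetical name, and the real loop also handles modifier characters that this sketch ignores.

```c
/* Count the minimum and maximum number of arguments described by a
 * spec such as "sl|b": everything before '|' is required, everything
 * after it is optional. */
void demo_count_args(const char *spec, int *min_args, int *max_args)
{
    int seen_optional = 0;

    *min_args = 0;
    *max_args = 0;

    for (; *spec != '\0'; spec++) {
        if (*spec == '|') {
            seen_optional = 1;    /* later parameters may be omitted */
            continue;
        }
        (*max_args)++;
        if (!seen_optional) {
            (*min_args)++;
        }
    }
}
```

For "sl|b" this yields a minimum of 2 and a maximum of 3, which is the information needed to reject calls with too few or too many arguments.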
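Further down there are some sanity checks (that the arguments were passed in correctly) and error checks (that enough arguments were passed). Then comes the loop we are interested in, the one that actually parses the arguments. Inside it there are three if statements. The first handles the optional-argument marker. The second handles var-args (a variable number of arguments). The third is the one we care about: it calls zend_parse_arg(). Let's look into that function...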
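Here we find something quite interesting. The function calls another function (zend_parse_arg_impl) and then deals with any error information it gets back. This is a very common pattern in PHP: the error handling is pulled out into the parent function. That keeps the implementation and the error handling separate and maximizes reuse. You can keep digging into this function, which is easy to follow, but for now let's take a close look at zend_parse_arg_impl()...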
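Now we have really reached the point where PHP's internal functions parse their arguments. Let's look at the first branch of the switch statement, which parses integer parameters. What follows should be easy to understand. Let's start with the first line of the branch: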
<pre>
long *p = va_arg(*va, long *);
</pre>
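As mentioned earlier, va_arg is how C reads variable arguments. This line declares a pointer to an integer (long is an integer type in C) and obtains it from va_arg. In other words, it fetches a pointer that was passed as an argument to zend_parse_parameters, and that is the location the result will be written to once the branch finishes. Next comes a switch on the type of the variable (zval) that was passed in. Let's look at the IS_STRING branch first (it runs when a string value is passed to an integer parameter).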
<pre>
case IS_STRING:
    {
        double d;
        int type;

        if ((type = is_numeric_string(Z_STRVAL_PP(arg), Z_STRLEN_PP(arg), p, &d, -1)) == 0) {
            return "long";
        } else if (type == IS_DOUBLE) {
            if (c == 'L') {
                if (d > LONG_MAX) {
                    *p = LONG_MAX;
                    break;
                } else if (d < LONG_MIN) {
                    *p = LONG_MIN;
                    break;
                }
            }

            *p = zend_dval_to_lval(d);
        }
    }
    break;
</pre>
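This does less than it appears to. Everything hinges on the is_numeric_string function. In short, it checks whether the string is numeric and returns 0 if it is not. If it is, it parses the string into a variable (an integer or a float, p or d) and returns the resulting type. So if the string is not numeric, the branch returns the string "long", which is used to build the error message. Otherwise, if the string represents a double, the code first checks (for the 'L' specifier) whether the number is too large to store as an integer and then uses zend_dval_to_lval to convert the float to an integer. And that's it: we have parsed our string argument. Now let's look at the other branches: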
<pre>
case IS_DOUBLE:
    if (c == 'L') {
        if (Z_DVAL_PP(arg) > LONG_MAX) {
            *p = LONG_MAX;
            break;
        } else if (Z_DVAL_PP(arg) < LONG_MIN) {
            *p = LONG_MIN;
            break;
        }
    }
case IS_NULL:
case IS_LONG:
case IS_BOOL:
    convert_to_long_ex(arg);
    *p = Z_LVAL_PP(arg);
    break;
</pre>
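Here we see the handling of floats, which closely mirrors the handling of floats inside strings (coincidence?). One important detail: if the specifier is not an uppercase 'L', or the value is within range, the double is handled the same way as the other types (the case has no break, so it falls through). That brings us to another interesting function, convert_to_long_ex(). It belongs to the convert_to_type() family mentioned earlier and converts its argument to a specific type. The only difference is that it separates (copies) the passed-in variable if it is not a reference, since it is about to change the type. This is copy-on-write at work. So when we pass a float to a function expecting a non-reference integer, the function sees an integer, but we still have our float.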
<pre>
case IS_ARRAY:
case IS_OBJECT:
case IS_RESOURCE:
default:
    return "long";
</pre>
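Finally, there are three more case branches plus the default. They show that if you pass an array, an object, a resource or any other unknown type to an integer parameter, you get an error.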
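To tie the three pieces together, here is a self-contained sketch of the same switch shape: a string branch, a double branch that clamps for 'L' and otherwise falls through, and a default that reports the expected type. Everything prefixed demo_ or DEMO_ is hypothetical. The string parsing uses strtol and strtod rather than is_numeric_string, and out-of-range doubles under lowercase 'l' become 0, which is an assumption made only to keep the sketch well defined.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE, DEMO_BOOL,
                 DEMO_STRING, DEMO_ARRAY };

typedef struct {
    union {
        long        lval;
        double      dval;
        const char *str;    /* borrowed, NUL-terminated */
    } value;
    unsigned char type;
} demo_zval;

/* True when d can be truncated to a long without overflow (false for NaN). */
static int demo_fits_long(double d)
{
    return d >= (double) LONG_MIN && d < -(double) LONG_MIN;
}

/* 'L' clamps out-of-range values; 'l' maps them to 0 (sketch assumption). */
static long demo_dval_to_long(double d, char c)
{
    if (demo_fits_long(d)) {
        return (long) d;
    }
    if (c != 'L') {
        return 0;
    }
    return d > 0 ? LONG_MAX : LONG_MIN;
}

/* Returns NULL on success, or the expected type name on failure. */
const char *demo_arg_to_long(const demo_zval *arg, char c, long *p)
{
    switch (arg->type) {
    case DEMO_STRING: {
        const char *s = arg->value.str;
        char *end;
        long n;
        double d;

        /* Accept only plain decimal notation. */
        if (*s == '\0' || s[strspn(s, "+-0123456789.eE")] != '\0') {
            return "long";
        }
        errno = 0;
        n = strtol(s, &end, 10);
        if (end != s && *end == '\0' && errno != ERANGE) {
            *p = n;                 /* the whole string is an integer */
            break;
        }
        d = strtod(s, &end);
        if (end == s || *end != '\0') {
            return "long";          /* not numeric after all */
        }
        *p = demo_dval_to_long(d, c);
        break;
    }
    case DEMO_DOUBLE:
        if (c == 'L' && !demo_fits_long(arg->value.dval)) {
            *p = demo_dval_to_long(arg->value.dval, c);
            break;
        }
        /* fall through */
    case DEMO_NULL:
    case DEMO_LONG:
    case DEMO_BOOL:
        if (arg->type == DEMO_DOUBLE) {
            *p = demo_dval_to_long(arg->value.dval, c);
        } else {
            *p = (arg->type == DEMO_NULL) ? 0 : arg->value.lval;
        }
        break;
    default:
        return "long";              /* arrays, objects, unknown types */
    }
    return NULL;
}
```

Unlike convert_to_long_ex, this sketch never modifies the argument, so there is nothing to separate; it only computes the long that the callee would see.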
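The rest is left to the reader. Reading through zend_parse_arg_impl is really useful for understanding PHP's type juggling system. Read it piece by piece, and try to keep track of the state and type of each argument in C as you go.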
The next part will be on Nikic's blog (the series jumps back and forth between our blogs). In it, he will cover everything about arrays.