Detailed explanation of PHP's output buffer

Everyone knows that PHP has a layer called the "output buffer". This article explains what it is, how it is implemented inside PHP, and how to use it in PHP programs. The layer is not complicated, but it is often misunderstood, and many PHP developers have not fully mastered it. Let's figure it out together.

This discussion is based on PHP 5.4 and above. The OB layer changed a great deal in version 5.4; to be precise, it was completely rewritten. Some of what follows may not be compatible with PHP 5.3.

What is an output buffer?

PHP's output stream contains a lot of bytes, which are usually text that programmers want PHP to output. Most of these texts are output by echo statements or printf() functions. There are three things you need to know about output buffers in PHP.

The first point is that any function that outputs something uses the output buffer. This applies, of course, to programs written in PHP. If you are writing a PHP extension, the C functions you use may write output directly to the SAPI buffer layer without going through the OB layer. The API for these C functions is documented in the source file main/php_output.h. This file also provides a lot of other information, such as the default buffer size.

The second thing you need to know is that the output buffer layer is not the only layer that buffers output; it is just one of several. The last thing to remember is that the behavior of the output buffer layer depends on the SAPI you are using (web or CLI), and different SAPIs may behave differently. Let's first look at the relationship between these layers through a picture:

The above picture shows the logical relationship between the three buffer layers in PHP. The top two layers are what we usually think of as "output buffers", and the last one is the output buffer in the SAPI. These are all layers inside PHP; more buffers appear as the output bytes leave PHP and enter the lower layers of the system (terminal buffers, FastCGI buffers, web server buffers, OS buffers, TCP/IP stack buffers). Keep in mind a general principle that applies well beyond the PHP case discussed here: many parts of a piece of software hold on to information before passing it to the next part, until it finally reaches the user.

The CLI SAPI is a bit special, so I'll focus on it here. The CLI forces the output_buffering option in the INI configuration to 0, which disables the default PHP output buffer. So in the CLI, by default, whatever you output is passed directly to the SAPI layer, unless you manually call the ob_*() functions. In the CLI, implicit_flush is also set to 1. The role of implicit_flush is often misunderstood, but the source code says it all: when implicit_flush is on (value 1), any output written to the SAPI buffer layer is flushed immediately (flushing means writing the data to the layer below and emptying the buffer). In other words, any time you write data to the CLI SAPI, it immediately hands the data to the next layer, usually the standard output pipe. The two functions write() and fflush() are responsible for this. Simple, right?
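A quick way to check these settings on your own setup is to print them together with the current buffer nesting level. This is only a sketch: the helper name describe_output_settings() is made up for illustration, and the values you see depend on your SAPI and configuration.

```php
<?php
// Hypothetical helper (not part of PHP): collects the settings discussed above.
function describe_output_settings()
{
    return [
        'sapi'             => PHP_SAPI,                     // e.g. "cli" or "fpm-fcgi"
        'output_buffering' => ini_get('output_buffering'),  // INI value, as a string
        'implicit_flush'   => ini_get('implicit_flush'),    // INI value, as a string
        'ob_level'         => ob_get_level(),               // number of active output buffers
    ];
}

// Print one "name = value" line per setting.
foreach (describe_output_settings() as $name => $value) {
    echo $name, ' = ', var_export($value, true), PHP_EOL;
}
```

Running the same script from the command line and through a web SAPI makes the difference between the two environments easy to see.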

Default PHP output buffer

If you use a SAPI different from the CLI, like PHP-FPM, you will use the following three buffer-related INI configuration options:

output_buffering
implicit_flush
output_handler

Before explaining these options, one thing needs to be made clear: you cannot use ini_set() to change them at runtime in any way that matters. Their values are read when PHP starts, before any script runs. Even where ini_set() accepts a new value at runtime, the change comes too late to take effect, because the output buffer layer is already up and active. You can only change them by editing php.ini or by passing the -d option when starting PHP.
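The following sketch illustrates the point for output_buffering specifically. The PHP manual lists this directive as changeable only per directory or system-wide, so ini_set() rejects the change and returns false; the variable names are just for illustration.

```php
<?php
// output_buffering is not changeable from a script, so ini_set() refuses the
// new value and returns false instead of the previous setting.
$result = ini_set('output_buffering', '4096');
var_dump($result);

// Buffers created at startup are unaffected; the nesting level is whatever
// it was before the call.
var_dump(ob_get_level());
```

To actually try a different value, start PHP with something like -doutput_buffering=4096 or edit php.ini.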

By default, the php.ini shipped with the PHP distribution sets output_buffering to 4096 bytes. If you don't use any php.ini file (and don't pass the -d option when starting PHP), its default value is 0, which disables the output buffer. If you set it to "On", the default output buffer size of 16 KB is used. As you may have guessed, buffering output in a web application environment has performance benefits. The default of 4 KB is a reasonable value: you can write 4096 ASCII characters before talking to the SAPI layer below. In a web environment, sending data byte by byte over a socket is bad for performance. It is better to send everything to the server at once, or at least in chunks. The fewer exchanges between layers, the better the performance. You should always keep output buffering available; PHP takes care of sending the contents to the end user when the request completes, without you having to do anything.

implicit_flush was already mentioned when discussing the CLI. For other SAPIs, implicit_flush is off by default, which is the right setting, because flushing the SAPI every time new data is written is probably not what you want. With the FastCGI protocol, a flush sends a FastCGI packet after each write; it is better to let the FastCGI buffer fill up before sending the packet. If you want to flush the SAPI buffer manually, use PHP's flush() function. If you want a flush after every write, set the implicit_flush option in the INI configuration, or call the ob_implicit_flush() function once.
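As a rough illustration of manual flushing, the sketch below pushes each chunk out as soon as it is written. The helper name send_chunk() is hypothetical. Note that ob_flush() only flushes the topmost output buffer, so with several nested buffers the data still has to travel through the lower ones.

```php
<?php
// Hypothetical helper: write one chunk and push it toward the client.
function send_chunk($text)
{
    echo $text;
    // If an output buffer is active, flush the topmost one first...
    if (ob_get_level() > 0) {
        ob_flush();
    }
    // ...then ask the SAPI layer to flush its own buffer.
    flush();
}

for ($i = 1; $i <= 3; $i++) {
    send_chunk("step $i\n");
}
```

Whether the client really sees each chunk right away still depends on the buffers outside PHP (web server, proxies, browser).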

output_handler is a callback function that can modify the contents of the buffer before the buffer is flushed. PHP extensions provide many such callbacks (users can also write their own, which is discussed below).

ob_gzhandler: Use ext/zlib to compress output
mb_output_handler: Use ext/mbstring to convert character encoding
ob_iconv_handler: Use ext/iconv to convert character encoding
ob_tidyhandler: Use ext/tidy to tidy up the output HTML text
ob_[inflate/deflate]_handler: Use ext/http to compress the output
ob_etaghandler: Use ext/http to automatically generate HTTP Etag

The content in the buffer is passed to the callback function of your choice (only one can be used), which performs the conversion. So if you want to get at the content PHP sends to the web server and to the user, you can use an output buffer callback. Note that "output" here means both the message headers and the message body: HTTP headers are also part of the OB layer.

Message Headers and Message Body

When you use an output buffer (either a user buffer or PHP's), you may want to send the HTTP headers and the content in whatever order you like. Every protocol must send headers before the body (that is why they are called "headers"), but when you use the output buffer layer PHP takes care of this without you having to worry about it. In fact, every PHP function that deals with headers (header(), setcookie(), session_start()) uses the internal sapi_header_op() function, which only writes to the header buffer. Then, when you output content, for example with printf(), the content is written to the output buffer (assuming there is only one). When the contents of that buffer need to be sent, PHP sends the headers first and then the body. PHP does everything for you. If you are uncomfortable with this and want to do it yourself, you have no choice but to disable the output buffer.
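To make this concrete, here is a minimal sketch in which body content is produced before a header is set. The function name render_page() and the header X-Example are invented for the illustration. The behavior described in the comments is what you would normally expect under a web SAPI while the body is still held in a buffer.

```php
<?php
// Hypothetical helper: writes body output *before* setting a header,
// relying on a user output buffer to hold the body back.
function render_page()
{
    ob_start();                        // body bytes are held in this buffer
    echo 'Hello';                      // buffered, not yet handed to the SAPI
    $sent_early = headers_sent();      // normally false while the body is only buffered
    header('X-Example: late-header');  // goes to the header buffer, not the body
    $body = ob_get_clean();            // take the body out and remove the buffer
    return [$sent_early, $body];
}

list($sent_early, $body) = render_page();
echo $body;                            // headers go out first, then the body
```

Without the ob_start() call, the echo could reach the SAPI immediately and the later header() call would typically fail with a "headers already sent" warning.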

User output buffers

Before turning to user output buffers, let's first look at an example of how the default buffer works and what you can do with it. Again, if you want to use the default PHP output buffer layer, you cannot use the CLI, because it has this layer disabled. The following example uses the default PHP output buffer, running under PHP's built-in web server SAPI:

<?php
/* launched via php -doutput_buffering=32 -dimplicit_flush=1 -S127.0.0.1:8080 -t/var/www */
echo str_repeat('a', 31);
sleep(3);
echo 'b';
sleep(3);
echo 'c';


In this example, the default output buffer size is set to 32 bytes when PHP starts. The program first writes 31 bytes into it and then sleeps. At this point the screen is empty and nothing has been output, as expected. After 3 seconds the sleep ends and one more byte is written. This byte fills the buffer, which immediately flushes itself and passes its data to the SAPI layer's buffer. Because implicit_flush is set to 1, the SAPI layer's buffer is also flushed immediately to the next layer. The 31 'a' characters followed by 'b' appear on the screen, and then the script sleeps again. After another 3 seconds, one more byte is output. The buffer now holds 1 byte and has 31 bytes free, but the script has finished, so the buffer containing this single byte is flushed immediately and the string 'c' appears on the screen.

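This example shows how the default PHP output buffer works. We did not call any buffer-related functions, but that does not mean the buffer is absent; keep in mind that it exists in the environment the program runs in (outside CLI mode only).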

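Now let's discuss user output buffers. They are created by calling ob_start(), and we can create as many of them as we like (until memory runs out). These buffers form a stack: each new buffer is stacked on top of the previous one, and whenever it fills up or overflows it flushes and passes its data to the next buffer.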
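The stacking behavior can be observed with a small sketch like the one below. The variable names are illustrative, and $base accounts for any buffer that may already be active in your environment.

```php
<?php
$base = ob_get_level();           // buffers already active before we start

ob_start();                       // buffer A
echo 'outer ';
ob_start();                       // buffer B, stacked on top of A
echo 'inner';
$depth = ob_get_level() - $base;  // two buffers were added

$inner = ob_get_clean();          // pop B and take its contents
echo strtoupper($inner);          // this echo lands in A, now the top buffer
$outer = ob_get_clean();          // pop A

echo $outer, PHP_EOL;             // prints "outer INNER"
```

Output always goes to the buffer on top of the stack, which is why the upper-cased text ends up appended to what buffer A already held.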

<?php
ob_start(function($ctc) {
    static $a = 0;
    return $a++ . '- ' . $ctc . "\n";
}, 10);

ob_start(function($ctc) {
    return ucfirst($ctc);
}, 3);

echo "fo";
sleep(2);
echo 'o';
sleep(2);
echo "barbazz";
sleep(2);
echo "hello";

/* 0- FooBarbazz\n 1- Hello\n */


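Here I will explain this example on behalf of the original author. Let's call the user buffer created by the first ob_start() buffer 1 and the one created by the second ob_start() buffer 2. Following the last-in, first-out rule of a stack, all output goes into buffer 2 first.

Buffer 2 has a size of 3 bytes, so the string 'fo' (2 bytes) output by the first echo statement is stored in buffer 2, one character short of full. When the second echo statement outputs 'o', buffer 2 is full, so it flushes. Before flushing, the callback passed to ob_start() is called; it upper-cases the first letter of the string in the buffer, so the output is 'Foo'. This is then stored in buffer 1, whose size is 10.

The third echo statement outputs 'barbazz', which again goes into buffer 2 first. The string is 7 bytes long, so buffer 2 overflows and flushes immediately; the callback returns 'Barbazz', which is passed to buffer 1. Buffer 1 now holds 'FooBarbazz', 10 characters, so it flushes too. Its ob_start() callback is called first; it prepends a line number and appends a newline, so the first line of output is '0- FooBarbazz'.

The last echo statement outputs the string 'hello'. It is longer than 3 characters, so it triggers a flush of buffer 2. Because the script has finished at this point, buffer 1 is flushed immediately as well, and the second line of output is '1- Hello'.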
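The walk-through can be checked without the sleep() calls by running the same two buffers inside an extra capturing buffer. This is a sketch rather than the author's code: the wrapper function run_stacked_example() is made up, and the static counter is replaced by a variable captured by reference.

```php
<?php
// Hypothetical wrapper: runs the two stacked buffers inside an extra
// capturing buffer so the final result can be inspected as a string.
function run_stacked_example()
{
    ob_start();                                  // capture what would reach the SAPI

    $line = 0;
    ob_start(function ($chunk) use (&$line) {    // buffer 1, chunk size 10
        return $line++ . '- ' . $chunk . "\n";
    }, 10);
    ob_start(function ($chunk) {                 // buffer 2, chunk size 3
        return ucfirst($chunk);
    }, 3);

    echo 'fo';       // 2 bytes: stays in buffer 2
    echo 'o';        // 3 bytes: buffer 2 flushes "Foo" into buffer 1
    echo 'barbazz';  // buffer 2 flushes "Barbazz"; buffer 1 reaches 10 bytes and flushes
    echo 'hello';    // buffer 2 flushes "Hello" into buffer 1

    ob_end_flush();  // close buffer 2 (nothing left in it)
    ob_end_flush();  // close buffer 1, which emits the second line
    return ob_get_clean();
}

var_dump(run_stacked_example());
```

The captured string contains the two lines from the explanation, '0- FooBarbazz' and '1- Hello', each terminated by a newline.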

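Internal implementation of the output buffer

Since version 5.4, the entire buffer layer has been rewritten (by Michael Wallner). The previous code was messy, could not do many things, and had many bugs. This article will give you more information on that. That is why PHP 5.4 rewrote this part: the design is now better, the code is cleaner, some new features were added, and there are few incompatibilities with 5.3. Well done!

One of the best new features is that an extension can declare that its own output buffer callback conflicts with callbacks provided by other extensions. This was not possible before: to develop an extension that used output buffering, you first had to work out the possible effects of every other extension that provided buffer callbacks.

Below is a simple example showing how to register a callback that converts the characters in the buffer to upper case. The code may not be great, but it is enough for our purpose: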

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "php.h"
#include "php_ini.h"
#include "main/php_output.h"
#include "php_myext.h"

/* Output handler: copies the incoming chunk and upper-cases the copy */
static int myext_output_handler(void **nothing, php_output_context *output_context)
{
    char *dup = NULL;

    dup = estrndup(output_context->in.data, output_context->in.used);
    php_strtoupper(dup, output_context->in.used);

    output_context->out.data = dup;
    output_context->out.used = output_context->in.used;
    output_context->out.free = 1;

    return SUCCESS;
}

PHP_RINIT_FUNCTION(myext)
{
    php_output_handler *handler;

    handler = php_output_handler_create_internal("myext handler", sizeof("myext handler") - 1, myext_output_handler, /* PHP_OUTPUT_HANDLER_DEFAULT_SIZE */ 128, PHP_OUTPUT_HANDLER_STDFLAGS);
    php_output_handler_start(handler);

    return SUCCESS;
}

zend_module_entry myext_module_entry = {
    STANDARD_MODULE_HEADER,
    "myext",
    NULL, /* Function entries */
    NULL, /* Module init */
    NULL, /* Module shutdown */
    PHP_RINIT(myext), /* Request init */
    NULL, /* Request shutdown */
    NULL, /* Module information */
    "0.1", /* Replace with version number for your extension */
    STANDARD_MODULE_PROPERTIES
};

#ifdef COMPILE_DL_MYEXT
ZEND_GET_MODULE(myext)
#endif


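Pitfalls

Most of the pitfalls have already been covered. Some are a matter of logic, others are hidden. On the logic side, the most obvious one is that you should not call any buffer-related function inside an output buffer callback, and you should not output anything from the callback either.

Less obvious is that some of PHP's internal functions also use output buffering themselves. They stack their own buffer on top of the others, fill it, and then either flush it or return its contents. print_r(), highlight_file() and SoapServer::handle() are functions of this kind. You should not use them inside an output buffer callback. Doing so leads to undefined behavior, or at least not to the result you expect.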
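One way to stay clear of these pitfalls is to have the callback only build and return a string. The sketch below assumes a made-up callback named append_debug_trailer(); it uses json_encode(), which returns its result directly rather than capturing output.

```php
<?php
// Hypothetical callback: appends a small debug trailer to the buffered output.
// It only builds and returns a string; it does not echo or call ob_*() functions.
function append_debug_trailer($buffer)
{
    $info = ['bytes' => strlen($buffer)];
    // json_encode() hands back a string, so no nested output buffer is involved.
    return $buffer . "\n<!-- " . json_encode($info) . " -->";
}

ob_start('append_debug_trailer');
echo 'page body';
ob_end_flush();   // the callback runs here, on the final flush
```

The trailer format is arbitrary; the point is that everything the callback needs is computed from its argument and returned.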

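Summary

The output layer is like a net: it catches all the output that "leaks" out of PHP and stores it in a buffer of fixed size. When the buffer fills up, its contents are flushed (written) to the next layer if there is one, or otherwise to the logical layer below: the SAPI buffer. Developers control the number of buffers, their size, and the operations that can be performed on each buffer layer (clean, flush, and delete). This is very flexible: it lets library and framework designers keep full control over their own output and feed it into a global buffer. As for output itself, what we need to know is that PHP sends the contents of any output stream and any HTTP headers in the correct order.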
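The three operations named above (clean, flush, delete) map onto ob_clean(), ob_flush() and ob_end_clean(). Here is a small sketch; the outer buffer is there only so the effect can be observed in a variable.

```php
<?php
ob_start();                 // outer buffer, used here only to observe the result
ob_start();                 // the buffer we operate on

echo 'draft';
ob_clean();                 // clean: throw away "draft", keep the buffer open
echo 'first ';
ob_flush();                 // flush: pass "first " down to the outer buffer
echo 'second';
ob_end_clean();             // delete: drop "second" and remove the buffer

$result = ob_get_clean();   // only the flushed part survived
var_dump($result);          // string(6) "first "
```

Only content that was flushed before the buffer was deleted makes it to the layer below.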

The output layer also has a default buffer, controlled by three INI configuration options. It exists to prevent large numbers of small writes, which would hit the SAPI layer too often, consume a lot of network resources, and hurt performance. PHP extensions can also define callback functions and run them on each buffer. There are already many applications of this, such as data compression, HTTP header management, and many others.
