Concurrent IO has always been a hard technical problem in server-side programming, from the earliest approach of forking a process per connection with synchronous blocking IO, to worker process pools and thread pools, to today's asynchronous IO and coroutines. PHP programmers, because they can rely on the powerful LAMP stack, often know little about these lower-level topics. This article introduces PHP's various approaches to concurrent IO programming in detail, and finally introduces Swoole, in order to explain concurrent IO in simple, easy-to-understand terms.
The earliest server-side programs solved the concurrent IO problem with multiple processes and multiple threads. The process model appeared first; the concept of a process has existed since the birth of Unix. The earliest server programs generally created a process each time they accepted a client connection. The child process then entered a loop and interacted with the client connection in a synchronous blocking manner, receiving, processing, and sending data.
The multi-threaded model appeared later. Threads are more lightweight than processes, and the threads of a process share the same memory, so interaction between threads is very easy to implement. For example, in a program like a chat room, client connections can interact with each other, and any user in the chat room can send messages to any other user. This is very simple to implement in the multi-threaded model: a thread can write data directly to any client connection. The multi-process model requires more complex techniques such as pipes, message queues, and shared memory, collectively referred to as inter-process communication (IPC). Code example:
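A minimal sketch of the fork-per-connection model, assuming a Unix system with PHP CLI and the pcntl extension. The names `handle_request` and `run_fork_server` are hypothetical, and the upper-casing logic is only a stand-in for real request processing.

```php
<?php
// Sketch of the fork-per-connection model. Requires pcntl (CLI, Unix only).
// handle_request() and run_fork_server() are hypothetical names.

function handle_request(string $data): string
{
    // Placeholder business logic: reply with the request in upper case.
    return strtoupper($data);
}

function run_fork_server(string $address): void
{
    // stream_socket_server performs socket + bind + listen in one call.
    $server = stream_socket_server($address, $errno, $errstr);
    if ($server === false) {
        throw new RuntimeException("listen failed: $errstr ($errno)");
    }
    while (true) {
        // Block in accept until a client connects (a negative timeout is
        // commonly used to wait without a time limit).
        $conn = @stream_socket_accept($server, -1);
        if ($conn === false) {
            continue;
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            fclose($conn); // fork failed, drop this connection
        } elseif ($pid === 0) {
            // Child: serve this one connection with blocking reads, then exit.
            fclose($server);
            while (($data = fread($conn, 8192)) !== false && $data !== '') {
                fwrite($conn, handle_request($data));
            }
            fclose($conn);
            exit(0);
        } else {
            // Parent: close its copy of the connection and reap finished children.
            fclose($conn);
            while (pcntl_waitpid(-1, $status, WNOHANG) > 0) {
            }
        }
    }
}
```

Calling `run_fork_server('tcp://127.0.0.1:8000')` from the command line would start the loop; the address is a placeholder.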
The flow of the multi-process/multi-thread model is:
Create a socket, bind the server port (bind), and listen on the port (listen). In PHP, the single function stream_socket_server completes all 3 of these steps. Of course, you can also use the lower-level sockets extension to perform them separately.
Enter the while loop and block in the accept operation, waiting for a client connection to come in. The program sleeps until a new client initiates a connect to the server, at which point the operating system wakes the process. The accept function returns the socket of the client connection.
The main process creates a child process with fork (PHP: pcntl_fork). Under the multi-threaded model it creates a child thread with pthread_create (PHP: new Thread). Unless otherwise stated, "process" below refers to both processes and threads.
After the child process is created successfully, it enters a while loop and blocks in the recv (PHP: fread) call, waiting for the client to send data to the server. After receiving the data, the server program processes it and then uses send (PHP: fwrite) to send a response to the client. A long-connection service keeps interacting with the client, while a short-connection service generally closes the connection once the response has been delivered.
When the client connection is closed, the child process exits and all of its resources are destroyed. The main process reaps the child process.
The biggest problem with this model is that creating and destroying processes/threads is very expensive, so it cannot be used for very busy server programs. An improved version solves this problem: the classic Leader-Follower model.
Code example:
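A minimal sketch of this pre-fork model, again assuming a Unix system with PHP CLI and the pcntl extension. The names `serve_connection` and `run_prefork_server` are hypothetical, and the echo behaviour stands in for real request handling.

```php
<?php
// Sketch of a pre-fork worker pool. Requires pcntl (CLI, Unix only).
// serve_connection() and run_prefork_server() are hypothetical names.

function serve_connection($conn): void
{
    // Blocking request/response loop for one client; echoes each chunk back.
    while (($data = fread($conn, 8192)) !== false && $data !== '') {
        fwrite($conn, $data);
    }
    fclose($conn);
}

function run_prefork_server(string $address, int $workers): void
{
    $server = stream_socket_server($address, $errno, $errstr);
    if ($server === false) {
        throw new RuntimeException("listen failed: $errstr ($errno)");
    }
    for ($i = 0; $i < $workers; $i++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Worker: every worker blocks in accept on the same listening socket.
            while (true) {
                $conn = @stream_socket_accept($server, -1);
                if ($conn !== false) {
                    serve_connection($conn); // busy until this client disconnects
                }
            }
        }
    }
    // Parent: wait for workers. A fuller version would restart any that exit.
    while (pcntl_wait($status) > 0) {
    }
}
```

The workers are created once at startup, so no fork happens on the request path. `run_prefork_server('tcp://127.0.0.1:8000', 4)` would start four workers; both values are placeholders.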
Its characteristic is that N processes are created when the program starts. Each child process enters accept and waits for new connections. When a client connects to the server, one of the child processes is woken up, starts handling that client's requests, and no longer accepts new TCP connections. When the connection is closed, the child process is freed and re-enters accept to take part in handling new connections.
The advantage of this model is that processes are fully reused, with no per-connection creation cost, and performance is very good. Many common server programs are based on this model, such as Apache and PHP-FPM.
The multi-process model also has some disadvantages.
This model relies heavily on the number of processes to handle concurrency. Each client connection occupies one process, so the number of connections that can be handled concurrently equals the number of worker processes, and the operating system can only create a limited number of processes.
Starting a large number of processes adds scheduling overhead. With a few hundred processes, context switching may take less than 1% of CPU and can be ignored. With thousands or even tens of thousands of processes, the overhead soars, and scheduling may consume tens of percent of CPU or even 100%.
There are also scenarios the multi-process model cannot handle, such as instant messaging (IM) programs, where one server must maintain tens of thousands, hundreds of thousands, or even millions of connections at the same time (the classic C10K problem). The multi-process model is simply not up to this.
There is another scenario where the multi-process model is weak. A web server typically starts 100 processes. If one request takes 100ms, 100 processes can provide 1000qps, which is quite good. But if the request needs to call an external HTTP interface, such as QQ or Weibo login, it takes much longer, say 10s per request. Then one process can handle only 0.1 requests per second, and 100 processes reach only 10qps, which is far too poor.
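The arithmetic above can be checked with a short calculation. This is only a sketch: `max_qps` is a hypothetical helper, and it assumes each process handles exactly one request at a time with no other overhead.

```php
<?php
// Upper bound on throughput when each worker handles one request at a time.
// max_qps() is a hypothetical helper, not part of any library.

function max_qps(int $workers, float $secondsPerRequest): float
{
    if ($workers < 0 || $secondsPerRequest <= 0) {
        throw new InvalidArgumentException('workers must be >= 0 and latency > 0');
    }
    // Each worker completes 1 / latency requests per second.
    return $workers / $secondsPerRequest;
}

printf("100 workers at 100ms: %.0f qps\n", max_qps(100, 0.1));
printf("100 workers at 10s:   %.0f qps\n", max_qps(100, 10.0));
```

Throughput falls in direct proportion to latency, which is why blocking on a slow external call is so costly in this model.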
Is there a technology that can handle all concurrent IO in one process? The answer is yes, this is IO multiplexing technology.
In fact, IO multiplexing has as long a history as multi-processing. Linux has long provided the select system call, which can maintain 1024 connections within one process. Later the poll system call was added. poll made some improvements: it removed the 1024 limit and can maintain any number of connections. But select/poll has another problem: it has to loop over the connections to detect which of them have events. If the server has 1 million connections and only one of them sends data at a given moment, select/poll has to loop 1 million times, of which only 1 iteration is a hit and the remaining 999,999 are wasted, burning CPU resources.
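In PHP, select is available through stream_select. The following sketch shows one process checking several connections at once. `ready_streams` is a hypothetical helper, and the socket pairs are in-process stand-ins for real client connections (this assumes a Unix system).

```php
<?php
// One process watching several connections with select (stream_select in PHP).
// ready_streams() is a hypothetical helper name.

function ready_streams(array $streams, float $timeoutSeconds): array
{
    if ($streams === []) {
        return [];
    }
    $read = $streams;   // stream_select overwrites this with the readable subset
    $write = null;
    $except = null;
    $sec = (int) floor($timeoutSeconds);
    $usec = (int) (($timeoutSeconds - $sec) * 1000000);
    $n = stream_select($read, $write, $except, $sec, $usec);
    if ($n === false) {
        throw new RuntimeException('stream_select failed');
    }
    return $n > 0 ? array_values($read) : [];
}

// Demo with two in-process socket pairs standing in for client connections.
$a = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$b = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
fwrite($a[0], "hello");              // only connection "a" has data pending
$ready = ready_streams([$a[1], $b[1]], 0.5);
foreach ($ready as $stream) {
    echo fread($stream, 8192), "\n";
}
```

Every call hands the full list of streams to the kernel, which is the per-call scanning cost described above.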
It was not until Linux 2.6 that the kernel provided the new epoll system call, which can maintain an unlimited number of connections without polling. This truly solved the C10K problem. Today's high-concurrency asynchronous IO server programs are all built on epoll, such as Nginx, Node.js, Erlang, and Golang. A single-process, single-threaded program like Node.js can maintain more than 1 million TCP connections, all thanks to epoll.
Asynchronous non-blocking programs built on IO multiplexing use the classic Reactor model. As the name suggests, a Reactor reacts to events. It does not itself send or receive any data; it only monitors event changes on socket handles.
A Reactor has 4 core operations:
add: adds a socket to the Reactor for monitoring. It can be a listen socket or a client socket, or also a pipe, eventfd, signal, and so on.
set: modifies the event monitoring. You can set the event types to monitor, such as readable and writable. Readable is easy to understand: for a listen socket it means a new client connection has arrived and accept is needed; for a client connection it means data has arrived and recv is needed. Writable events are a bit harder to understand. A socket has a buffer. If you want to send 2M of data to a client connection, it cannot all be sent at once. The operating system's default TCP buffer is only 256K, so only 256K can be sent at a time. Once the buffer is full, send returns an EAGAIN error. At that point you need to monitor for the writable event. In purely asynchronous programming, you must monitor writable events to ensure that send operations are completely non-blocking.
del: removes the socket from the Reactor. Its events are no longer monitored.
callback: the processing logic that runs after an event occurs, usually specified at add/set time. C implements it with function pointers, JS can use anonymous functions, and PHP can use anonymous functions, object-method arrays, or string function names.
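The writable-event case can be sketched as follows: write as much as the socket accepts, keep the remainder in an application buffer, and try again when the socket becomes writable. `flush_pending` is a hypothetical helper, and the socket pair is an in-process stand-in for a client connection (this assumes a Unix system).

```php
<?php
// Non-blocking send with an application-level buffer.
// flush_pending() is a hypothetical helper; a real server would call it
// again from the writable-event callback until the buffer is empty.

function flush_pending($stream, string &$buffer): bool
{
    if ($buffer === '') {
        return true;
    }
    // On a non-blocking stream, fwrite may accept only part of the data,
    // or nothing at all when the socket send buffer is full.
    $written = @fwrite($stream, $buffer);
    if ($written === false) {
        throw new RuntimeException('write failed, connection is probably broken');
    }
    $buffer = (string) substr($buffer, $written);
    return $buffer === ''; // false means: keep watching for "writable"
}

[$client, $server] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_blocking($server, false);

$pending = str_repeat('x', 2 * 1024 * 1024); // a 2M response, as in the text
$done = flush_pending($server, $pending);
printf("sent everything: %s, bytes still pending: %d\n", $done ? 'yes' : 'no', strlen($pending));
```

How much is accepted on the first call depends on the system's socket buffer size, so the sketch makes no assumption about it.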
A Reactor is only an event generator. The actual operations on socket handles, such as connect/accept, send/recv, and close, are done in the callbacks. For the specific coding, refer to the following pseudo code:
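A minimal sketch of such a Reactor in PHP, built on stream_select rather than epoll so that it needs no extension. The class name `MiniReactor`, its constants, and the `tick` method are hypothetical, chosen only to mirror the add/set/del/callback operations described above.

```php
<?php
// Minimal Reactor on top of stream_select, for illustration only.
// MiniReactor and its constants are hypothetical names, not a library API.

final class MiniReactor
{
    public const READ = 1;
    public const WRITE = 2;

    private array $handlers = []; // keyed by resource id

    // add: start monitoring a stream for the given events.
    public function add($stream, int $events, callable $callback): void
    {
        $this->handlers[(int) $stream] = ['stream' => $stream, 'events' => $events, 'callback' => $callback];
    }

    // set: change which events are monitored for a stream already added.
    public function set($stream, int $events): void
    {
        if (isset($this->handlers[(int) $stream])) {
            $this->handlers[(int) $stream]['events'] = $events;
        }
    }

    // del: stop monitoring a stream.
    public function del($stream): void
    {
        unset($this->handlers[(int) $stream]);
    }

    // Run one iteration of the event loop; returns the number of callbacks fired.
    public function tick(float $timeoutSeconds): int
    {
        $read = $write = [];
        foreach ($this->handlers as $h) {
            if ($h['events'] & self::READ) { $read[] = $h['stream']; }
            if ($h['events'] & self::WRITE) { $write[] = $h['stream']; }
        }
        if ($read === [] && $write === []) {
            return 0;
        }
        $except = null;
        $sec = (int) floor($timeoutSeconds);
        $usec = (int) (($timeoutSeconds - $sec) * 1000000);
        if (!stream_select($read, $write, $except, $sec, $usec)) {
            return 0; // timeout or error
        }
        $fired = 0;
        foreach ([self::READ => $read, self::WRITE => $write] as $event => $streams) {
            foreach ($streams as $stream) {
                $id = (int) $stream;
                if (isset($this->handlers[$id])) { // may have been removed by an earlier callback
                    ($this->handlers[$id]['callback'])($stream, $event, $this);
                    $fired++;
                }
            }
        }
        return $fired;
    }
}

// Usage: the Reactor only reports the event; the recv happens in the callback.
[$client, $conn] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$received = [];
$reactor = new MiniReactor();
$reactor->add($conn, MiniReactor::READ, function ($stream, int $event, MiniReactor $r) use (&$received) {
    $data = fread($stream, 8192);
    if ($data === '' || $data === false) {
        $r->del($stream); // peer closed: stop monitoring
        fclose($stream);
        return;
    }
    $received[] = $data;
});
fwrite($client, "hello reactor");
$reactor->tick(0.5);
```

A real server would call `tick` in a loop and add the listen socket with a callback that accepts new connections and adds them in turn.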
The Reactor model can also be combined with multiple processes and multiple threads, which achieves asynchronous non-blocking IO while also taking advantage of multiple cores. Today's popular asynchronous server programs all work this way, for example:
: multi-process Reactor
: multi-process Reactor + coroutines
: single-threaded Reactor + multi-threaded coroutines
Swoole: multi-threaded Reactor + multi-process Worker
From the perspective of the underlying technology, coroutines are in fact still the asynchronous IO Reactor model. The application layer implements task scheduling itself and uses the Reactor to switch between the user-mode threads being executed, but the Reactor is completely invisible to user code.
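The idea of application-level scheduling can be illustrated with PHP generators. This is a sketch of the scheduling idea only: `run_tasks` and `worker` are hypothetical names, and the round-robin order stands in for a real framework that would resume a task when its IO event fires in the Reactor.

```php
<?php
// Cooperative scheduling with generators: a sketch of the idea only.
// run_tasks() is a hypothetical round-robin scheduler; a real coroutine
// framework would resume a task when its IO event fires in the Reactor.

function run_tasks(array $tasks): array
{
    $log = [];
    $queue = new SplQueue();
    foreach ($tasks as $name => $task) {
        $queue->enqueue([$name, $task]);
    }
    while (!$queue->isEmpty()) {
        [$name, $task] = $queue->dequeue();
        if (!$task->valid()) {
            continue; // task has finished
        }
        $log[] = $name . ':' . $task->current(); // value passed to yield
        $task->next();                           // resume until the next yield
        $queue->enqueue([$name, $task]);
    }
    return $log;
}

function worker(int $steps): Generator
{
    for ($i = 1; $i <= $steps; $i++) {
        // Each yield is a point where the task gives up control,
        // standing in for "waiting on IO".
        yield "step$i";
    }
}

$log = run_tasks(['a' => worker(2), 'b' => worker(3)]);
echo implode(' ', $log), "\n";
```

Each task reads as straight-line code, while the scheduler interleaves the tasks at every yield.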
Stream: socket encapsulation provided by the PHP core
: encapsulation of the low-level Socket API
: encapsulation of the libevent library
: a higher-level encapsulation based on Libevent, providing an object-oriented interface, timers, and signal handling
: support for multiple processes, signals, and process management
: support for multi-threading, thread management, and locks
PHP also has extensions for shared memory, semaphores, and message queues
PECL: PHP's extension repository, covering low-level system access, data analysis, algorithms, drivers, scientific computing, graphics, and more. If something cannot be found in the PHP standard library, you can look for the functionality you need in PECL.
Advantages of PHP:
The first is simplicity. PHP is simpler than any other language, and you really can get started with it in a week. There is a C++ book called "Learn C++ in Depth in 21 Days". In fact, it is impossible to learn C++ in 21 days; it can even be said that C++ cannot be mastered in depth without 3-5 years. But PHP can definitely be picked up in 7 days. So the number of PHP programmers is very large, and recruiting is easier than for other languages.
PHP is very powerful, because the official standard library and extension library provide 99% of what server programming needs, and the PECL extension library covers nearly any functionality you might want.
In addition, PHP has more than 20 years of history, its ecosystem is very large, and you can find a lot of code on GitHub.
Disadvantages of PHP:
Performance is relatively poor. It is a dynamic scripting language after all and is not suited to compute-intensive work. If the same program is written in C/C++, the PHP version is about a hundred times slower.
Function naming conventions are poor, as everyone knows. PHP emphasizes practicality and has few rules. Some function names are quite confusing, so you have to consult the PHP manual every time.
The granularity of the provided data structures and function interfaces is relatively coarse. PHP has only one data structure, the Array, which is implemented on top of a HashTable. PHP's Array combines the functions of data structures such as Map, Set, Vector, Queue, Stack, and Heap. In addition, PHP has the SPL, which provides class encapsulations of other data structures.
So, PHP:
PHP is better suited to application-level programs. It is a tool for business development and rapid implementation
PHP is not suited to developing low-level software
Use statically compiled languages such as C/C++, Java, and Golang as a complement to PHP, combining dynamic and static
Use IDE tools for auto-completion and syntax hints
Based on the extensions above, you can fully implement asynchronous network servers and clients in pure PHP. But to implement something like multiple IO threads, there is still a lot of tedious programming work, including how to manage connections, how to guarantee atomic sending and receiving of data, and how to handle network protocols. In addition, PHP code performs relatively poorly in the protocol-handling part, so I started a new open source project, Swoole, which does this work with a combination of C and PHP. The flexible, frequently changing business modules use PHP for high development efficiency, while the low-level foundation and the protocol handling are implemented in C for high performance. It is loaded into PHP as an extension and provides a complete network communication framework, on top of which the business logic is written in PHP. Its model is multi-threaded Reactor + multi-process Worker, and it supports both fully asynchronous and half-synchronous/half-asynchronous operation. Some features of Swoole:
An Accept thread, which solves Accept performance bottlenecks and the thundering herd problem
Multiple IO threads, which make better use of multiple cores
Provides 2 modes: fully asynchronous, and half-synchronous/half-asynchronous
The parts that handle high-concurrency IO use asynchronous mode
The parts with complex business logic use synchronous mode
The bottom layer supports traversing all connections, sending data between connections, automatically merging and splitting data packets, and atomic data sending.
For usage, see the homepage at https://github.com/swoole/swoole-src.
Asynchronous TCP server:
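A minimal sketch of such a server, assuming the Swoole extension is installed. It uses the `Swoole\Server` class name from newer Swoole releases (the article's `swoole_server` is the older spelling). The names `build_reply` and `start_echo_server` are hypothetical, and the host and port are placeholders.

```php
<?php
// Sketch of an asynchronous TCP echo server. Requires the Swoole extension.
// build_reply() and start_echo_server() are hypothetical names; the host
// and port passed in are placeholders chosen by the caller.

function build_reply(string $data): string
{
    return 'Swoole: ' . $data;
}

function start_echo_server(string $host, int $port): void
{
    if (!class_exists('Swoole\\Server')) {
        throw new RuntimeException('the Swoole extension is not loaded');
    }
    $server = new Swoole\Server($host, $port);

    // Called when a new connection comes in.
    $server->on('connect', function ($server, $fd) {
        echo "client $fd connected\n";
    });
    // Called when data is received from a client.
    $server->on('receive', function ($server, $fd, $reactorId, $data) {
        $server->send($fd, build_reply($data));
    });
    // Called when a client connection is closed.
    $server->on('close', function ($server, $fd) {
        echo "client $fd closed\n";
    });

    $server->start(); // blocks and runs the event loop
}
```

Running `start_echo_server('127.0.0.1', 9501)` from the command line would start the server.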
Here we create a swoole_server object, passing the HOST and PORT to listen on as parameters, and then set 3 callback functions: onConnect, when a new connection comes in; onReceive, when data is received from a client; and onClose, when a client closes its connection. Finally, call start to start the server program. Under the hood, Swoole starts a corresponding number of Reactor threads and Worker processes based on how many CPU cores the current machine has.
Asynchronous client:
The client is used much like the server, except that there are 4 callback events: onConnect, the connection to the server succeeded and data can now be sent to the server; onError, the connection to the server failed; onReceive, the server sent data to the client connection; onClose, the connection was closed. After setting the event callbacks, initiate a connect to the server. The parameters are the server's IP, PORT, and timeout.
Synchronous client:
The synchronous client does not need any event callbacks. It has no Reactor monitoring and is serial and blocking: it waits for each IO operation to complete before proceeding to the next step.
Asynchronous tasks:
The asynchronous task feature is used to run a time-consuming or blocking function inside a purely asynchronous server program. The underlying implementation uses a process pool. When the task completes, onFinish is triggered, and the program can obtain the result of the task. For example, an IM needs to broadcast. If the broadcast is done directly in asynchronous code, it may delay the processing of other events. File reading and writing can also be implemented with asynchronous tasks, because file handles cannot be monitored by the Reactor the way sockets can. Since a file handle is always readable, reading the file directly may block the server program, so asynchronous tasks are a very good choice.
Asynchronous millisecond timer:
These 2 interfaces implement functionality similar to JS's setInterval and setTimeout: you can run a function every n milliseconds, or run a function once after n milliseconds.
Asynchronous MySQL client:
Swoole also provides an asynchronous MySQL client with a built-in connection pool. You can set the maximum number of MySQL connections to use. Concurrent SQL requests reuse these connections instead of creating new ones repeatedly, which protects MySQL from having its connection resources exhausted.
Asynchronous Redis client:
Asynchronous web program:
The program reads a piece of data from Redis and then renders an HTML page. The stress test performance using ab is as follows:
The performance test results for the same logic under php-fpm are as follows:
WebSocket program:
Swoole has a built-in websocket server, on which active push to web pages can be implemented, for example WebIM. There is an open source project that can be used as a reference: https://github.com/matyhtf/php-webim
Asynchronous programming generally uses callbacks. With very complex logic, callback functions may end up nested layer upon layer. Coroutines can solve this problem: the code is written sequentially, but at runtime it is asynchronous and non-blocking. Tencent engineers, building on the Swoole extension and the Yield/Generator syntax of PHP 5.5, implemented coroutines similar to Golang's. The project is named TSF (Tencent Server Framework), and its open source address is https://github.com/tencent-php/tsf. It is currently used at large scale in Tencent's Enterprise QQ and QQ public account projects and in the Wheels traffic-violation lookup project.
TSF is also very simple to use. The 3 IO calls below look completely serial, but they are actually executed asynchronously and without blocking. TSF's underlying scheduler takes over execution of the program and resumes it after the corresponding IO completes.
PHP + Swoole coroutine