Several concepts you must know in Workerman-Workerman-php.cn

尚

Released: 2019-11-26 16:35:55

This article, from the Workerman usage tutorial column, introduces several concepts you must know about Workerman. I hope it will be helpful to friends in need!

Workerman is an open-source, high-performance PHP socket server framework written in pure PHP. It is not an MVC framework but a lower-level, more general-purpose socket framework. You can use it to build TCP proxies, tunneling proxies, game servers, mail servers, and FTP servers.

In effect, Workerman resembles a PHP version of nginx: its core is likewise multi-process, epoll-based, non-blocking IO. Each Workerman process can hold tens of thousands of concurrent connections. Because it stays resident in memory, it does not depend on containers such as Apache, nginx, or php-fpm, which is where much of its performance comes from.

It supports TCP, UDP, Unix domain sockets, persistent connections, WebSocket, HTTP, WSS, HTTPS, and other communication protocols, as well as custom protocols. It also provides many high-performance components, such as timers, asynchronous socket clients, asynchronous MySQL, asynchronous Redis, asynchronous HTTP, and asynchronous message queues.

First, you need to understand three core concepts: 1. multi-process, 2. non-blocking IO, 3. epoll.

1. Multi-process:

First, what is a process? A process consists of code, data, and the resources (such as memory) allocated to it. Intuitively, each process is identified by a PID in the operating system. The operating system protects each process's address space from interference by other processes; that is, one process cannot directly access the memory of another.

Processes sometimes need to communicate, and for that they use the inter-process communication (IPC) mechanisms the operating system provides. Normally, when you run an executable file, the operating system creates one process to run it.

If the program is designed to be multi-process, however, the original process creates additional processes. These processes execute the same code, but their results may or may not be the same.

Why use multiple processes? The most intuitive reason is that on a multi-core system the processes of one program can run on different cores. Even on a single core, while one process waits for an I/O operation another can run on the CPU, which improves CPU utilization and program efficiency.

On Linux, a parent process can call fork() to create a child process. When a process calls fork(), the system first allocates resources for the new process, such as space for its data and code, and then copies the values and state of the original process into it. Only a few values, such as the PID, differ so that the two processes can be told apart.

fork() returns twice: once in the parent process (returning the child's PID, or -1 on failure) and once in the child process (returning 0). From that point the two processes go their separate ways and run independently in the system.
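The two return values of fork() can be observed with a minimal sketch. It is written in Python rather than PHP purely for illustration (PHP exposes the same system call as pcntl_fork()), and the function name fork_demo and the use of a pipe to report the child's view are assumptions of this example, not part of Workerman.

```python
import os


def fork_demo():
    """Fork once; return (PID seen by the parent, value fork() returned in the child)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # returns twice: 0 in the child, the child's PID in the parent
    if pid == 0:
        # Child: report what fork() returned here, then exit immediately
        # so that it never runs the parent's code path.
        os.close(read_fd)
        os.write(write_fd, str(pid).encode())
        os._exit(0)
    # Parent: read the child's report and reap the child.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return pid, child_value


if __name__ == "__main__":
    parent_view, child_view = fork_demo()
    print(f"parent saw child PID {parent_view}; child saw {child_view}")
```

The child leaves through os._exit() so that only the parent continues past the fork; a real multi-process server would instead run a worker loop in each child.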

2. Non-blocking IO:

First, what is IO? It means input and output operations. Network IO is essentially reading from and writing to sockets. Linux abstracts a socket as a stream, so IO can be understood as operations on a stream. For an IO access (take read as an example), the data is first copied into a buffer in the operating system kernel and then copied from the kernel buffer into the address space of the application.

So a read operation goes through two stages:

The first stage (waiting for data): waiting for the data to be ready. For a socket, this usually means waiting for a packet to arrive from the network and be copied into a buffer in the kernel.

The second stage (copying data): copying the data from the kernel buffer into the buffer of the application process.

The network IO models are roughly as follows:

Synchronous model (synchronous IO)

Blocking IO: when the resource is unavailable, the IO request blocks until a result comes back (data or a timeout). In Linux, all sockets are blocking by default. The defining characteristic of blocking IO is that the process is blocked during both stages of the IO operation (waiting for data and copying data).

Non-blocking IO: when the resource is unavailable, the IO request returns immediately with a result indicating that the resource is unavailable. In Linux, if the data is not ready the kernel does not block the user process; it returns at once with an error (EAGAIN or EWOULDBLOCK) indicating that the call cannot be satisfied right now. The process therefore has to poll, retrying the call until the data is ready.
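The immediate EAGAIN/EWOULDBLOCK return and the resulting polling loop can be sketched as follows. This is an illustrative Python example on a local socket pair, not Workerman code; the function name nonblocking_read_demo is hypothetical.

```python
import errno
import socket


def nonblocking_read_demo():
    """Return (errno from reading an empty non-blocking socket, data read after polling)."""
    writer, reader = socket.socketpair()
    reader.setblocking(False)  # switch the socket to non-blocking mode

    # No data has been sent yet, so the kernel returns at once with an error
    # instead of putting the process to sleep.
    try:
        reader.recv(1024)
        first_errno = None
    except BlockingIOError as exc:
        first_errno = exc.errno  # EAGAIN or EWOULDBLOCK

    writer.sendall(b"ping")

    # Polling: keep retrying the read until the data is ready.
    data = None
    while data is None:
        try:
            data = reader.recv(1024)
        except BlockingIOError:
            pass  # not ready yet, try again

    writer.close()
    reader.close()
    return first_errno, data


if __name__ == "__main__":
    print(nonblocking_read_demo())
```

A busy retry loop like this wastes CPU time while nothing is ready, which is the motivation for the multiplexing model.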

IO multiplexing: this is what select, poll, and epoll provide, and it is sometimes called event-driven IO. Its advantage is that a single process can handle the IO of many network connections at the same time.

The basic principle is that select, poll, or epoll monitors all the sockets it is responsible for and notifies the user process when data arrives on any of them. In practice, each socket in the IO multiplexing model is usually set to non-blocking.

However, the user process as a whole is still blocked the entire time; it is simply blocked in the select call rather than in socket IO. In other words, IO multiplexing blocks on system calls such as select and epoll, not on the actual IO system calls such as recvfrom.
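A minimal sketch of one process waiting on several sockets at once, using Python's standard selectors module (which picks select, poll, or epoll depending on the platform). The function name multiplex_demo and the use of local socket pairs in place of real network connections are assumptions of this example, and each message is assumed to be non-empty.

```python
import selectors
import socket


def multiplex_demo(messages):
    """Send each message over its own socket pair and read them all from one process."""
    sel = selectors.DefaultSelector()
    writers = []
    for index, message in enumerate(messages):
        writer, reader = socket.socketpair()
        reader.setblocking(False)  # sockets are usually non-blocking in this model
        sel.register(reader, selectors.EVENT_READ, data=index)
        writer.sendall(message)
        writers.append(writer)

    received = {}
    while len(received) < len(messages):
        # The process blocks here, in the selector call, not in recv().
        for key, _events in sel.select(timeout=1.0):
            received[key.data] = key.fileobj.recv(1024)
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for writer in writers:
        writer.close()
    sel.close()
    return received


if __name__ == "__main__":
    print(multiplex_demo([b"a", b"bb", b"ccc"]))
```

Each recv() runs only after the selector has reported the socket as readable, so the read itself does not have to wait for data.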

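Signal-driven IO: the kernel notifies the process with a signal (SIGIO) when the data is ready, and the process then performs the read itself.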

Asynchronous IO: after the user process initiates a read, it can immediately go on to do other things. From the kernel's side, an asynchronous read returns immediately when it is received, so it never blocks the user process.

The kernel then waits for the data to be ready and copies it into user memory. When all of that is done, the kernel sends the user process a signal telling it that the read has completed.

3. Epoll:

With that background, epoll is easy to understand. epoll is the Linux kernel's improved version of poll for handling large numbers of file descriptors. It is an enhanced version of the select/poll IO multiplexing interface on Linux, and it significantly improves a program's CPU efficiency when only a few connections are active among a large number of concurrent ones.
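The "few active connections among many" case can be sketched with Python's select.epoll, which wraps the Linux epoll interface and is therefore Linux-only. The function name epoll_active_demo, the connection count, and the local socket pairs standing in for client connections are all assumptions of this illustration.

```python
import select
import socket


def epoll_active_demo(total=100, active_index=42):
    """Register `total` idle sockets, make one active, and return the indexes epoll reports."""
    ep = select.epoll()
    pairs = [socket.socketpair() for _ in range(total)]
    fd_to_index = {}
    for index, (_writer, reader) in enumerate(pairs):
        reader.setblocking(False)
        ep.register(reader.fileno(), select.EPOLLIN)  # interested in readability
        fd_to_index[reader.fileno()] = index

    # Only one of the registered connections receives data.
    pairs[active_index][0].sendall(b"hello")

    # epoll returns only the descriptors that are ready; the idle ones
    # do not appear in the result at all.
    events = ep.poll(1.0)
    ready = [fd_to_index[fd] for fd, mask in events if mask & select.EPOLLIN]

    for writer, reader in pairs:
        ep.unregister(reader.fileno())
        writer.close()
        reader.close()
    ep.close()
    return ready


if __name__ == "__main__":
    print(epoll_active_demo())
```

The caller iterates only over the ready list, so the work per wakeup depends on the number of active connections rather than on the total number registered.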

PS. Several points to note:

1: Is IO multiplexing a synchronous blocking model or an asynchronous blocking model?

Synchronous means the process actively waits for the notification, whereas asynchronous means it receives the notification passively, through callbacks, notifications, status flags, and so on. When IO multiplexing blocks in the select stage, the user process is actively waiting: it calls select to learn which descriptors are ready, and its process state is blocked. IO multiplexing is therefore classified as a synchronous blocking model.

2: What is concurrency? What is the state of high concurrency?

Highly concurrent programs generally use synchronous non-blocking IO rather than multi-threaded synchronous blocking IO. To see why, first look at the difference between concurrency and parallelism. The degree of concurrency is the number of tasks in progress at the same time (such as HTTP requests being served simultaneously), while the degree of parallelism is the number of physical resources that can work at the same time (such as the number of CPU cores).

By scheduling the different stages of tasks sensibly, the degree of concurrency can far exceed the degree of parallelism. This is how a handful of CPUs can support tens of thousands of concurrent user requests. At that level of concurrency, creating a process or thread for each task (user request) is very expensive. With synchronous non-blocking IO, many IO requests can be handed off to the background, so a single process can serve a large number of concurrent IO requests.
