Let's talk about processes, threads, coroutines and concurrency models in Node.js-JS Tutorial-php.cn


Node.js has become one of the standard tools for building highly concurrent network services. Why is it so popular? This article starts from the basic concepts of processes, threads, coroutines and I/O models, and then gives a comprehensive introduction to Node.js and its concurrency model.

Process

The running instance of a program is generally called a process. A process is the basic unit of resource allocation and scheduling in the operating system, and it generally consists of the following parts:

Program: the code to be executed, used to describe the functions to be completed by the process;
Data area: the data space the process works on, including data, dynamically allocated memory, the user stack used by function calls, modifiable parts of the program and other information;
Process table entry: to implement the process model, the operating system maintains a table called the process table, in which each process occupies one entry (also called a process control block). The entry holds important state such as the program counter, stack pointer, memory allocation, status of open files and scheduling information, so that after a process has been suspended the operating system can correctly resume it.
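From inside a running Node.js program, some of this per-process information can be read through the global `process` object. The sketch below simply collects a few of those values; it only illustrates what the OS tracks per process and is not a view of the kernel's actual process control block. The function name `describeCurrentProcess` is made up for this example.

```javascript
// Collect a few per-process facts that Node.js exposes for the current process.
function describeCurrentProcess() {
  const mem = process.memoryUsage(); // memory figures, in bytes
  return {
    pid: process.pid,               // ID the OS assigned to this process
    ppid: process.ppid,             // ID of the parent process
    cwd: process.cwd(),             // current working directory
    residentSetBytes: mem.rss,      // memory currently held in RAM
    heapUsedBytes: mem.heapUsed,    // dynamically allocated memory in use by the V8 heap
    uptimeSeconds: process.uptime(),
  };
}

console.log(describeCurrentProcess());
```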

The process has the following characteristics:

Dynamicity: a process is essentially one execution of a program in a multiprogramming system, and it is created and destroyed dynamically;
Concurrency: any process can execute concurrently with other processes;
Independence: a process is a basic unit that can run independently, and it is also the independent unit by which the system allocates and schedules resources;
Asynchronicity: because processes constrain one another, a process executes intermittently, that is, processes advance at independent and unpredictable speeds.

Note that if a program is run twice, the two running instances are two different processes, even if the operating system lets them share code (that is, only one copy of the code is in memory).
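The sketch below makes this concrete. It starts the same program twice from Node.js: the same executable (`process.execPath`) running the same one-line script, so the OS is free to share the code. Each instance still reports its own PID. To guarantee that both processes exist at the same moment, each child waits briefly before exiting. The helper names are hypothetical.

```javascript
const { spawn } = require("node:child_process");

// Start one instance of the program and resolve with the PID it reports.
// The child lingers for 300 ms so that both instances overlap in time.
function startInstance() {
  const script = "console.log(process.pid); setTimeout(() => {}, 300);";
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", script]);
    let output = "";
    child.stdout.setEncoding("utf8");
    child.stdout.on("data", (chunk) => { output += chunk; });
    child.on("error", reject);
    child.on("close", (code) => {
      if (code === 0) resolve(Number(output.trim()));
      else reject(new Error(`child exited with code ${code}`));
    });
  });
}

// Run the same program twice, concurrently.
function runTwice() {
  return Promise.all([startInstance(), startInstance()]);
}

runTwice().then(([a, b]) => console.log(`two processes: ${a} and ${b}`));
```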

During execution, interrupts, CPU scheduling and other causes make a process switch between the following states:

[Figure: process state transition diagram (running, ready and blocked states)]

Running state: The process is running at this moment and occupying the CPU;
Ready state: The process is ready at this moment and can be run at any time, but it is temporarily stopped because other processes are running;
Blocked state: The process is in a blocked state at this moment. Unless an external event occurs (such as keyboard input data has arrived), the process will not be able to run.

As the state transition diagram above shows, a running process can switch to either the ready state or the blocked state, but only a ready process can switch directly to the running state. This is because:

The switch from running to ready is made by the process scheduler: the system decides that the current process has occupied the CPU for too long and lets other processes use the CPU. The scheduler is part of the operating system, and the process is not even aware of its existence;
The switch from running to blocked happens for reasons inside the process itself (such as waiting for keyboard input). The process cannot continue and can only suspend until some event occurs (such as keyboard input arriving). When that event occurs, the process first moves to the ready state. If no other process is running at that moment, it moves straight on to the running state. Otherwise it stays ready until the scheduler picks it.
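The transition rules above form a small state machine. The sketch below encodes only the four allowed transitions and rejects everything else, including blocked to running. The action names (`dispatch`, `preempt`, `waitForEvent`, `eventOccurred`) and the `ToyProcess` class are hypothetical labels chosen for this illustration.

```javascript
// Allowed transitions between the three process states described above.
const TRANSITIONS = {
  running: { preempt: "ready", waitForEvent: "blocked" },
  ready: { dispatch: "running" },
  blocked: { eventOccurred: "ready" },
};

// A toy process that can only change state along the allowed edges.
class ToyProcess {
  constructor(name) {
    this.name = name;
    this.state = "ready"; // in this toy model a new process waits to be scheduled
  }

  apply(action) {
    const allowed = TRANSITIONS[this.state];
    if (!Object.prototype.hasOwnProperty.call(allowed, action)) {
      throw new Error(`${this.name}: cannot "${action}" while ${this.state}`);
    }
    this.state = allowed[action];
    return this.state;
  }
}

const p = new ToyProcess("editor");
p.apply("dispatch");      // ready -> running (picked by the scheduler)
p.apply("waitForEvent");  // running -> blocked (e.g. waiting for keyboard input)
p.apply("eventOccurred"); // blocked -> ready (the input has arrived)
console.log(p.state);     // prints: ready
```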

Threads

Sometimes we need threads to solve the following problems:

As the number of processes grows, switching between them becomes more and more expensive and effective CPU utilization keeps dropping; in severe cases the system may even freeze;
Each process has its own independent memory space, isolated from every other process, yet some tasks need to share data, so synchronizing data between multiple processes is cumbersome.

Regarding threads, we need to know the following points:

A thread is a single sequential flow of control within program execution and the smallest unit the operating system can schedule for computation. It is contained in a process and is the actual unit of execution within that process;
A process can contain multiple threads, and each thread can perform a different task in parallel;
All threads in a process share the process's memory space (including code, data, heap, etc.) and some resources (such as open files and signals);
Threads in one process are not visible in other processes.
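Node.js can demonstrate threads sharing one process through its built-in `node:worker_threads` module. One caveat applies: each Node.js worker has its own JavaScript heap, so sharing must be explicit, here through a `SharedArrayBuffer` passed as `workerData` (shared rather than copied). In the sketch, the second thread writes into that memory and reports its process ID, which matches the main thread's. The function name is hypothetical.

```javascript
const { Worker } = require("node:worker_threads");

// Code run by the second thread: write into the shared buffer, then report
// which process this thread belongs to.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  const view = new Int32Array(workerData);
  Atomics.add(view, 0, 42);
  parentPort.postMessage(process.pid);
`;

function shareMemoryWithThread() {
  const shared = new SharedArrayBuffer(4); // memory visible to both threads
  const counter = new Int32Array(shared);
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: shared });
    worker.once("error", reject);
    worker.once("message", (workerPid) => {
      resolve({
        samePid: workerPid === process.pid,        // both threads live in one process
        valueSeenByMain: Atomics.load(counter, 0), // the worker's write is visible here
      });
    });
  });
}

shareMemoryWithThread().then((r) => console.log(r)); // { samePid: true, valueSeenByMain: 42 }
```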

Now that we understand the basic characteristics of threads, let’s talk about some common thread types.

Kernel-mode thread

A kernel-mode thread is a thread supported directly by the operating system. Its main features are as follows:

Thread creation, scheduling, synchronization and destruction are all performed by the kernel, which makes them relatively expensive;
The kernel can map kernel threads onto every processor so that each processor core runs a kernel thread, letting threads fully compete for and use CPU resources;
It can only access the kernel's code and data;
Resource synchronization and data sharing are less efficient than they are between processes.

User-mode thread

User-mode thread is a thread completely built in user space. Its main characteristics are as follows:

Thread creation, scheduling, synchronization and destruction are all handled in user space, so the overhead is very low;
Because user-mode threads are maintained in user space, the kernel is completely unaware of them. It schedules and allocates resources only for the process they belong to, while scheduling and resource allocation among the threads inside that process are handled by the program itself. As a result, if one user-mode thread blocks in a system call, the whole process is blocked;
It can access the entire shared address space and all system resources of the process it belongs to;
Resource synchronization and data sharing are more efficient.
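User-mode threads can be sketched as a scheduler that lives entirely in user space. In the cooperative example below, each "thread" is a generator and `yield` returns control to a round-robin loop. The kernel sees only the single OS thread running that loop, so if any task made a blocking call, no other task could run. This is a simplified illustration, not how any particular threading library works, and the names `task` and `runRoundRobin` are hypothetical.

```javascript
// Each "user-mode thread" is a generator; `yield` hands control back to the
// scheduler. The kernel only sees the one OS thread running this loop.
function* task(name, steps, trace) {
  for (let i = 1; i <= steps; i++) {
    trace.push(`${name}${i}`);
    yield; // voluntarily give up the CPU
  }
}

// Round-robin scheduler implemented entirely in user space.
function runRoundRobin(tasks) {
  const queue = [...tasks];
  while (queue.length > 0) {
    const current = queue.shift();
    const { done } = current.next();
    if (!done) queue.push(current); // not finished: back of the ready queue
  }
}

const trace = [];
runRoundRobin([task("A", 3, trace), task("B", 2, trace)]);
console.log(trace.join(" ")); // prints: A1 B1 A2 B2 A3
```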

Lightweight process (LWP)

A lightweight process (LWP) is a user thread built on top of, and supported by, the kernel. Its main features are as follows:

User space can use kernel threads only through lightweight processes, which act as a bridge between user-mode threads and kernel threads. LWPs therefore exist only on systems that already support kernel threads;
Most LWP operations require user space to make a system call, which is relatively expensive (it requires switching between user mode and kernel mode);
Each LWP must be associated with a specific kernel thread, so:
- Like kernel threads, LWPs can fully compete for and use CPU resources across the whole system;
- Each LWP is an independent thread-scheduling unit, so even if one LWP blocks in a system call, the rest of the process keeps executing;
- LWPs consume kernel resources (mainly the stack space of their kernel threads), so a system cannot support a very large number of them;
An LWP can access the entire shared address space and all system resources of the process it belongs to.
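How these thread types map onto a real OS varies. Linux, for example, backs each user thread with its own kernel-scheduled thread (the one-to-one model), and `ps -L` lists those threads in a column labelled LWP. The sketch below assumes Linux and lists the current Node.js process's kernel-scheduled threads through `/proc/self/task`; on other platforms it returns `null`. The function name is hypothetical.

```javascript
const fs = require("node:fs");

// On Linux, each entry in /proc/self/task is one kernel-scheduled thread of
// the current process (its thread ID). Returns null on other platforms.
function listKernelThreadIds() {
  if (process.platform !== "linux") return null;
  return fs.readdirSync("/proc/self/task").map(Number);
}

const tids = listKernelThreadIds();
if (tids !== null) {
  // Even a "single-threaded" Node.js process usually runs several threads.
  console.log(`${tids.length} threads:`, tids);
}
```

On Linux, the main thread's ID equals the process ID, so `process.pid` appears in the list.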

Summary

Above we briefly introduced the common thread types (kernel-mode threads, user-mode threads and lightweight processes). Each has its own scope of application, and in practice they can be combined as needed, for example in the common one-to-one, many-to-one and many-to-many models. For reasons of space this article does not go into these models; interested readers can explore them on their own.

Coroutine

A coroutine, also called a fiber, is a kind of thread built on top of threads, whose scheduling, state maintenance and other behavior are managed by the developer. Its main features are:

Because switching between coroutines does not require a kernel context switch, execution is efficient;
Because coroutines run in the same thread, communication between them raises no thread-synchronization problems;
They make it easy to switch control flow and simplify the programming model.

In JavaScript, the familiar async/await is an implementation of coroutines, as in the following example:

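```javascript
// getUserById(id) and user.updateName(name) are assumed to be defined elsewhere.

// Synchronous version: each call returns only once its work is finished.
function updateUserName(id, name) {
  const user = getUserById(id);
  user.updateName(name);
  return true;
}

// Asynchronous version: both calls are assumed to return Promises here, and
// each `await` suspends the function until the Promise settles.
async function updateUserNameAsync(id, name) {
  const user = await getUserById(id);
  await user.updateName(name);
  return true;
}
```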


In the example above, the logical execution order inside both updateUserName and updateUserNameAsync is:

Call getUserById and assign its return value to the variable user;
Call the updateName method of user;
Return true to the caller.

The main difference between the two lies in the state control during actual operation:

While updateUserName runs, the steps above are executed strictly one after another;
While updateUserNameAsync runs, the steps are executed in the same logical order, except that whenever it reaches an await, updateUserNameAsync is suspended and its current state is saved at the point of suspension. Only when the operation after the await completes is updateUserNameAsync woken up, its state restored, and execution resumed with the next statement.
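The suspension can be observed by logging the order of events. In the sketch below, `getUserById` and `updateNameLater` are hypothetical asynchronous stand-ins (a 10 ms timer plays the role of a database lookup). The caller's own statement runs while `updateUserNameAsync` is suspended at its first `await`, and the function later resumes with its local state intact.

```javascript
const log = [];

// Hypothetical async stand-ins for the functions used in the example above.
function getUserById(id) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ id, updateName: updateNameLater }), 10);
  });
}
function updateNameLater(name) {
  log.push(`updating name to ${name}`);
  return Promise.resolve();
}

async function updateUserNameAsync(id, name) {
  log.push("start");
  const user = await getUserById(id); // suspended here; local state is kept
  log.push(`resumed with user ${user.id}`);
  await user.updateName(name);
  return true;
}

const pending = updateUserNameAsync(1, "Ada");
log.push("caller keeps running"); // runs while updateUserNameAsync is suspended
pending.then((ok) => {
  log.push(`returned ${ok}`);
  console.log(log.join(" -> "));
  // prints: start -> caller keeps running -> resumed with user 1 -> updating name to Ada -> returned true
});
```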

From this analysis we can reasonably infer that coroutines are not meant to solve the program-concurrency problems that processes and threads address. They target the problems that arise when handling asynchronous tasks (such as file operations and network requests). Before async/await, asynchronous tasks could only be handled with callback functions, which easily leads to callback hell and messy, hard-to-maintain code. Coroutines let us write asynchronous code in a synchronous style.
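Coroutines can flatten asynchronous code with a small runner that drives a generator function. This is the idea popularised by libraries such as co, and it is roughly what async/await does natively. In the sketch below, every `yield` of a Promise suspends the generator, and the runner resumes it with the settled value (or throws the error into it). `run`, `delay` and `loadProfile` are hypothetical names, and the timers stand in for real file or network operations.

```javascript
// Drive a generator whose `yield`s produce Promises: each yield suspends the
// generator, and the runner resumes it with the settled value (or error).
function run(generatorFn, ...args) {
  return new Promise((resolve, reject) => {
    const gen = generatorFn(...args);
    function step(method, input) {
      let result;
      try {
        result = gen[method](input); // resume the coroutine
      } catch (err) {
        reject(err);
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      Promise.resolve(result.value).then(
        (value) => step("next", value),
        (err) => step("throw", err)
      );
    }
    step("next", undefined);
  });
}

// Hypothetical async operation standing in for a file read or network call.
const delay = (ms, value) => new Promise((r) => setTimeout(() => r(value), ms));

// Reads top to bottom like synchronous code, with no nested callbacks.
function* loadProfile(id) {
  const user = yield delay(10, { id, name: "Ada" });
  const posts = yield delay(10, [`post by ${user.name}`]);
  return { user, posts };
}

run(loadProfile, 7).then((profile) => console.log(profile.posts[0])); // prints: post by Ada
```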

Keep in mind that the core capability of a coroutine is to suspend a piece of code, preserve its state at the point of suspension, and later resume at that point and continue with the code that follows.

I/O model

A complete I/O operation goes through the following stages:

The user process (thread) issues an I/O request to the kernel through a system call;
The kernel processes the I/O request (in a preparation phase and an actual execution phase) and returns the result to the user process (thread).

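I/O operations can roughly be divided into four types: blocking I/O, non-blocking I/O, synchronous I/O and asynchronous I/O. Before discussing them, let's get familiar with two pairs of concepts (assume here that service A calls service B):

Blocking/non-blocking:
- Blocking call: after A calls B, A waits and cannot do anything else until B returns a result;
- Non-blocking call: after A calls B, A does not wait for the result and can continue with other work.
Synchronous/asynchronous:
- Synchronous: B returns the result to A only after it has finished executing, so service B is synchronous;
- Asynchronous: B immediately tells A that the request has been received and notifies A of the result through a callback after execution, so service B is asynchronous.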

Many people confuse blocking/non-blocking with synchronous/asynchronous, so pay special attention to the difference:

Blocking/non-blocking describes the caller of the service;
Synchronous/asynchronous describes the callee (the service itself).
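The two pairs of concepts can be modelled with plain functions. In the sketch below, the hypothetical service B exists in a synchronous form, which hands back the result only when done, and an asynchronous form, which acknowledges at once and calls back later (a timer stands in for the real work). Service A makes a blocking call to the first and a non-blocking call to the second. All names are made up for illustration.

```javascript
// Hypothetical service B, implemented two ways.
function serviceBSync(x) {
  return x * 2; // synchronous: hands back the result only when the work is done
}
function serviceBAsync(x, onDone) {
  setTimeout(() => onDone(x * 2), 10); // asynchronous: accepts now, notifies later
  return "accepted";
}

// Service A, the caller.
function serviceA(log) {
  // Blocking call: A can do nothing else until B returns the result.
  log.push(`sync result ${serviceBSync(21)}`);
  // Non-blocking call: A gets an acknowledgement and carries on.
  const ack = serviceBAsync(21, (result) => log.push(`callback result ${result}`));
  log.push(`ack ${ack}`);
  log.push("A does other work");
}

const log = [];
serviceA(log);
setTimeout(() => console.log(log.join("; ")), 50);
// prints: sync result 42; ack accepted; A does other work; callback result 42
```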

With blocking/non-blocking and synchronous/asynchronous in mind, let's look at the specific I/O models.

Blocking I/O

Definition: after the user process (thread) issues an I/O system call, it is blocked immediately. The user process (thread) is unblocked, and can continue with subsequent operations, only after the entire I/O operation has been processed and the result returned to it.

Features:

While an I/O operation is executing, the user process (thread) cannot do anything else;
An I/O request can block the user process (thread), so to respond to I/O requests promptly, a separate process (thread) has to be allocated to each request. This consumes a large amount of resources. For long-lived connections the process (thread) cannot be released for a long time, so new incoming requests run into a serious performance bottleneck.
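The cost of blocking is easy to observe in Node.js, because JavaScript runs on one thread. While that thread is blocked, it cannot even run a timer callback that is already due. The sketch below uses `Atomics.wait`, which puts the calling thread to sleep, as a stand-in for a blocking I/O system call. This substitution is an assumption for illustration; a real blocking call such as `fs.readFileSync` has the same effect on the caller, just for a less predictable time. The function names are hypothetical.

```javascript
// Block the calling thread for `ms` milliseconds. Atomics.wait sleeps until
// the timeout because nothing ever changes the cell's value from 0.
function blockingCall(ms) {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, ms);
}

// Measure how late a 0 ms timer fires when the only thread is blocked.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockingCall(blockMs); // nothing else on this thread can run meanwhile
  });
}

measureTimerDelay(100).then((delay) => {
  console.log(`timer fired after ~${delay} ms instead of ~0 ms`);
});
```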

Non-blocking I/O

Definition:

After the user process (thread) issues an I/O system call, if the I/O operation is not ready, the call returns an error immediately. The user process (thread) does not have to wait; instead it polls to detect whether the I/O operation is ready. Once the I/O operation is ready, the actual I/O operation blocks the user process (thread) until the result is returned to it.

Features:

The user process (thread) has to keep polling whether the I/O operation is ready (usually in a while loop), so this model occupies the CPU and consumes CPU resources;
Until the I/O operation is ready, the user process (thread) is not blocked; once it is ready, the subsequent actual I/O operation does block the user process (thread).
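The polling loop can be simulated without touching a real device. In the sketch below, a hypothetical device reports `EAGAIN` (the error code a non-blocking read returns when no data is ready) until it has been polled a set number of times. The caller busy-polls and counts its attempts, and each attempt represents CPU time spent without useful work. `createFakeDevice` and `pollingRead` are made up for this illustration.

```javascript
// Hypothetical device whose data becomes ready only after `readyAfter` polls.
function createFakeDevice(readyAfter, data) {
  let polls = 0;
  return {
    tryRead() {
      polls += 1;
      if (polls < readyAfter) {
        const err = new Error("resource temporarily unavailable");
        err.code = "EAGAIN"; // what a non-blocking read reports when not ready
        throw err;
      }
      return data; // ready: the actual read happens now
    },
  };
}

// Busy-poll until the read succeeds, counting attempts (each one costs CPU).
function pollingRead(device) {
  let attempts = 0;
  while (true) {
    attempts += 1;
    try {
      return { data: device.tryRead(), attempts };
    } catch (err) {
      if (err.code !== "EAGAIN") throw err; // real errors are not retried
    }
  }
}

console.log(pollingRead(createFakeDevice(5, "hello"))); // { data: 'hello', attempts: 5 }
```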

Synchronous (asynchronous) I/O

After the user process (thread) issues an I/O system call, if the call blocks the user process (thread), it is synchronous I/O; otherwise, it is asynchronous I/O.

Whether an I/O operation is synchronous or asynchronous depends on the communication mechanism between the user process (thread) and the I/O operation:

In the synchronous case, the user process (thread) interacts with the I/O operation through a kernel buffer. The kernel writes the result of the I/O operation into the buffer and then copies the data from the buffer to the user process (thread). This blocks the user process (thread) until the I/O operation completes;
In the asynchronous case, the user process (thread) interacts with the I/O operation directly through the kernel. The kernel copies the result of the I/O operation directly to the user process (thread), and this does not block the user process (thread).
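Node.js offers both styles for reading files. `fs.readFileSync` is synchronous: the calling thread waits until the data is in its buffer. `fs.promises.readFile` returns a Promise immediately and delivers the data later. Internally, Node.js typically implements asynchronous file operations by running the blocking call on a libuv thread-pool thread rather than through kernel asynchronous I/O, so the sketch shows the two models only from the caller's point of view. The temporary file and the function name are made up for the example.

```javascript
const fs = require("node:fs");
const fsp = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

// Read the same file synchronously and asynchronously, recording the order
// in which things happen from the caller's point of view.
async function compareReads() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "io-demo-"));
  const file = path.join(dir, "data.txt");
  fs.writeFileSync(file, "hello");
  const order = [];
  try {
    // Synchronous: returns only after the data has been copied into our buffer.
    order.push(`sync read: ${fs.readFileSync(file, "utf8")}`);
    // Asynchronous: returns a Promise at once; the data is delivered later.
    const pending = fsp.readFile(file, "utf8");
    order.push("caller continues");
    order.push(`async read: ${await pending}`);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true }); // clean up the temp dir
  }
  return order;
}

compareReads().then((order) => console.log(order.join(" | ")));
// prints: sync read: hello | caller continues | async read: hello
```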

The concurrency model of Node.js

Node.js uses a single-threaded, event-driven, asynchronous I/O model. In my view, the reasons for choosing this model are:

Most network applications are I/O-intensive;
Managing multi-threaded resources reasonably and efficiently while ensuring high concurrency is more complicated than managing the resources of a single thread.

In short, for the sake of simplicity and efficiency, Node.js adopts a single-threaded, event-driven, asynchronous I/O model and implements it with the main thread's EventLoop and auxiliary Worker threads:

Worker threads execute the specific event tasks (synchronously, in threads other than the main thread) and then return the results to the main thread's EventLoop, which runs the callback functions for the related events.
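Assume the auxiliary Worker threads described here correspond to libuv's thread pool. Node.js uses this pool for work such as most `fs` operations, `dns.lookup` and some `crypto` functions; its size defaults to 4 and can be changed with the `UV_THREADPOOL_SIZE` environment variable. Under that assumption, the division of labour can be observed with `crypto.pbkdf2`. The hashing runs on a pool thread, the main thread carries on, and the callback later runs on the main thread via the event loop. The password, salt and iteration count are arbitrary example values, and the function name is hypothetical.

```javascript
const crypto = require("node:crypto");

// crypto.pbkdf2 does its hashing on a libuv thread-pool thread; when the work
// finishes, the event loop runs the callback on the main thread.
function hashOffMainThread(order) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2("secret", "salt", 100000, 32, "sha256", (err, key) => {
      if (err) return reject(err);
      order.push("callback on main thread");
      resolve(key.toString("hex"));
    });
    order.push("main thread keeps going"); // runs before the hash is ready
  });
}

const order = [];
hashOffMainThread(order).then((hex) => {
  console.log(order.join(" -> "), hex.length);
  // prints: main thread keeps going -> callback on main thread 64
});
```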

Note that Node.js is not well suited to CPU-intensive tasks (tasks that require a lot of computation). The EventLoop runs in the same thread as the JavaScript code (the non-asynchronous task code), namely the main thread, so if any piece of that code runs too long, it blocks the main thread. If an application contains many long-running tasks, server throughput drops and the server may even become unresponsive.
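When some CPU-heavy work cannot be avoided, one common approach is to move it off the main thread with the built-in `node:worker_threads` module, so the EventLoop stays free for timers and I/O callbacks. The sketch below assumes a Node.js version with a stable `worker_threads` module and uses deliberately slow recursive Fibonacci as the CPU-bound job. The heartbeat timer keeps firing while the worker computes; running the same `fib(30)` directly on the main thread would stall it. The names `fibInWorker` and `workerSource` are hypothetical.

```javascript
const { Worker } = require("node:worker_threads");

// Naive recursive Fibonacci, run inside the worker: deliberately CPU-bound.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
`;

// Run the computation on a separate thread so the main thread's event loop
// stays free to run timers and I/O callbacks in the meantime.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

let ticks = 0;
const timer = setInterval(() => { ticks += 1; }, 5); // main-thread heartbeat
fibInWorker(30).then((result) => {
  clearInterval(timer);
  console.log(`fib(30) = ${result}; heartbeat fired ${ticks} times meanwhile`);
});
```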

Summary

Node.js is a technology that front-end developers have to face now and will keep facing in the future, yet most front-end developers' knowledge of it stays on the surface. To help everyone better understand the concurrency model of Node.js, this article first introduced processes, threads and coroutines, then the different I/O models, and finally gave a brief introduction to the Node.js concurrency model. Although the part on the Node.js concurrency model is short, I believe the underlying principles never change: once you master the fundamentals, studying the design and implementation of Node.js in depth will take half the effort for twice the result.

Finally, if you find any mistakes in this article, please point them out. I wish you all happy coding every day.
