Analysis of child_process module and cluster module in node.js (code example)-JS Tutorial-php.cn

What this article brings to you is the analysis (code examples) of the child_process module and cluster module in node.js. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.

Node follows a single-threaded single-process model. The single-thread of node means that the js engine has only one instance and is executed in the main thread of nodejs. At the same time, node handles asynchronous operations such as IO in an event-driven manner. The single-threaded mode of node only maintains one main thread, which greatly reduces the cost of switching between threads.

However, the single thread of node prevents CPU-intensive operations from being performed on the main thread, otherwise the main thread will be blocked. For CPU-intensive operations, independent child processes can be created in node through child_process. The parent and child processes communicate through IPC. The child process can be an external application or a node subprogram. After the child process is executed, the results can be returned to the parent process.

In addition, node's single thread runs as a single process, so it cannot utilize multi-core CPUs and other resources. In order to schedule multi-core CPU and other resources, node also provides a cluster module to utilize the resources of multi-core CPUs, making it possible to A series of node sub-processes handle load tasks while ensuring a certain load balancing. This article starts from the understanding of node's single thread and single process, and introduces the child_process module and cluster module

1. Single thread and single process in node

The first concept to understand is that node Single-threaded and single-process modes. Compared with the multi-threaded modes of other languages, node's single thread reduces the cost of switching between threads, and there is no need to consider locks and thread pool issues when writing node code. The single-threaded mode declared by node is more suitable for IO-intensive operations than other languages. So a classic question is: Is

node really single-threaded?

When it comes to node, we can immediately think of words such as single-threaded, asynchronous IO, and event-driven. The first thing to clarify is whether node is really single-threaded. If it is single-threaded, then where are asynchronous IO and scheduled events (setTimeout, setInterval, etc.) executed.

Strictly speaking, node is not single-threaded. There are many kinds of threads in node, including:

js engine execution thread
Timer thread (setTimeout, setInterval)
Asynchronous http thread (ajax)

....

What we usually call single thread means that there is only one in the node js engine runs on the main thread. Other asynchronous IO and event-driven related threads use libuv to implement internal thread pools and thread scheduling. There is an Event Loop in libv, and switching through Event Loop can achieve an effect similar to multi-threading. Simply put, Event Loop maintains an execution stack and an event queue. If asynchronous IO and timer functions are found in the current execution stack, these asynchronous callback functions will be put into the event queue. After the execution of the current execution stack is completed, the asynchronous callback functions in the event queue are executed in a certain order from the event queue.

Analysis of child_process module and cluster module in node.js (code example)

In the above figure, from the execution stack to the event queue, the callback functions are executed in a certain order in the event queue. The whole process is a simplified version of Event Loop. In addition, when the callback function is executed, an execution stack will also be generated. Asynchronous functions may be nested inside the callback function, which means that the execution stack is nested.

In other words, the single thread in node means that the js engine only runs on the only main thread. Other asynchronous operations also have independent threads to execute. Through libv's Event Loop, a similar multi-thread is implemented. Thread context switching and thread pool scheduling. Threads are the smallest processes, so node is also a single process. This explains why node is single-threaded and single-process.

2. The child_process module in node implements multi-process

Since node is a single process, there must be a problem, that is, it cannot fully utilize resources such as CPU. Node provides the child_process module to implement child processes, thereby realizing a multi-process model in a broad sense. Through the child_process module, the mode of one main process and multiple sub-processes can be realized. The main process is called the master process, and the sub-process is also called the working process. In the sub-process, you can not only call other node programs, but also execute non-node programs and shell commands, etc. After executing the sub-process, it returns in the form of a stream or callback.

1. API provided by child_process module

child_process provides 4 methods for creating new child processes. These 4 methods are spawn, execFile, exec and fork. All methods are asynchronous, and a picture can be used to describe the differences between these four methods.

Analysis of child_process module and cluster module in node.js (code example)

The above picture can show the differences between these four methods. We can also briefly introduce the differences between these four methods.

spawn: A non-node program is executed in the child process. After a set of parameters is provided, the execution result is returned in the form of a stream.
execFile: The child process executes a non-node program. After a set of parameters is provided, the execution result is returned in the form of a callback.
exec: The child process executes a non-node program, passing in a string of shell commands, and the result is returned in the form of a callback after execution, which is different from execFile
The thing is that exec can directly execute a series of shell commands.
fork: The child process executes the node program. After providing a set of parameters, the execution result is returned in the form of a stream. Unlike spawn, the fork generates Child processes can only execute node applications. The following sections will introduce these methods in detail.

2. execFile and exec

Let’s first compare the difference between execFile and exec. The similarities between these two methods:

A non-node application is executed, and the execution result is returned in the form of a callback function.

The difference is:

exec is a shell command that is directly executed, while execFile is an application that is executed

For example, echo is a built-in command of the UNIX system. We can directly execute it on the command line:

echo hello world

Copy after login

As a result, hello world.

will be printed on the command line.

(1) Implemented through exec

Create a new main.js file. If you want to use the exec method, then write in the file:

let cp=require('child_process');
cp.exec('echo hello world',function(err,stdout){
  console.log(stdout);
});

Copy after login

Execute this main. js, the result will be hello world. We found that the first parameter of exec is completely similar to the shell command.

(2) Implemented through execFile

let cp=require('child_process');
cp.execFile('echo',['hello','world'],function(err,stdout){
   console.log(stdout);
});

Copy after login

execFile is similar to executing an application named echo and then passing in parameters. execFlie will search for an application named 'echo' in the path of process.env.PATH and execute it after finding it. The default process.env.PATH path contains 'usr/local/bin', and this program named 'echo' exists in the 'usr/local/bin' directory, passing in the two parameters hello and world. , returns after execution.

(3) Security analysis

Like exec, it is extremely unsafe to directly execute a shell. For example, there is such a shell:

echo hello world;rm -rf

Copy after login

It is possible to use exec If executed directly, rm -rf will delete the files in the current directory. exec is just like the command line, the execution level is very high, and security issues will arise after execution. However, execFile is different:

execFile('echo',['hello','world',';rm -rf'])

Copy after login

When the parameters are passed in, the security of the execution of the passed actual parameters will be detected. If there is a security issue, an exception will be thrown. In addition to execFile, spawn and fork cannot directly execute the shell, so they are more secure.

3. spawn

spawn is also used to execute non-node applications and cannot directly execute the shell. Compared with execFile, the result of spawn execution of the application is not a one-time result after the execution is completed. The output is in the form of a stream. For large batches of data output, the use of memory can be introduced in the form of streams.

We use the sorting and deduplication of a file as an example:

Analysis of child_process module and cluster module in node.js (code example)

In the above picture diagram, the input is read first. There are acba unsorted text in the txt file. The sorting function can be implemented through the sort program, and the output is aabc. Finally, the uniq program can be used to remove duplicates and obtain abc. We can use spawn stream input and output to achieve the above functions:

let cp=require('child_process');
let cat=cp.spawn('cat',['input.txt']);
let sort=cp.spawn('sort');
let uniq=cp.spawn('uniq');

cat.stdout.pipe(sort.stdin);
sort.stdout.pipe(uniq.stdin);
uniq.stdout.pipe(process.stdout);
console.log(process.stdout);

Copy after login

After execution, the final result will be input into process.stdout. If the input.txt file is large, input and output in the form of streams can significantly reduce memory usage. By setting a buffer, memory usage can be reduced while improving input and output efficiency.

4. fork

In JavaScript, in terms of processing a large number of calculation tasks, HTML is implemented through web work, which makes the task separate from the main thread. A built-in communication between the parent process and the child process is used in node to handle this problem, reducing the pressure on big data operations. The fork method is provided in node. Through the fork method, the node program is executed in a separate process, and through communication between father and son, the child process accepts the information of the parent process and returns the execution results to the parent process.

Using the fork method, an IPC channel can be opened between the parent process and the child process, so that message communication can be carried out between different node processes.

In the child process:

Receive and send messages through the mechanisms of process.on('message') and process.send().

In the parent process:

Receive and send through the mechanisms of child.on('message') and process.send() information.

Specific example, in child.js:

process.on('message',function(msg){
   process.send(msg)
})

Copy after login

In parent.js:

let cp=require('child_process');
let child=cp.fork('./child');
child.on('message',function(msg){
  console.log('got a message is',msg);
});
child.send('hello world');

Copy after login

Executing parent.js will output on the command line ：got a message is hello world

To interrupt the communication between father and son, you can disconnect the IPC communication between father and son by calling:

child.disconnect()

Copy after login

in the parent process.

5、同步执行的子进程

exec、execFile、spawn和fork执行的子进程都是默认异步的，子进程的运行不会阻塞主进程。除此之外，child_process模块同样也提供了execFileSync、spawnSync和execSync来实现同步的方式执行子进程。

三、node中的cluster模块

cluster意为集成，集成了两个方面，第一个方面就是集成了child_process.fork方法创建node子进程的方式，第二个方面就是集成了根据多核CPU创建子进程后，自动控制负载均衡的方式。

我们从官网的例子来看：

const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  console.log(`主进程 ${process.pid} 正在运行`);

  // 衍生工作进程。
  for (let i = 0; i  {
    console.log(`工作进程 ${worker.process.pid} 已退出`);
  });
} else {
  // 工作进程可以共享任何 TCP 连接。
  // 在本例子中，共享的是一个 HTTP 服务器。
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('你好世界\n');
  }).listen(8000);

  console.log(`工作进程 ${process.pid} 已启动`);
}

Copy after login

最后输出的结果为：

$ node server.js
主进程 3596 正在运行
工作进程 4324 已启动
工作进程 4520 已启动
工作进程 6056 已启动
工作进程 5644 已启动

Copy after login

我们将master称为主进程，而worker进程称为工作进程，利用cluster模块，使用node封装好的API、IPC通道和调度机可以非常简单的创建包括一个master进程下HTTP代理服务器 + 多个worker进程多个HTTP应用服务器的架构。

总结

本文首先介绍了node的单线程和单进程模式，接着从单线程的缺陷触发，介绍了node中如何实现子进程的方法，对比了child_process模块中几种不同的子进程生成方案，最后简单介绍了内置的可以实现子进程以及CPU进程负载均衡的内置集成模块cluster。