What should I do if the node service CPU is too high? Let's talk about troubleshooting ideas-JS Tutorial-php.cn


青灯夜游

Released: 2022-09-15 19:46:25

Reprinted


What should you do when a Node service's CPU usage is too high, and how do you investigate it? This article summarizes the troubleshooting approach for a Node service with excessive CPU usage. I hope it helps!


Helping a colleague investigate excessive CPU usage

CPU usage climbed and would not come back down. My colleague eventually traced it to a dependency upgrade. After a major version bump, the shared default Redis configuration had been taken offline (the project is old and had not been touched for a long time), and the business side was now expected to disable the Redis service in its own code. Because of this information gap, the business team did not know they had to turn Redis off, so after the release the service kept retrying the Redis connection. Every incoming request triggered one more retry, so CPU usage grew with traffic.
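The real fix in this case was simply to disable Redis in the service's configuration. Where a connection is genuinely needed, a bounded retry policy keeps an unreachable dependency from burning CPU indefinitely. The sketch below is an assumption, not the project's code: `retryDelay` is a hypothetical helper implementing capped exponential backoff that gives up after a fixed number of attempts, and the defaults are illustrative. Its shape (attempt number in, delay in milliseconds or a non-number to stop) matches what ioredis accepts for its `retryStrategy` option, but check your own client's documentation.

```javascript
// Hypothetical reconnect policy: capped exponential backoff with a retry limit.
// attempt is 1-based; returns the delay in ms, or null to stop retrying.
function retryDelay(attempt, { baseMs = 100, maxMs = 5000, maxAttempts = 10 } = {}) {
  if (attempt > maxAttempts) return null; // give up instead of retrying forever
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1)); // 100, 200, 400, ... capped at maxMs
}

// Print the schedule a client would follow with the default options.
for (let attempt = 1; attempt <= 11; attempt++) {
  console.log(attempt, retryDelay(attempt));
}
```

The policy belongs on a single client created once at startup. Creating a client inside a request handler multiplies connection attempts with traffic, which matches the "one more request, one more retry" symptom described above.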

Afterwards we summarized the troubleshooting approach below; additions are welcome.

Troubleshooting ideas

0. Restart the instance

Some problems can be solved by restarting the instance.

Restart the instance first: this is a necessary step to get the service available again. If CPU usage then climbs again quickly, you may have to roll back the code first. If it climbs slowly, there is no need to roll back, but you should still track down the problem as soon as possible.
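To see whether CPU usage climbs back after a restart, and how fast, it helps to log the process's own CPU usage over time. A minimal sketch, assuming you can add a few lines at service startup: `sampleCpuPercent` and `watchCpu` are hypothetical helpers built on Node's `process.cpuUsage()` and `process.hrtime.bigint()`, and the 80% threshold and intervals are illustrative values, not recommendations from the original article.

```javascript
// Measure this Node process's CPU usage over a time window.
// The result is a percentage of ONE core, so it can exceed 100 when several threads are busy.
function sampleCpuPercent(windowMs = 1000) {
  return new Promise((resolve) => {
    const startUsage = process.cpuUsage(); // { user, system } in microseconds
    const startTime = process.hrtime.bigint(); // nanoseconds
    setTimeout(() => {
      const { user, system } = process.cpuUsage(startUsage); // delta since startUsage
      const elapsedUs = Number(process.hrtime.bigint() - startTime) / 1000;
      resolve((100 * (user + system)) / elapsedUs);
    }, windowMs);
  });
}

// Hypothetical watcher: log one line per interval, flagging samples above a threshold.
function watchCpu({ intervalMs = 10000, windowMs = 1000, thresholdPercent = 80 } = {}) {
  const timer = setInterval(async () => {
    const percent = await sampleCpuPercent(windowMs);
    const level = percent > thresholdPercent ? 'WARN' : 'info';
    console.log(`[cpu] ${level} ${percent.toFixed(1)}%`);
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for monitoring
  return timer;
}
```

Calling `watchCpu()` once during startup prints a line every ten seconds; a steep upward trend in those numbers shortly after a restart is the signal to consider a rollback.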

1. Use Linux shell commands to confirm that the node process is the cause

Command 1: top

The output shows that the node processes are the main CPU consumers.

[root@*** ~]# top

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                     
680 root      20   0 2290976 168176  34976 S  30.3  2.0 103:42.59 node                                                                                                                        
687 root      20   0 2290544 166920  34984 R  26.3  2.0  96:26.42 node                                                                                                                        
 52 root      20   0 1057412  23972  15188 S   1.7  0.3  11:25.97 ****                                                                                                           
185 root      20   0  130216  41432  25436 S   0.3  0.5   1:03.44 ****                                                                                                         
...

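On a busy host it can be handy to pull just the node rows out of top's output and total their CPU. The parser below is a sketch that assumes top's default column layout shown above (PID through COMMAND, 12 columns); `nodeCpuFromTop` is a hypothetical helper name, and the sample text reuses the rows above with the masked command replaced by a placeholder.

```javascript
// Extract node processes from top output and sum their %CPU.
// Assumes the default columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
function nodeCpuFromTop(topText) {
  const rows = [];
  for (const line of topText.split('\n')) {
    const cols = line.trim().split(/\s+/);
    // Process rows have at least 12 columns and start with a numeric PID.
    if (cols.length < 12 || !/^\d+$/.test(cols[0])) continue;
    const command = cols.slice(11).join(' ');
    if (command !== 'node' && !command.startsWith('node ')) continue;
    rows.push({ pid: Number(cols[0]), cpu: Number(cols[8]), mem: Number(cols[9]) });
  }
  const totalCpu = rows.reduce((sum, row) => sum + row.cpu, 0);
  return { rows, totalCpu };
}

const sample = `
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  680 root      20   0 2290976 168176  34976 S  30.3  2.0 103:42.59 node
  687 root      20   0 2290544 166920  34984 R  26.3  2.0  96:26.42 node
   52 root      20   0 1057412  23972  15188 S   1.7  0.3  11:25.97 other
`;
console.log(nodeCpuFromTop(sample));
```

On a live machine the text can come from top's batch mode, for example `require('node:child_process').execFileSync('top', ['-b', '-n', '1'], { encoding: 'utf8' })`.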

Command 2: vmstat

Run vmstat 2, which collects a sample every two seconds:

[root@*** ~]# vmstat 2
procs -----------memory---------------- ---swap-- -----io---- --system-- -----cpu-----
 r  b      swpd  free   buff   cache      si   so    bi    bo   in cs   us sy id wa st
 0  0      0 233481328 758304 20795516    0    0     0     1    0    0  0  0 100  0  0
 0  0      0 233480800 758304 20795520    0    0     0     0  951 1519  0  0 100  0  0
 0  0      0 233481056 758304 20795520    0    0     0     0  867 1460  0  0 100  0  0
 0  0      0 233481408 758304 20795520    0    0     0    20  910 1520  0  0 100  0  0
 0  0      0 233481680 758304 20795520    0    0     0     0  911 1491  0  0 100  0  0
 0  0      0 233481920 758304 20795520    0    0     0     0  889 1530  0  0 100  0  0


procs
r #Run queue: the number of runnable processes (running or waiting for a CPU). When this value exceeds the number of CPUs, the CPU becomes a bottleneck. It is related to the load average in top, which is roughly this run queue averaged over time. As a rule of thumb, a load above 3 is relatively high, above 5 is high, and above 10 is abnormal and the server is in danger (read these figures relative to the number of CPU cores). A long run queue means the CPU is very busy, which usually shows up as high CPU usage.
b #Number of blocked processes, i.e. processes waiting for resources such as I/O.
memory
swpd #Amount of virtual memory (swap) in use. A value greater than 0 can indicate that physical memory is insufficient; if the cause is not a memory leak in the program, add memory or move memory-hungry tasks to other machines.
free #Amount of free physical memory.
buff #Memory Linux/Unix uses as buffers, e.g. for directory contents and permissions (file system metadata).
cache #Page cache: memory used to cache files that have been opened. The system uses part of the free physical memory to cache files and directories to improve performance; when a program needs more memory, buffer/cache memory is quickly handed over to it.
swap
si #Amount of memory swapped in from disk per second. A value greater than 0 means physical memory is insufficient or memory is leaking, and you need to find and fix the memory-hungry process. My machine has plenty of memory, so everything here is normal.
so #Amount of memory swapped out to disk per second. A value greater than 0 means the same as for si.
io
bi #Blocks received from block devices per second, i.e. read from disk. Block devices here means all disks and other block devices on the system; the block size is 1024 bytes.
bo #Blocks sent to block devices per second, i.e. written to disk. For example, writing a file makes bo greater than 0. bi and bo are normally close to 0; otherwise I/O is too frequent and needs tuning.
system
in #Number of interrupts per second, including the clock interrupt.
cs #Number of context switches per second. Blocking system calls, thread switches, and process switches all add to it. Lower is better; if it is very high, consider reducing the number of threads or processes.
cpu
us #User CPU time. On a server that did frequent encryption and decryption, I saw us close to 100 and the run queue r reach 80 (the machine was under a stress test and performed poorly).
sy #System (kernel) CPU time. If it is too high, the system spends a long time in system calls, for example because of frequent I/O.
id #Idle CPU time. us + sy + id + wa + st adds up to 100, so id is the idle share of the CPU, us the user share, and sy the system share.
wa #CPU time spent waiting for I/O.
practice
procs r: A high value means many processes are running and the system is very busy.
bi/bo: If disk writes are somewhat heavy, up to roughly 10 MB/s for large files, or roughly 2 MB/s for small files, is basically normal and not worth worrying about.
cpu us: Staying above 50% is acceptable during peak traffic; if it stays above 50% for a long time, consider optimizing.
cpu us + sy: The reference value is 80%. If us + sy exceeds 80%, the CPU may be insufficient.
cpu wa: The percentage of CPU time spent waiting for I/O. The reference value is 30%; if wa exceeds 30%, I/O wait is serious. This may be caused by heavy random disk access, or by a bandwidth bottleneck in the disk or disk controller (mainly block operations).
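The rules of thumb above can be applied mechanically to vmstat rows. This sketch assumes the 17-column layout shown in the output earlier (r b swpd free buff cache si so bi bo in cs us sy id wa st), which can differ between procps versions; `parseVmstatRow` and `checkVmstat` are hypothetical helpers, and the thresholds are the reference values quoted above.

```javascript
// Column order of the vmstat output shown above (procps layout; verify on your system).
const VMSTAT_COLUMNS = ['r', 'b', 'swpd', 'free', 'buff', 'cache', 'si', 'so',
  'bi', 'bo', 'in', 'cs', 'us', 'sy', 'id', 'wa', 'st'];

// Turn one data line into an object; header lines and malformed lines yield null.
function parseVmstatRow(line) {
  const values = line.trim().split(/\s+/).map(Number);
  if (values.length !== VMSTAT_COLUMNS.length || values.some(Number.isNaN)) return null;
  return Object.fromEntries(VMSTAT_COLUMNS.map((name, i) => [name, values[i]]));
}

// Apply the reference values from the list above; returns human-readable warnings.
function checkVmstat(row, cpuCount) {
  const warnings = [];
  if (row.r > cpuCount) warnings.push(`run queue r=${row.r} exceeds ${cpuCount} CPUs`);
  if (row.us + row.sy > 80) warnings.push(`us+sy=${row.us + row.sy}% > 80%: CPU may be insufficient`);
  if (row.wa > 30) warnings.push(`wa=${row.wa}% > 30%: I/O wait is serious`);
  if (row.si > 0 || row.so > 0) warnings.push(`swapping (si=${row.si}, so=${row.so}): memory may be short`);
  return warnings;
}

const idle = parseVmstatRow(' 0  0      0 233480800 758304 20795520    0    0     0     0  951 1519  0  0 100  0  0');
console.log(checkVmstat(idle, require('node:os').cpus().length));
```

The idle row from the sample output produces no warnings, which matches the 100% idle figures shown there.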

Reference link: https://www.cnblogs.com/zsql/p/11643750.html

2. Look at the code diff

If restarting the instance does not solve the problem and you have confirmed that the node process is the cause, review the commits that went online and their code diff to see whether the problem can be spotted there.
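In this case the culprit was a dependency's major-version upgrade, so alongside the code diff it is worth diffing the declared dependency versions between the last good release and the current one. A minimal sketch: `diffDependencies` is a hypothetical helper that compares two parsed package.json objects, its major-bump check is a rough heuristic that only reads the first number in each version range, and the package names in the example are made up.

```javascript
// Rough heuristic: the major version is the first integer in a range like "^2.3.1" or "~1.0.0".
function majorOf(range) {
  const match = /\d+/.exec(range || '');
  return match ? Number(match[0]) : null;
}

// Compare dependencies/devDependencies of two package.json objects.
function diffDependencies(oldPkg, newPkg) {
  const changes = [];
  for (const field of ['dependencies', 'devDependencies']) {
    const before = oldPkg[field] || {};
    const after = newPkg[field] || {};
    const names = [...new Set([...Object.keys(before), ...Object.keys(after)])].sort();
    for (const name of names) {
      if (before[name] === after[name]) continue; // unchanged
      const from = before[name] ?? null;
      const to = after[name] ?? null;
      const majorBump = from !== null && to !== null && majorOf(from) !== majorOf(to);
      changes.push({ field, name, from, to, majorBump });
    }
  }
  return changes;
}

// Example with made-up package names and versions.
const oldPkg = { dependencies: { 'example-server-framework': '^1.4.0', lodash: '^4.17.21' } };
const newPkg = { dependencies: { 'example-server-framework': '^2.0.0', lodash: '^4.17.21' } };
console.log(diffDependencies(oldPkg, newPkg));
```

The old manifest can be read with `git show <commit>:package.json` and passed through `JSON.parse`. Diffing the lockfile is more precise, since a range such as ^1.4.0 can resolve to different versions over time.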

3. Open the runtime CPU profiler

The procedure is similar to the one in my other article, How to quickly locate SSR server memory leaks.

Start the service with node --inspect
Simulate the online environment locally using the built code. The build output may not run as-is: control the environment variables carefully, and turn off uglify/minification so function names in the profile stay readable.
- For example, point some environment variables (CDN domain, etc.) at the local machine, because the bundle is local and has not been uploaded to the CDN
Generate a CPU profile


What if the online environment cannot be simulated locally?

For example, if downstream RPC services are unreachable from your local environment, the only option is to add code to the service that generates a profile: nodejs.org/docs/latest…
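The Node.js documentation linked above shows how to start and stop the V8 CPU profiler from inside a running process through the built-in `inspector` module. The sketch below wraps that pattern in a promise; `captureCpuProfile` and the output file name are hypothetical, and how you trigger it in production (an admin endpoint, a timer, and so on) is up to you. The resulting .cpuprofile file is plain JSON that Chrome DevTools can load.

```javascript
const inspector = require('node:inspector');
const fs = require('node:fs');

// Promise wrapper around the callback-style Session.post(method, params, callback).
function post(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// Profile `workload` (an async function) and write the result to `outFile`.
async function captureCpuProfile(workload, outFile) {
  const session = new inspector.Session();
  session.connect(); // in-process session; the --inspect flag is not required
  try {
    await post(session, 'Profiler.enable');
    await post(session, 'Profiler.start');
    await workload(); // e.g. wait a while so live traffic gets sampled
    const { profile } = await post(session, 'Profiler.stop');
    fs.writeFileSync(outFile, JSON.stringify(profile));
    return profile;
  } finally {
    session.disconnect();
  }
}
```

For example, `captureCpuProfile(() => new Promise((r) => setTimeout(r, 30000)), 'online.cpuprofile')` records about 30 seconds of whatever the service is doing, since requests keep being handled while the timer is pending.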


Once you have the profile file, open it in Chrome DevTools.


4. Analyze the CPU profiler

Combine the profile with the code diff to find the cause.
You can also upload the profile file to www.speedscope.app/ (via file upload) to get a flame graph of the CPU profile (more details: www.npmjs.com/package/spe…).


5. Stress test verification

Use ab (ApacheBench) or another load-testing tool to confirm that CPU usage stays normal under load after the fix.

Summary

Restart the instance
Confirm that the node process is the cause
Look at the code diff
Generate a runtime CPU profile
Combine the profile with the code diff to find the cause
Stress test verification

