During server operation and maintenance, it is often necessary to monitor server resources such as CPU load, disk usage, and the number of processes, so that the system administrator can be alerted promptly when an abnormality occurs. This article introduces several common monitoring requirements and shows how to write shell scripts for them on Linux.
Article directory:
1. Use Shell on Linux to check whether a process exists
2. Use Shell on Linux to detect process CPU utilization
3. Use Shell on Linux to detect process memory usage
4. Use Shell on Linux to detect process handle usage
5. Use Shell on Linux to check whether a TCP or UDP port is listening
6. Use Shell on Linux to check the number of running processes with a given name
7. Use Shell on Linux to detect system CPU load
8. Use Shell on Linux to detect system disk space
9. Summary
Check if the process exists
When monitoring a process, we generally need its process ID, which uniquely identifies the process. Sometimes, however, several processes with the same name run under different users on the same server. The following function, GetPID, returns the ID of the process with a given name under a given user (for now it assumes that only one process with that name is started under the user). It takes two parameters: the user name and the process name. It first uses ps to list the process information, uses grep to filter out the required process, and finally uses sed and awk to extract the process ID (the function can be adapted as needed, for example to filter on other information).
List 1. Monitor the process
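A minimal sketch of such a function is given here. It follows the ps, grep, sed, and awk steps described above, but the exact ps options and the column that holds the PID are assumptions based on the common procps output format, not the original listing.

```shell
# GetPID USER NAME
# Prints the ID of the first process owned by USER whose ps line contains NAME.
# Assumption: "ps -u USER -f" prints the PID in the second column.
GetPID() {
    PsUser=$1
    PsName=$2
    # grep -v grep drops the grep command itself from the matches,
    # sed -n 1p keeps only the first match, and awk prints the PID column.
    ps -u "$PsUser" -f | grep "$PsName" | grep -v grep | sed -n 1p | awk '{ print $2 }'
}
```

Calling GetPID root CFTestApp prints the PID of CFTestApp running as root, or nothing if no such process is found. Because grep matches anywhere in the line, a name that also appears in the arguments of another command can produce a false match.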
Sample demo:
1) Source program (for example, find the ID of the process named CFTestApp that runs as root)
3) Result analysis
As can be seen from the above output: 11426 is the process ID of the CFTestApp program under the root user.
4) Command introduction
1. ps: Displays a snapshot of the processes currently running in the system. Parameters: -u
Sometimes the process may not have been started. The following function checks whether the process ID exists and reports when the process is not running.
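One way to write such a check is sketched here. It relies on ps -p exiting with a non-zero status when no process has the given ID; the function name CheckPID and the message text are illustrative assumptions, not the original listing.

```shell
# CheckPID PID
# Returns 0 if a process with this ID exists, 1 otherwise.
# The message text is illustrative, not taken from the original listing.
CheckPID() {
    # An empty PID (for example from a failed lookup) counts as "not running".
    if [ -n "$1" ] && ps -p "$1" > /dev/null 2>&1; then
        return 0
    fi
    echo "The process does not exist."
    return 1
}
```

A monitoring script can call CheckPID with the value obtained from the PID lookup and raise an alarm when it returns 1.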
Detect process CPU utilization
When maintaining application services, we often see CPU utilization climb so high that business traffic becomes congested or is interrupted. High CPU utilization can be caused by abnormal conditions such as business overload or an endless loop. By monitoring the CPU of the business process continuously with a script, maintenance personnel can be notified as soon as utilization becomes abnormal, which helps them analyze and locate the problem in time and avoid business interruption. The following function obtains the CPU utilization of a process with a given process ID. It takes one parameter, the process ID. It first uses ps to look up the process information, drops the %CPU header line with grep -v, and finally uses awk to take the integer part of the CPU utilization percentage (on a system with multiple CPUs, the value can exceed 100%).
List 2. Real-time monitoring of business process CPU
List 3. Determine whether CPU utilization exceeds the limit
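The two steps can be sketched as follows, assuming the procps version of ps. GetCpu follows the ps, grep -v, and awk pipeline described above; CheckCpu, its default limit of 80, its return codes, and its messages are assumptions made for illustration.

```shell
# GetCpu PID
# Prints the integer part of the %CPU value that ps reports for PID.
# Note: ps computes %CPU as CPU time divided by the time the process has
# been running, so it is an average over the process lifetime.
GetCpu() {
    ps -p "$1" -o pcpu | grep -v CPU | awk '{ print int($1) }'
}

# CheckCpu PID [LIMIT]
# Returns 1 when usage is above LIMIT (default 80), 0 when it is normal,
# and 2 when PID is not found.
CheckCpu() {
    CpuValue=$(GetCpu "$1")
    CpuLimit=${2:-80}
    if [ -z "$CpuValue" ]; then
        echo "Process $1 not found"
        return 2
    fi
    if [ "$CpuValue" -gt "$CpuLimit" ]; then
        echo "CPU usage ${CpuValue}% is above the ${CpuLimit}% limit"
        return 1
    fi
    echo "CPU usage ${CpuValue}% is normal"
    return 0
}
```

Calling CheckCpu 11426 compares the process against the default 80% limit, and a second argument overrides the limit.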
Sample demo:
1) Source program (assuming that the process ID of CFTestApp has been queried above as 11426)
As can be seen from the above output: the current CPU usage of the CFTestApp program is 75%, which is normal and does not exceed the 80% alarm limit.
Detect process memory usage
When maintaining application services, we often see a process crash because it uses too much memory, which interrupts business (for example, a 32-bit program can address at most 4 GB of memory, and memory allocation fails once that is exceeded; physical memory is limited as well). Excessive memory usage can be caused by memory leaks, message backlog, and so on. By monitoring the memory usage of the business process continuously with a script, an alarm can be sent in time (for example, by SMS) when usage becomes abnormal, so that maintenance personnel can handle it promptly. The following function obtains the memory usage of a process with a given process ID. It takes one parameter, the process ID. It first uses ps to look up the process information, drops the VSZ header line with grep -v, and then divides the value by 1000 to get the memory usage in megabytes.
List 4. Monitor the memory usage of business processes
List 5. Determine whether memory usage exceeds the limit
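A sketch of both steps, assuming the procps version of ps, might look like this. GetMem follows the ps, grep -v, and divide-by-1000 steps described above; CheckMem, its default limit of 1600 MB, its return codes, and its messages are illustrative assumptions.

```shell
# GetMem PID
# Prints the virtual memory size (VSZ) of PID in megabytes.
# ps reports VSZ in kilobytes; dividing by 1000 follows the article and is
# an approximation.
GetMem() {
    ps -p "$1" -o vsz | grep -v VSZ | awk '{ print int($1 / 1000) }'
}

# CheckMem PID [LIMIT_MB]
# Returns 1 when usage is above LIMIT_MB (default 1600, about 1.6 GB),
# 0 when it is normal, and 2 when PID is not found.
CheckMem() {
    MemUsage=$(GetMem "$1")
    MemLimit=${2:-1600}
    if [ -z "$MemUsage" ]; then
        echo "Process $1 not found"
        return 2
    fi
    if [ "$MemUsage" -gt "$MemLimit" ]; then
        echo "Memory usage ${MemUsage}M is above the ${MemLimit}M limit"
        return 1
    fi
    echo "Memory usage ${MemUsage}M is normal"
    return 0
}
```

VSZ is the virtual size of the process. If resident memory is the better signal for a given service, the rss column of ps can be substituted in the same pipeline.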
1) Source program (assuming that the process ID of CFTestApp has been queried above as 11426)
As can be seen from the above output: the current memory usage of the CFTestApp program is 248 MB, which is normal and does not exceed the 1.6 GB alarm limit.
Detect process handle usage
When maintaining application services, we often see business interrupted because a process uses too many handles. Every platform limits the number of handles a process can use. On Linux, for example, the limit can be obtained with the ulimit -n command (open files (-n) 1024) or by viewing the contents of /etc/security/limits.conf. Excessive handle usage can be caused by excessive load, handle leaks, and so on. By monitoring the handle usage of the business process continuously with a script, an alert can be sent in time (for example, by SMS) when usage becomes abnormal, so that maintenance personnel can handle it promptly. The following function obtains the handle usage of a process with a given process ID. It takes one parameter, the process ID. It first uses ls to list the handles of the process and then uses wc -l to count them.
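The ls and wc -l steps can be sketched as follows, assuming the handles are read from the fd directory of the process under /proc. The names GetDes and CheckDes, the default limit of 900, and the PROC_ROOT variable are hypothetical; PROC_ROOT defaults to /proc and exists only so the function can be tried against a test directory.

```shell
# GetDes PID
# Prints the number of open handles (file descriptors) of PID.
# PROC_ROOT is a hypothetical override that defaults to /proc.
GetDes() {
    ls "${PROC_ROOT:-/proc}/$1/fd" 2> /dev/null | wc -l
}

# CheckDes PID [LIMIT]
# Returns 1 when the handle count is above LIMIT (default 900), else 0.
CheckDes() {
    DesCount=$(GetDes "$1")
    DesLimit=${2:-900}
    if [ "$DesCount" -gt "$DesLimit" ]; then
        echo "Handle usage $DesCount is above the limit of $DesLimit"
        return 1
    fi
    echo "Handle usage $DesCount is normal"
    return 0
}
```

Reading the fd directory of a process owned by another user normally requires root privileges, so a count of 0 can also mean that the script lacks permission.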
Sample demo:
1) Source program (assuming that the process ID of CFTestApp found in the above query is 11426)
2) Result output
As can be seen from the above output: the current handle usage of the CFTestApp program is 528, which is normal, and does not exceed the 900 alarm limit.
4) Command introduction
wc: Counts the bytes, words, and lines in the specified file and prints the results. Parameters: -l counts lines, -c counts bytes, -w counts words.
Check whether a TCP or UDP port is listening
Port detection is a common part of system resource monitoring, and the state of a port is especially important where network communication is involved. Sometimes the process, CPU, and memory are all normal but the port is in an abnormal state, so the business does not run properly. The following function determines whether a specified port is being listened on. It takes one parameter, the port to check. It first uses netstat to output the port usage information and then uses grep, awk, and wc to count the TCP ports in the listening state. A second statement counts the listening UDP ports in the same way. If both the TCP and UDP counts are 0, it returns 0; otherwise it returns 1.
List 6. Port detection
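A sketch of this check, assuming the Linux net-tools output format of netstat -an, is given here. The awk conditions are assumptions about that format: listening TCP sockets show LISTEN in the last column, and unconnected UDP sockets end with a wildcard foreign address.

```shell
# Listening PORT
# Prints 1 if a TCP or UDP socket is listening on PORT, otherwise 0.
Listening() {
    # TCP: the state column (last field) is LISTEN.
    TcpNum=$(netstat -an | grep ":$1 " |
        awk '$1 ~ /^tcp/ && $NF == "LISTEN"' | wc -l)
    # UDP: there is no state, so the last field is the foreign address,
    # which ends in ":*" for a socket that is waiting for datagrams.
    UdpNum=$(netstat -an | grep ":$1 " |
        awk '$1 ~ /^udp/ && $NF ~ /:\*$/' | wc -l)
    if [ $((TcpNum + UdpNum)) -eq 0 ]; then
        echo 0
    else
        echo 1
    fi
}
```

The function prints 0 or 1 instead of using its exit status, because an exit status of 0 conventionally means success in the shell. The trailing space in the grep pattern keeps port 80 from matching port 8080.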
1) Source program (for example, query the status of port 8080 to see if it is listening)
As can be seen from the above output: port 8080 of this Linux server is in listening state.
4) Command introduction
netstat: Displays statistics related to the IP, TCP, UDP, and ICMP protocols. It is generally used to check the network connections on each port of the machine. Parameters: -a displays all sockets, including listening ones. -n shows numeric IP addresses directly instead of resolving them through a name server.
The following function also detects whether a certain TCP or UDP port is in a normal state.
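A variant built on extended regular expressions might look like the sketch below. It uses grep -E, which behaves like egrep (recent GNU grep releases mark the egrep name as obsolescent). The function name PortInUse and the patterns are assumptions based on the Linux net-tools output format of netstat -an.

```shell
# PortInUse PORT
# Exit status 0 if a TCP socket listens on PORT or an unconnected UDP
# socket is bound to it, 1 otherwise. Prints nothing.
PortInUse() {
    # Step 1 keeps TCP and UDP lines, step 2 keeps lines where PORT is
    # followed by whitespace, step 3 keeps listening TCP sockets and UDP
    # sockets whose foreign address ends in ":*".
    netstat -an |
        grep -E '^(tcp|udp)' |
        grep -E ":${1}[[:space:]]" |
        grep -E -q 'LISTEN|:\*[[:space:]]*$'
}
```

Because the result is an exit status, the function can be used directly in a condition, for example: if PortInUse 8080; then echo "port is listening"; fi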
egrep: Searches files for a specified pattern. Running egrep has the same effect as grep -E, and its syntax and parameters are the same as those of grep. The difference lies in how the pattern is interpreted: egrep uses extended regular expression syntax, while grep uses basic regular expression syntax. Extended regular expressions offer a more complete set of expression rules than basic ones.
View the number of running processes of a certain process name
Sometimes we need to know how many instances of a process are running on the server. The following function counts the running processes with a given name, for example CFTestApp.
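A minimal sketch of such a counter, built from the same ps, grep, and wc tools used elsewhere in this article, is shown here. The function name GetProcNum is hypothetical.

```shell
# GetProcNum NAME
# Prints the number of ps lines that contain NAME, ignoring grep itself.
GetProcNum() {
    # ps -ef lists every process, grep -v grep removes the grep command
    # that is doing the search, and wc -l counts the remaining lines.
    ps -ef | grep "$1" | grep -v grep | wc -l
}
```

Calling GetProcNum CFTestApp prints the number of matching processes. The match is made against the whole ps line, so a monitoring script whose own name or arguments contain the process name is counted as well.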
Detect system CPU load
When performing server maintenance, we sometimes see business interrupted because the system CPU load (utilization) is too high. A server may run many processes, so the CPU usage of each single process can look normal while the CPU load of the whole system is abnormal. By monitoring the system CPU load continuously with a script, an alarm can be sent in time when an abnormality occurs, so that maintenance personnel can handle it promptly and prevent an outage. The following function detects the system CPU usage. It uses vmstat to sample the system CPU idle value 5 times, takes the average, and subtracts it from 100 to get the actual CPU usage.
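The calculation can be sketched as follows, assuming the procps version of vmstat. Instead of hard-coding the position of the idle column, this sketch looks up the id column in the header line, which is a design choice of the sketch rather than something the article specifies. The function name GetSysCPU is hypothetical.

```shell
# GetSysCPU
# Samples vmstat 5 times at 1-second intervals and prints
# 100 minus the integer part of the average idle percentage.
# Note: the first vmstat sample reports averages since boot.
GetSysCPU() {
    vmstat 1 5 | awk '
        # Header line: remember which column is "id" (idle).
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
        # Data lines start with a number.
        col && $1 ~ /^[0-9]+$/ { idle += $col; n++ }
        END { if (n) print 100 - int(idle / n) }'
}
```

A monitoring script can compare the printed value with an alarm limit such as 90 and send a notification when the limit is exceeded. The call takes about five seconds because of the sampling interval.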
Sample demo:
1) Source program
2) Result output
3) Result analysis
As can be seen from the above output: the current CPU utilization of the Linux server system is 87%, which is normal and does not exceed the 90% alarm limit.
4) Command introduction
vmstat: Short for Virtual Memory Statistics. It monitors the virtual memory, process, and CPU activity of the operating system.
Parameters: -n indicates that the output header information will only be displayed once during periodic loop output.
Check system disk space
System disk space detection is an important part of system resource monitoring. During system maintenance we often need to check how much disk space the server is using. Because some services write call records, logs, or temporary files from time to time, running out of disk space can also interrupt business. The following function detects the disk space usage of a directory on the current system. Its input parameter is the name of the directory to check. It uses df to output the disk space usage information and then filters it with grep and awk to obtain the percentage of disk space used for that directory.
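A minimal sketch of such a function is given here. It differs from the description above in one respect: the directory is passed straight to df, so no grep step is needed and the directory does not have to be a mount point. The -P option requests the POSIX output format, which keeps each file system on one line. The function name GetDiskSpc is hypothetical.

```shell
# GetDiskSpc DIR
# Prints the used percentage (without the % sign) of the file system
# that holds DIR. Returns 1 if the argument count is wrong.
GetDiskSpc() {
    [ $# -eq 1 ] || return 1
    # Line 1 is the header; line 2 describes the file system of DIR.
    # Column 5 is the capacity, for example "14%".
    df -kP "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Calling GetDiskSpc /boot prints the usage of the file system that contains /boot, which a script can compare with an alarm limit such as 90.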
Sample demo:
1) Source program (the detection directory is /boot)
2) Result output
From the above output, we can see that 14% of the disk space in the /boot directory on this Linux server system has been used, which is normal and does not exceed the 90% usage alarm limit.
4) Command introduction
df: Reports the disk space usage of file systems. It shows how much space is used on the disk and how much is still available. Parameters: -k displays sizes in kilobytes.
Summary
On the Linux platform, monitoring with shell scripts is a simple, convenient, and effective way to monitor servers and processes, and it is very helpful to system developers and process maintainers. Besides monitoring the information above and sending alarms, scripts can also monitor process logs and other information. I hope this article is helpful to everyone.