Shell script implements Linux system and process resource monitoring

During server operation and maintenance, it is often necessary to monitor various server resources, such as CPU load, disk usage, and the number of processes, so that the system administrator can be alerted promptly when an abnormality occurs. This article introduces several common monitoring requirements and shows how to write the corresponding shell scripts on Linux.

Article directory:

1.Linux uses Shell to check whether the process exists
2.Linux uses Shell to detect process CPU utilization
3.Linux uses Shell to detect process memory usage
4.Linux uses Shell to detect process handle usage
5.Linux uses Shell to check whether a TCP or UDP port is listening
6.Linux uses Shell to check the number of running processes of a certain process name
7.Linux uses Shell to detect system CPU load
8.Linux uses Shell to detect system disk space
9. Summary

Check if the process exists

When monitoring a process, we generally need its process ID, which is the unique identifier of the process. However, several processes with the same name may be running under different users on the server. The following function, GetPID, returns the process ID of the specified process name under the specified user (for now it assumes that only one process with this name is started under this user). It takes two parameters: the user name and the process name. It first uses ps to list the process information, uses grep to filter out the required process, and finally uses sed and awk to extract the ID of that process. The function can be modified to suit the actual situation, for example if other information needs to be filtered out.

List 1. Monitor the process

function GetPID #User #Name
{
    PsUser=$1
    PsName=$2
    pid=`ps -u "$PsUser" | grep "$PsName" | grep -v grep | grep -v vi | grep -v dbx \
        | grep -v tail | grep -v start | grep -v stop | sed -n 1p | awk '{print $1}'`
    echo $pid
}
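Because grep matches substrings, the pipeline in GetPID can pick up unrelated processes whose names merely contain the search string, which is why it needs the chain of grep -v filters. A minimal sketch of a stricter filter follows, assuming the default `ps -u` column layout (PID, TTY, TIME, CMD). `FilterPid` is a hypothetical helper, not part of the original article, and it reads the ps output from standard input so it can also be tried on captured text.

```shell
# FilterPid NAME: read `ps -u USER` output on stdin and print the PID of the
# first process whose command column (field 4) is exactly NAME.
# Hypothetical helper; assumes the default PID TTY TIME CMD layout.
FilterPid() {
    awk -v name="$1" '$4 == name { print $1; exit }'
}

# Intended use on a live system:
#   ps -u root | FilterPid CFTestApp
```

With this comparison a process named CFTestApp2, or an editor that has CFTestApp on its command line, is no longer matched.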

Sample demo:

1) Source program (for example, find the process ID where the user is root and the process name is CFTestApp)

PID=`GetPID root CFTestApp` 

echo $PID

2) Result output

11426 

[dyu@xilinuxbldsrv shell]$

3) Result analysis

As can be seen from the above output: 11426 is the process ID of the CFTestApp program under the root user.

4) Command introduction

1. ps: displays a snapshot of the processes currently in the system. Parameters: -u lists the processes that belong to the given user, specified by user name or user ID; -p lists the process with the given process ID; -o specifies the output format.
2. grep: prints the lines of its input that match a pattern. Parameters: -v inverts the selection, that is, prints only the lines that do not match.
3. sed: a non-interactive stream editor that edits text from files or from standard input, processing one line at a time. Parameters: -n suppresses the automatic printing of each line, so that only explicitly printed lines are output; the p command prints the selected lines, so sed -n 1p prints only the first line.
4. awk: a programming language for text and data processing under Linux/Unix. Data can come from standard input, one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions. It can be used directly from the command line, but more often as a script. awk scans its input line by line, from the first line to the last, looking for lines that match a given pattern and performing the given action on them. If no action is specified, matching lines are printed to standard output. If no pattern is specified, the action is applied to every line. Parameters: -F fs or --field-separator fs specifies the input field separator, where fs is a string or a regular expression, such as -F:.

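A small sketch of the grep, sed and awk options described above, run on fixed input so that the effect of each one is visible. The sample text and variable names are made up for illustration.

```shell
# sed -n suppresses automatic printing, so only the line selected by 1p appears.
first_line=$(printf 'alpha\nbeta\ngamma\n' | sed -n 1p)
echo "$first_line"

# grep -v drops every line that contains the pattern.
kept=$(printf 'keep me\ngrep itself\n' | grep -v grep)
echo "$kept"

# awk -F: splits each line on ':' and prints the first field.
first_field=$(echo 'root:x:0:0' | awk -F: '{print $1}')
echo "$first_field"
```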
Sometimes the process may not have been started. The following snippet checks whether a process ID was found. If the process is not running, the output is:

The process does not exist.

# Check if the process exists
if [ "-$PID" == "-" ]
then
{
    echo "The process does not exist."
}
fi
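The empty-string test only shows that no process ID was found at the time of the lookup. A stored process ID can also be re-checked later. A minimal sketch, assuming a Linux system with /proc mounted; `IsRunning` is a hypothetical helper name.

```shell
# IsRunning PID: succeed if PID is a non-empty number and /proc/PID exists.
# Hypothetical helper; relies on the Linux /proc file system.
IsRunning() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as not running
    esac
    [ -d "/proc/$1" ]
}

# Example: check the current shell itself.
if IsRunning "$$"; then
    echo "The process exists."
else
    echo "The process does not exist."
fi
```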

Detect process CPU utilization

When maintaining application services, we often encounter excessive CPU usage that causes business congestion or interruption. Excessive CPU usage may be caused by abnormal situations such as business overload or an endless loop. By monitoring the CPU of the business process continuously with a script, maintenance personnel can be notified as soon as CPU utilization becomes abnormal, which helps them analyze and locate the problem in time and avoid business interruption. The following function obtains the CPU utilization of the process with the specified process ID. It takes one parameter, the process ID. It first uses ps to get the process information, filters out the %CPU header line with grep -v, and finally uses awk to extract the integer part of the CPU utilization percentage (if the system has multiple CPUs, the utilization can exceed 100%).

List 2. Real-time monitoring of business process CPU

function GetCpu
{
    CpuValue=`ps -p $1 -o pcpu | grep -v CPU | awk '{print $1}' | awk -F. '{print $1}'`
    echo $CpuValue
}

The following function obtains the CPU utilization of the process through the GetCpu function above, and then uses a conditional statement to determine whether the CPU utilization exceeds the limit. If it exceeds 80% (adjustable according to the actual situation), an alarm is output; otherwise a normal message is output.

List 3. Determine whether CPU utilization exceeds the limit

function CheckCpu
{
    PID=$1
    cpu=`GetCpu $PID`
    # Print the measured value first, as shown in the sample output
    echo "The usage of cpu is $cpu"
    if [ $cpu -gt 80 ]
    then
    {
        echo "The usage of cpu is larger than 80%"
    }
    else
    {
        echo "The usage of cpu is normal"
    }
    fi
}
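If the process has already exited, GetCpu prints nothing. The `-gt` test then fails with an error, and the else branch reports the usage as normal. A hedged sketch of a guard follows, in which `CheckLimit` is a hypothetical helper that treats a missing or non-numeric reading as a separate case.

```shell
# CheckLimit LABEL VALUE LIMIT: print an alarm, a normal message, or an
# "unknown" message when VALUE is missing or not an integer.
# Returns 0 when normal, 1 when over the limit, 2 when the value is unusable.
CheckLimit() {
    case "$2" in
        ''|*[!0-9]*)
            echo "The usage of $1 is unknown"
            return 2 ;;
    esac
    if [ "$2" -gt "$3" ]; then
        echo "The usage of $1 is larger than $3"
        return 1
    fi
    echo "The usage of $1 is normal"
}

# Intended use with the functions from this article:
#   CheckLimit cpu "`GetCpu $PID`" 80
```

The same helper could be reused for the memory and handle checks later in the article by passing a different label and limit.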

Sample demonstration:

1) Source program (assuming that the process ID of CFTestApp has been queried above as 11426)

CheckCpu 11426

2) Result output

The usage of cpu is 75 

The usage of cpu is normal 

[dyu@xilinuxbldsrv shell]$

3) Result analysis

As can be seen from the above output: the current CPU usage of the CFTestApp program is 75%, which is normal and does not exceed the 80% alarm limit.

Detect process memory usage

When maintaining application services, we often encounter processes that crash because of excessive memory usage, causing business interruption. For example, the maximum addressable memory space of a 32-bit program is 4G, and memory allocation fails once it is exceeded; physical memory is also limited. Excessive memory usage may be caused by memory leaks, message accumulation, and so on. By monitoring the memory usage of the business process continuously with a script, an alarm can be sent in time (for example, by SMS) when memory usage becomes abnormal, so that maintenance personnel can handle it promptly. The following function obtains the memory usage of the process with the specified process ID. It takes one parameter, the process ID. It first uses ps to get the process information, filters out the VSZ header line with grep -v, and then converts the value to megabytes. ps reports VSZ in kilobytes, so dividing by 1000 gives an approximate value in megabytes.

List 4. Monitor the memory usage of business processes

function GetMem
{
    MEMUsage=`ps -o vsz -p $1 | grep -v VSZ`
    (( MEMUsage /= 1000 ))
    echo $MEMUsage
}
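VSZ is the virtual address-space size. It is the relevant figure for the 32-bit addressing limit, but it can be much larger than the physical memory the process actually holds. As a complementary sketch, the resident set size can be read from /proc. This assumes a Linux /proc/PID/status file containing a VmRSS line in kB, and `GetRssMB` is a hypothetical name.

```shell
# GetRssMB PID: print the resident memory of PID in whole MiB (kB / 1024).
# Hypothetical helper; prints nothing for processes without a VmRSS line
# (for example kernel threads).
GetRssMB() {
    awk '/^VmRSS:/ { printf "%d\n", $2 / 1024 }' "/proc/$1/status"
}

# Example: resident memory of the current shell.
GetRssMB "$$"
```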

The following function is to obtain the memory usage of this process through the above function GetMem, and then use conditional statements to determine whether the memory usage exceeds the limit. If it exceeds 1.6G (can be adjusted according to the actual situation), an alarm will be output, otherwise normal information will be output.

List 5. Determine whether memory usage exceeds the limit

mem=`GetMem $PID`
if [ $mem -gt 1600 ]
then
{
    echo "The usage of memory is larger than 1.6G"
}
else
{
    echo "The usage of memory is normal"
}
fi

Sample demo:

1) Source program (assuming that the process ID of CFTestApp has been queried above as 11426)

mem=`GetMem 11426`
echo "The usage of memory is $mem M"
if [ $mem -gt 1600 ]
then
{
    echo "The usage of memory is larger than 1.6G"
}
else
{
    echo "The usage of memory is normal"
}
fi

2) Result output

The usage of memory is 248 M 

The usage of memory is normal 

[dyu@xilinuxbldsrv shell]$

3) Result analysis

As can be seen from the above output: the current memory usage of the CFTestApp program is 248M, which is normal and does not exceed the 1.6G alarm limit.

Detect process handle usage

When maintaining application services, we often encounter business interruptions caused by excessive use of handles (file descriptors). The number of handles a process may use is limited on every platform. For example, on Linux we can use the ulimit -n command (shown as open files (-n) 1024 in the output of ulimit -a) or view the contents of /etc/security/limits.conf to get the process handle limit. Excessive handle usage may be caused by excessive load, handle leaks, and so on. By monitoring the handle usage of the business process continuously with a script, an alert can be sent in time (for example, by SMS) when an abnormality occurs, so that maintenance personnel can handle it promptly. The following function obtains the handle usage of the process with the specified process ID. It takes one parameter, the process ID. It first uses ls to list the handles of the process, and then uses wc -l to count them.

function GetDes
{
    DES=`ls /proc/$1/fd | wc -l`
    echo $DES
}

The following snippet obtains the handle usage of the process through the GetDes function above, and then uses a conditional statement to determine whether the handle usage exceeds the limit. If it exceeds 900 (adjustable according to the actual situation), an alarm is output; otherwise a normal message is output.

des=`GetDes $PID`
if [ $des -gt 900 ]
then
{
    echo "The number of des is larger than 900"
}
else
{
    echo "The number of des is normal"
}
fi
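The fixed threshold of 900 presumes a limit of 1024. A sketch that reads the soft limit actually in effect for the process and reports usage as a percentage of it is given below. It assumes the Linux /proc/PID/limits format, in which the fourth field of the "Max open files" line is the soft limit. Both function names are hypothetical.

```shell
# GetFdLimit PID: print the soft "Max open files" limit of the process.
GetFdLimit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# GetFdPercent PID: print open descriptors as a percentage of the soft limit.
GetFdPercent() {
    fd_used=`ls "/proc/$1/fd" | wc -l`
    fd_limit=`GetFdLimit "$1"`
    case "$fd_limit" in
        ''|*[!0-9]*) return 1 ;;   # limit missing or not numeric: nothing to report
    esac
    echo $(( fd_used * 100 / fd_limit ))
}

# Example: descriptor usage of the current shell, in percent.
GetFdPercent "$$"
```

Reading /proc/PID/fd for a process owned by another user generally requires root privileges.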

Sample demo:

1) Source program (assuming that the process ID of CFTestApp found in the above query is 11426)

des=`GetDes 11426`
echo "The number of des is $des"
if [ $des -gt 900 ]
then
{
    echo "The number of des is larger than 900"
}
else
{
    echo "The number of des is normal"
}
fi

2) Result output

The number of des is 528 

The number of des is normal 

[dyu@xilinuxbldsrv shell]$

3) Result analysis

As can be seen from the above output: the current handle usage of the CFTestApp program is 528, which is normal, and does not exceed the 900 alarm limit.

4) Command introduction

wc: counts the number of bytes, words, and lines in the specified file or in standard input and prints the result. Parameters: -l counts lines; -c counts bytes; -w counts words.

Check whether a TCP or UDP port is listening

Port detection comes up often in system resource monitoring, and it is especially important where network communication is involved. Sometimes the process, CPU, memory, and so on are all in a normal state, but the port is in an abnormal state and the business is not running normally. The following function determines whether the specified port is listening. It takes one parameter, the port to be checked. It first uses netstat to output the port usage information, and then uses grep, awk, and wc to count the listening TCP sockets on that port. The second statement counts the listening UDP sockets in the same way. If both counts are 0, the function outputs 0; otherwise it outputs 1.

List 6. Port detection

function Listening
{
    TCPListeningnum=`netstat -an | grep ":$1 " | \
        awk '$1 == "tcp" && $NF == "LISTEN" {print $0}' | wc -l`
    UDPListeningnum=`netstat -an | grep ":$1 " \
        | awk '$1 == "udp" && $NF == "0.0.0.0:*" {print $0}' | wc -l`
    (( Listeningnum = TCPListeningnum + UDPListeningnum ))
    if [ $Listeningnum == 0 ]
    then
    {
        echo "0"
    }
    else
    {
        echo "1"
    }
    fi
}
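On Linux, netstat labels IPv6 sockets tcp6, so the `$1 == "tcp"` comparison skips a service that listens only on an IPv6 address. A minimal sketch of a count that covers both follows, assuming the usual Linux `netstat -an` column order (protocol, Recv-Q, Send-Q, local address, foreign address, state). `CountTcpListeners` is a hypothetical helper that reads netstat output from standard input.

```shell
# CountTcpListeners PORT: count LISTEN sockets (tcp or tcp6) whose local
# address (field 4) ends in :PORT. Reads `netstat -an` output on stdin.
# Hypothetical helper; PORT is expected to be a plain number.
CountTcpListeners() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Intended use on a live system:
#   netstat -an | CountTcpListeners 8080
```

Anchoring the match at the end of the local address also keeps port 18080 from being counted when 8080 is requested.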

Sample demo:

1) Source program (for example, query the status of port 8080 to see if it is listening)

isListen=`Listening 8080`
if [ $isListen -eq 1 ]
then
{
    echo "The port is listening"
}
else
{
    echo "The port is not listening"
}
fi

2) Result output

The port is listening 

[dyu@xilinuxbldsrv shell]$

3) Result analysis

As can be seen from the above output: port 8080 of this Linux server is in listening state.

4) Command introduction

netstat: displays statistics related to the IP, TCP, UDP and ICMP protocols. It is generally used to check the network connections on each port of the machine. Parameters: -a displays all sockets, including listening ones; -n shows numeric addresses directly instead of resolving them through a name server.

The following commands can also be used to check whether a certain TCP or UDP port is in a normal state.

tcp: netstat -an|egrep $1 |awk '$6 == "LISTEN" && $1 == "tcp" {print $0}'

udp: netstat -an|egrep $1 |awk '$1 == "udp" && $5 == "0.0.0.0:*" {print $0}'

Command introduction

egrep: finds the specified string in a file. egrep behaves like grep -E, and its syntax and parameters are the same as those of the grep command. The difference lies in how the pattern is interpreted: egrep uses extended regular expression syntax, while grep uses basic regular expression syntax. Extended regular expressions offer a more complete set of operators than basic regular expressions.

View the number of running processes of a certain process name

Sometimes we need to know how many instances of a process are running on the server. The following command counts the running processes with a given name, in this example CFTestApp.

Runnum=`ps -ef | grep -v vi | grep -v tail | grep "[ /]CFTestApp" | grep -v grep | wc -l`
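Substring matching on `ps -ef` output counts any command line that mentions the name, hence the extra grep -v filters. A sketch of an exact-name count follows, assuming a ps that supports `-o comm=` (command name only, no header line). `CountByName` is a hypothetical helper that reads that output from standard input.

```shell
# CountByName NAME: count input lines whose first field is exactly NAME.
# Reads `ps -e -o comm=` output on stdin. Hypothetical helper; note that
# Linux truncates the comm field to 15 characters.
CountByName() {
    awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# Intended use on a live system:
#   Runnum=`ps -e -o comm= | CountByName CFTestApp`
```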

Detect system CPU load

When performing server maintenance, we sometimes encounter business interruptions caused by excessive system CPU load. Multiple processes may be running on the server, and the CPU usage of each single process may look normal while the CPU load of the entire system is abnormal. By monitoring the system CPU load continuously with a script, an alarm can be sent in time when an abnormality occurs, allowing maintenance personnel to handle it promptly and prevent accidents. The following function obtains the system CPU usage. It uses vmstat to sample the CPU idle value 5 times, takes the average, and then subtracts the average from 100 to get the actual CPU usage.

function GetSysCPU
{
    CpuIdle=`vmstat 1 5 | sed -n '3,$p' \
        | awk '{x = x + $15} END {print x/5}' | awk -F. '{print $1}'`
    CpuNum=`echo "100-$CpuIdle" | bc`
    echo $CpuNum
}
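The position of the idle column in vmstat output is not the same on every system, so `$15` may point at a different field. A hedged sketch that looks up the column by its `id` header instead is shown here. `AvgCpuUsage` is a hypothetical helper that reads vmstat output from standard input and, like GetSysCPU, averages every sample including the first.

```shell
# AvgCpuUsage: read vmstat output on stdin, find the column titled "id",
# average it over all sample rows and print 100 minus the integer average.
# Hypothetical helper; exits with status 1 if no sample rows were read.
AvgCpuUsage() {
    awk '
        # Header rows: remember the position of "id" once it is seen.
        idcol == 0 { for (i = 1; i <= NF; i++) if ($i == "id") idcol = i; next }
        { sum += $idcol; n++ }
        END { if (n == 0) exit 1; printf "%d\n", 100 - int(sum / n) }
    '
}

# Intended use on a live system:
#   cpu=`vmstat 1 5 | AvgCpuUsage`
```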

Sample demo:

1) Source program

cpu=`GetSysCPU` 

echo "The system CPU is $cpu"

if [ $cpu -gt 90 ] 

then 

{ 

echo "The usage of system cpu is larger than 90%"

} 

else 

{ 

echo "The usage of system cpu is normal"

} 

fi

2) Result output

The system CPU is 87 

The usage of system cpu is normal 

[dyu@xilinuxbldsrv shell]$

3) Result analysis

As can be seen from the above output: the current CPU utilization of the Linux server system is 87%, which is normal and does not exceed the 90% alarm limit.

4) Command introduction

vmstat: short for Virtual Memory Statistics. It monitors the virtual memory, process, and CPU activity of the operating system.
Parameters: -n displays the header only once during periodic output.

Check system disk space

Disk space detection is an important part of system resource monitoring. During system maintenance, we often need to check the disk space usage of the server. Because some businesses write call records, logs, or temporary files from time to time, running out of disk space may also cause business interruption. The following function obtains the disk space usage of a directory on the current system. Its input parameter is the name of the directory to be checked. It appends $ to the directory name so that grep matches it only at the end of a line, that is, in the mount point column. It then uses df to output the disk space usage of the system and filters the output through grep and awk to obtain the usage percentage of that directory.

function GetDiskSpc
{
    if [ $# -ne 1 ]
    then
        return 1
    fi
    Folder="$1$"
    DiskSpace=`df -k | grep "$Folder" | awk '{print $5}' | awk -F% '{print $1}'`
    echo $DiskSpace
}
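Filtering the full df listing with grep only works when the directory is itself a mount point. A sketch of an alternative follows, assuming a df that accepts the POSIX -P option (one line per file system, no wrapping) and a path argument. `ParseDfUse` and `GetDirDiskUse` are hypothetical names.

```shell
# ParseDfUse: read `df -P` output on stdin and print the use percentage
# (without the % sign) from the first data line.
ParseDfUse() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# GetDirDiskUse DIR: usage of the file system that holds DIR.
# Hypothetical helper; lets df resolve DIR to its file system.
GetDirDiskUse() {
    df -P "$1" | ParseDfUse
}

# Intended use on a live system:
#   DiskSpace=`GetDirDiskUse /boot`
```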

Sample demo:

1) Source program (the detection directory is /boot)

Folder="/boot"

DiskSpace=`GetDiskSpc $Folder` 

echo "The system $Folder disk space is $DiskSpace%"

if [ $DiskSpace -gt 90 ] 

then 

{ 

echo "The usage of system disk($Folder) is larger than 90%"

} 

else 

{ 

echo "The usage of system disk($Folder) is normal"

} 

fi

2) Result output

The system /boot disk space is 14% 

The usage of system disk(/boot) is normal 

[dyu@xilinuxbldsrv shell]$

3) Result analysis

From the above output, we can see that 14% of the disk space in the /boot directory on this Linux server system has been used, which is normal and does not exceed the 90% usage alarm limit.

4) Command introduction

df: checks the disk space usage of file systems. You can use this command to find out how much space is used on a disk and how much is still available. Parameters: -k displays sizes in units of 1 KB.

Summary

On the Linux platform, shell scripts are a simple, convenient, and effective way to monitor servers and processes, and they are very helpful for system developers and process maintainers. Besides monitoring the information above and sending alarms, they can also monitor process logs and other information. I hope this article will be helpful to everyone.