How to troubleshoot Kubernetes nodes
To troubleshoot Kubernetes node problems, follow these steps: 1. Use kubectl get nodes and kubectl describe node to view node status and details, paying attention to abnormal entries under Conditions; 2. Log in to the node and check that the kubelet, its logs, and the container runtime are healthy; 3. Check network connectivity and firewall settings to confirm that the node can reach the API Server port; 4. Check the CNI plug-in status and its logs. Working through these checks in order usually locates the cause of the problem.
When a node has a problem, the most common symptoms are that its status is not Ready, pods cannot be scheduled onto it, or services behave abnormally. Troubleshooting a Kubernetes node is not difficult: if you work through the steps one by one, you can usually find the cause.
Check node status and basic information
First, use kubectl get nodes to view the status of the nodes. If a node shows NotReady, there is a problem with that node. You can add -o wide to see more information, such as the node IP and version:
kubectl get nodes -o wide
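As a minimal sketch of the check above, the helper below filters node listings for anything that is not Ready. The function name not_ready_nodes is hypothetical, and it assumes the input has the default column order (NAME first, STATUS second) with the header row removed.

```shell
# Print the names of nodes whose status is not "Ready".
# Reads `kubectl get nodes --no-headers` style output on stdin
# (assumed columns: NAME STATUS ROLES AGE VERSION).
# A cordoned node shows "Ready,SchedulingDisabled"; only the part
# before the comma is compared, so it is not reported.
not_ready_nodes() {
  awk 'NF >= 2 { split($2, s, ","); if (s[1] != "Ready") print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every listed node reports Ready.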
Then, use kubectl describe node <node-name> to view the details. Focus on the Conditions section, which contains several key indicators, such as Ready, MemoryPressure, DiskPressure, PIDPressure and NetworkUnavailable. Ready should be True and the other conditions should be False; any item that deviates from this (or shows Unknown) is a potential problem point.
For example:
- If Ready is False, the kubelet may not be started or cannot communicate.
- If DiskPressure is True, there is insufficient disk space.
- If MemoryPressure is True, there is not enough memory.
At this time, you can check the logs on the corresponding node based on these clues.
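The rule for reading Conditions can be sketched as a small filter. The function name flag_conditions and the type=status input format are assumptions made for illustration; the format is what the kubectl jsonpath query shown in the usage comment would produce.

```shell
# Flag node conditions that are in an unhealthy state.
# Reads lines of the form TYPE=STATUS on stdin, for example "Ready=True".
# Ready is healthy only when True; every other condition
# (MemoryPressure, DiskPressure, ...) is healthy only when False.
flag_conditions() {
  awk -F= '
    NF < 2 { next }
    $1 == "Ready" && $2 != "True"  { print "PROBLEM: " $1 " is " $2; next }
    $1 != "Ready" && $2 != "False" { print "PROBLEM: " $1 " is " $2 }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get node <node-name> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | flag_conditions
```

No output means all reported conditions are in their healthy state.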
Log in to the node to view the kubelet status and service log
Most node problems are related to kubelet. You can log in to the corresponding node to execute:
systemctl status kubelet
journalctl -u kubelet -n 100
Check whether the kubelet is running and whether it reports any errors. Common errors include:
- Certificates have expired or have incorrect permissions
- The network plug-in is not up (for example, the CNI configuration is missing)
- Docker or containerd is not running
- Insufficient system resources, causing processes to be OOM-killed
If it is a certificate problem, fix or renew the certificate first (a restart alone usually does not repair an expired certificate), then restart the kubelet:
sudo systemctl restart kubelet
If it still doesn't work after restarting, check /var/log/kubelet.log (if your system writes one) or use journalctl to look for more detailed log records.
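To narrow down a long kubelet log, one option is a keyword filter such as the sketch below. The function name kubelet_log_hints and the keyword list are illustrative assumptions, not an official set of kubelet messages, so expect both false positives and misses.

```shell
# Keep only log lines that hint at the error classes listed earlier:
# certificates, CNI, out-of-memory, and generic failures.
# The keyword list is an assumption chosen for illustration.
kubelet_log_hints() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -iE 'x509|certificate|cni|oom|failed|error' || true
}

# Usage on the node:
#   journalctl -u kubelet -n 100 --no-pager | kubelet_log_hints
```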
Also, check whether the container runtime is running normally:
systemctl status docker   # or: systemctl status containerd
docker info               # or: crictl info
Sometimes the container runtime hangs while running, which makes the node unavailable.
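The service checks in this section can be combined into one pass. This is a sketch under the assumption that the node uses systemd; the function name check_services and the default service list are hypothetical and should be adjusted to what the node actually runs.

```shell
# Report which of the given systemd services are active.
# With no arguments, checks kubelet, containerd, and docker
# (an assumed default list; a node normally runs only one runtime).
check_services() {
  [ "$#" -gt 0 ] || set -- kubelet containerd docker
  for svc in "$@"; do
    if systemctl is-active --quiet "$svc"; then
      echo "$svc: active"
    else
      echo "$svc: NOT active"
    fi
  done
}

# Usage on the node:
#   check_services kubelet containerd
```

A service reported as active can still be hung, so treat this as a first filter before reading the logs.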
Check network and firewall settings
If the node cannot reach the master, the cause may also be the network or a firewall. The kubelet talks to the API Server on its secure port (6443 by default, or 443 in some environments), and the API Server connects back to the kubelet on port 10250. If your environment has firewall restrictions, make sure these ports are open.
You can use telnet or nc to test the connection:
nc -zv <apiserver-ip> 6443
If you can't connect, you have to contact the network administrator to see if it is a security group or routing rule problem.
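The connectivity test can be wrapped so that several ports are checked the same way. This is a minimal sketch: the function name check_port is hypothetical, the 3-second timeout is an arbitrary choice, and it assumes an nc build that supports the -z and -w options.

```shell
# Test whether a TCP port on a host accepts connections.
# Prints a one-line result and returns non-zero when unreachable.
check_port() {
  host=$1
  port=$2
  if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
    echo "$host:$port reachable"
  else
    echo "$host:$port unreachable"
    return 1
  fi
}

# Usage from the node toward the API Server:
#   check_port <apiserver-ip> 6443
# Usage from a control-plane host toward the node's kubelet:
#   check_port <node-ip> 10250
```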
In addition, if a CNI plug-in (such as Calico or Flannel) is not configured properly, the node may register successfully but pods on it cannot start. You can troubleshoot further by viewing the logs of the CNI-related components:
kubectl get pods -n kube-system
kubectl logs <cni-pod-name> -n kube-system
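To pick out the CNI pods worth inspecting, a filter along these lines may help. The function name unhealthy_cni_pods is hypothetical, and matching pod names on "calico" or "flannel" is an assumption based on the two plug-ins named above; other plug-ins, or installs in a different namespace, need a different pattern.

```shell
# List CNI-related pods that are not in the Running state.
# Reads `kubectl get pods --no-headers` style output on stdin
# (assumed columns: NAME READY STATUS RESTARTS AGE).
# The name pattern (calico|flannel) is an illustrative assumption.
unhealthy_cni_pods() {
  awk 'tolower($1) ~ /calico|flannel/ && $3 != "Running" { print $1 " " $3 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods -n kube-system --no-headers | unhealthy_cni_pods
```

Each printed pod name can then be passed to kubectl logs as shown above.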
Basically that's it
In general, the main way to troubleshoot node problems is to start from kubectl describe node, check the kubelet status, view the system logs, and then judge based on network and resource conditions. Although it may look like a lot of steps, each one is straightforward. The key is to take them in order and not skip ahead.
