How to troubleshoot Kubernetes nodes-Linux Operation and Maintenance-php.cn

Karen Carpenter

Aug 02, 2025 am 02:44 AM

To troubleshoot a Kubernetes node, work through these steps in order:

1. Run kubectl get nodes and kubectl describe node to view the node's status and details, paying attention to abnormal values in the Conditions section.
2. Log in to the node and check the kubelet's status and logs, and whether the container runtime is healthy.
3. Check network connectivity and firewall settings, so that the node and the API Server can reach each other on the required ports.
4. Check the status and logs of the CNI plug-in.

Working through these checks in order is usually enough to locate the cause and fix the problem.

A node problem usually shows up in one of three ways:

- the node's status is not Ready;
- pods cannot be scheduled onto the node;
- services behave abnormally.

Troubleshooting node problems is not actually difficult. If you work through the checks below one at a time, you can usually find the cause.

Check node status and basic information

First, run kubectl get nodes to view the status of each node. A node whose status is NotReady has a problem.

Add -o wide to see more information, such as each node's internal IP, kubelet version, OS image, and container runtime:

 kubectl get nodes -o wide
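On a large cluster, you can filter the output of kubectl get nodes with awk to pick out the problem nodes. The sketch below makes two assumptions:

- The output uses the default column layout, with NAME first and STATUS second.
- The helper name not_ready_nodes is hypothetical.

It prints every node whose STATUS is anything other than plain Ready. That includes NotReady and Unknown nodes, and also cordoned nodes, which show as Ready,SchedulingDisabled.

```shell
# Hypothetical helper: reads `kubectl get nodes` output (including its header line)
# on stdin and prints the name of every node whose STATUS is not exactly "Ready".
not_ready_nodes() {
  # NR > 1 skips the header; $1 is NAME and $2 is STATUS in the default layout.
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

To run it against a live cluster:

 kubectl get nodes | not_ready_nodes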

Then run kubectl describe node <node-name> to view the details. Focus on the Conditions section, which contains several key indicators: Ready, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable. On a healthy node, Ready is True and every other condition is False. Any condition that deviates from this is a potential problem.

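For example:

- If Ready is False, the kubelet is running but reports the node as unhealthy, for example because the container runtime or network plug-in is not ready.
- If Ready is Unknown, the control plane has stopped receiving status updates from the kubelet. This usually means the kubelet is not running or cannot communicate with the API Server.
- If DiskPressure is True, the node is running low on disk space.
- If MemoryPressure is True, the node is running low on memory.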
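You can run the same check without reading the whole describe output. The sketch below has two parts:

- kubectl's JSONPath output (a real feature) prints one "Type Status" line per condition.
- A hypothetical awk filter, flag_bad_conditions, applies the rule above: Ready should be True and every other condition should be False.

Clusters that run add-ons such as node-problem-detector may report extra conditions. The filter treats them the same way, which is an assumption that may not suit every add-on.

```shell
# Hypothetical helper: reads "<Type> <Status>" lines on stdin and prints every
# condition that differs from a healthy node (Ready=True, all others=False).
flag_bad_conditions() {
  awk '
    $1 == "Ready" && $2 != "True"  { print $1 "=" $2 }
    $1 != "Ready" && $2 != "False" { print $1 "=" $2 }
  '
}
```

To check a node, replace <node-name> with its name:

 kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{"\n"}{end}' | flag_bad_conditions

Empty output means every condition has its healthy value.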

Use these clues to decide what to look for in the logs on the affected node.

Log in to the node to view the kubelet status and service log

Most node problems involve the kubelet. Log in to the affected node and run:

 systemctl status kubelet
journalctl -u kubelet -n 100
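A hundred lines of kubelet log can be noisy. The kubelet logs in the klog format, where each message starts with a severity letter (I, W, E, or F) followed by the date. This makes it possible to pull out error and fatal lines with grep. The sketch makes two assumptions:

- journalctl uses its default short output format, in which the kubelet's lines are tagged kubelet[PID]:.
- The function name kubelet_errors is hypothetical.

```shell
# Hypothetical helper: reads journalctl output on stdin and keeps only kubelet
# messages with klog severity E (error) or F (fatal), e.g. "kubelet[812]: E0802 ...".
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+]: [EF][0-9]{4} '
}
```

To scan a longer stretch of the log:

 journalctl -u kubelet -n 500 --no-pager | kubelet_errors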

Check whether the kubelet is active (running) and whether its log contains errors. Common errors include:

- Expired certificates or incorrect file permissions
- The network plug-in is not running (for example, the CNI configuration is missing)
- Docker or containerd is not running or has stopped responding
- Insufficient system resources, causing processes to be OOM-killed
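To confirm the first item, you can check when the kubelet's client certificate expires with openssl. The sketch wraps openssl x509 -checkend in a hypothetical helper, cert_expired.

The certificate path in the usage line, /var/lib/kubelet/pki/kubelet-client-current.pem, is where kubeadm-provisioned nodes keep the rotated client certificate. Other installers may use a different path, and the file is usually readable only by root.

```shell
# Hypothetical helper: report whether a PEM certificate has already expired.
# Returns 0 if expired (or not parseable), 1 if still valid, 2 if unreadable.
cert_expired() {
  [ -r "$1" ] || return 2
  # -checkend 0 exits with status 0 when the certificate is still valid right now.
  if openssl x509 -noout -checkend 0 -in "$1" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

To print the exact expiry date (notAfter=...):

 sudo openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem

When you run cert_expired from a root shell against the same file, a return status of 0 means the certificate has expired.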

If the problem is a certificate, fix or renew it first, then restart the kubelet so that it picks up the change:

 sudo systemctl restart kubelet
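After a restart, it helps to wait for the kubelet to come back rather than check only once. The sketch below is a small, hypothetical retry helper, wait_until, written in plain POSIX shell. It retries any command a fixed number of times, with a delay between attempts.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# The body runs in a subshell ( ... ), so `exit` only leaves the function.
wait_until() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      exit 0            # the command succeeded
    fi
    # Sleep only if another attempt is still to come.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  exit 1                # every attempt failed
)
```

For example, to restart the kubelet and then poll for up to about a minute:

 sudo systemctl restart kubelet && wait_until 12 5 systemctl is-active --quiet kubelet

systemctl is-active only shows that the service is running. Confirm with kubectl get nodes that the node returns to Ready.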

If the problem persists after the restart, read the full log with journalctl -u kubelet. On systems where the kubelet writes its log to a file, check /var/log/kubelet.log instead.

Also check that the container runtime is running normally:

 systemctl status docker # or: systemctl status containerd
docker info # or, with containerd: crictl info
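A hung runtime often makes commands like crictl info or docker info block instead of failing. Wrapping the probe in timeout turns a hang into a clear result. The sketch assumes the timeout command from GNU coreutils, which exits with status 124 when the limit is reached. The helper name, probe_runtime, and the 10-second limit in the usage lines are illustrative choices.

```shell
# Hypothetical helper: run a runtime CLI with a time limit.
# Usage: probe_runtime SECONDS COMMAND [ARGS...]
# Prints "ok", "hung" (timed out), or "error" (failed before the limit).
probe_runtime() (
  limit=$1
  shift
  timeout "$limit" "$@" >/dev/null 2>&1 && rc=0 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo ok
  elif [ "$rc" -eq 124 ]; then
    echo hung
  else
    echo error
  fi
)
```

Usage on the node (prefix the runtime command with sudo if it needs root):

 probe_runtime 10 crictl info
 probe_runtime 10 docker info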

If the container runtime hangs, the kubelet can no longer manage containers and the node may become NotReady.

Check network and firewall settings

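If the node cannot reach the control plane, the cause may also be the network or a firewall. Traffic flows in both directions:

- The kubelet connects to the API Server on its secure port. This is 6443 by default, or 443 in some managed or load-balanced setups.
- The API Server connects back to the kubelet on port 10250.

If your environment restricts traffic with a firewall, make sure these ports are open in the right direction.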

You can test whether a port is reachable with telnet or nc:

 nc -zv <apiserver-ip> 6443
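To check several ports at once, you can run the nc test in a loop. The sketch makes two assumptions:

- Your nc build supports -z (probe without sending data) and -w (timeout in seconds), as common variants such as OpenBSD netcat do.
- The helper name check_ports is hypothetical.

A port reported as unreachable may be closed, filtered by a firewall, or the host itself may be unreachable.

```shell
# Hypothetical helper: probe TCP ports on one host with nc.
# Usage: check_ports HOST PORT [PORT...]
check_ports() (
  host=$1
  shift
  for port in "$@"; do
    # -z: only test the connection; -w 3: give up after 3 seconds.
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "$port open"
    else
      echo "$port unreachable"
    fi
  done
)
```

Run the first line on the node and the second from a control-plane node:

 check_ports <apiserver-ip> 6443
 check_ports <node-ip> 10250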

If the connection fails, ask your network administrator to check whether a security group or routing rule is blocking it.

In addition, if the CNI plug-in (such as Calico or Flannel) is not configured properly, the node may register successfully while pods on it fail to start. To investigate further, check the status and logs of the CNI-related pods:

 kubectl get pods -n kube-system
kubectl logs <cni-pod-name> -n kube-system
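In a busy kube-system namespace, a filter helps spot the CNI pods that are not healthy. The sketch makes two assumptions:

- kubectl get pods uses its default column layout (NAME, READY, STATUS, ...).
- The helper name unhealthy_pods is hypothetical.

It prints every pod whose STATUS is not Running, or whose ready count is short (for example 0/1). Pods that finished normally, such as Completed jobs, are also printed.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints every
# pod line whose STATUS is not "Running" or whose READY count is incomplete.
unhealthy_pods() {
  awk '
    NR == 1 && $1 == "NAME" { next }            # skip the header if present
    {
      split($2, r, "/")                        # READY "1/2" -> r[1]=1, r[2]=2
      if ($3 != "Running" || r[1] != r[2]) print
    }
  '
}
```

With -o wide, the printed lines also show which node each pod runs on:

 kubectl get pods -n kube-system -o wide | unhealthy_pods

For a pod that keeps restarting, kubectl logs <cni-pod-name> -n kube-system --previous shows the log of the previous, crashed container.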

Summary

In short, troubleshooting a node comes down to reading kubectl describe node, checking the kubelet's status, and reviewing the system logs. You then weigh the network and resource conditions. There are several steps, but each one is straightforward. The key is to go through them in order rather than skipping ahead.
