❌ The Pod failed to start, so the nginx service could not be reached, and the Pod status was shown as ImagePullBackOff.
[root@m1 ~]# kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
nginx-f89759699-cgjgp   0/1     ImagePullBackOff   0          103m
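On a cluster with many Pods, the failing ones can be picked out of the table automatically. The following is a minimal sketch, not part of the original walkthrough: the helper name `not_ready_pods` is hypothetical, and it assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, ...).

```shell
# List pods whose STATUS column is neither Running nor Completed.
# Reads the table printed by `kubectl get pods` on stdin and prints
# "<pod-name> <status>" for each pod that needs attention.
not_ready_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Example (on a live cluster):
#   kubectl get pods | not_ready_pods
```

Filtering on the STATUS column keeps the check independent of the Pod name, so the same helper works after the ReplicaSet creates a replacement Pod.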
Check the details of the nginx Pod.
[root@m1 ~]# kubectl describe pod nginx-f89759699-cgjgp
Name:             nginx-f89759699-cgjgp
Namespace:        default
Priority:         0
Service Account:  default
Node:             n1/192.168.200.84
Start Time:       Fri, 10 Mar 2023 08:40:33 +0800
Labels:           app=nginx
                  pod-template-hash=f89759699
Annotations:      <none>
Status:           Pending
IP:               10.244.3.20
IPs:
  IP:           10.244.3.20
Controlled By:  ReplicaSet/nginx-f89759699
Containers:
  nginx:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zk8sj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-zk8sj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-zk8sj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Normal   BackOff  57m (x179 over 100m)    kubelet  Back-off pulling image "nginx"
  Normal   Pulling  7m33s (x22 over 100m)   kubelet  Pulling image "nginx"
  Warning  Failed   2m30s (x417 over 100m)  kubelet  Error: ImagePullBackOff
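The events show that pulling the nginx image failed, which may be caused by the Docker service. So the next step is to check whether Docker started normally.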
systemctl status docker
The Docker service turned out to have failed to start, so I tried restarting it manually.
systemctl restart docker
However, restarting the Docker service failed with the following error.
[root@m1 ~]# systemctl restart docker
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
So the systemctl restart docker command did not help.
Next, running docker version showed that the client could not connect to the Docker daemon.
[root@m1 ~]# docker version
Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:03:11 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Running systemctl status docker again confirmed that the Docker service had not started. The error output is shown below.
[root@m1 ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2023-03-10 10:28:16 CST; 4min 35s ago
     Docs: https://docs.docker.com
 Main PID: 2221 (code=exited, status=1/FAILURE)

Mar 10 10:28:13 m1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Mar 10 10:28:13 m1 systemd[1]: docker.service: Failed with result 'exit-code'.
Mar 10 10:28:13 m1 systemd[1]: Failed to start Docker Application Container Engine.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Mar 10 10:28:16 m1 systemd[1]: Stopped Docker Application Container Engine.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Start request repeated too quickly.
Mar 10 10:28:16 m1 systemd[1]: docker.service: Failed with result 'exit-code'.
Mar 10 10:28:16 m1 systemd[1]: Failed to start Docker Application Container Engine.
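The output above shows that the Docker service process failed to start and exited with status 1/FAILURE.
✅ Next, work through the following steps to troubleshoot and fix the problem:
1️⃣ View the Docker service log: use the following command to read the Docker service log and get more detail on why startup failed.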
sudo journalctl -u docker.service
2️⃣ Analyzing the Docker log output turns up the relevant error fragment: the failure is caused by an error in the daemon configuration file /etc/docker/daemon.json.
Mar 10 10:20:17 m1 systemd[1]: Starting Docker Application Container Engine...
Mar 10 10:20:17 m1 dockerd[1572]: unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '"' after object key:value pair
Mar 10 10:20:17 m1 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Mar 10 10:20:17 m1 systemd[1]: docker.service: Failed with result 'exit-code'.
Mar 10 10:20:17 m1 systemd[1]: Failed to start Docker Application Container Engine.
Mar 10 10:20:19 m1 systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart.
Mar 10 10:20:19 m1 systemd[1]: docker.service: Scheduled restart job, restart counter is at 2.
Mar 10 10:20:19 m1 systemd[1]: Stopped Docker Application Container Engine.
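Most of the journal consists of generic systemd messages, and the one line that names the real cause comes from the dockerd process itself. As a rough sketch (the helper name `dockerd_lines` is hypothetical, and it assumes the default journal line format shown above), the log can be narrowed to those lines:

```shell
# Keep only lines logged by the dockerd process, dropping systemd's generic
# "Failed to start" / "Scheduled restart" messages around them.
dockerd_lines() {
    grep -E ' dockerd\[[0-9]+\]: '
}

# Example (on the affected host):
#   journalctl -u docker.service --no-pager | dockerd_lines | tail -n 5
```

Passing `--no-pager` lets journalctl write straight to the pipe instead of opening an interactive pager.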
3️⃣ Now check whether the daemon configuration file /etc/docker/daemon.json is written correctly.
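[root@m1 ~]# cat /etc/docker/daemon.json
{
  # Use the Alibaba Cloud mirror as the Docker registry mirror.
  "registry-mirrors": ["https://w2kavmmf.mirror.aliyuncs.com"]
  # Tell the Docker daemon to use systemd as the cgroup driver.
  "exec-opts": ["native.cgroupdriver=systemd"]
}
The # lines appear to be annotations added for this write-up. JSON does not allow comments, so they should not be copied into the real file.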
At first glance the configuration looks fine, but on closer inspection a comma is missing at the end of the "registry-mirrors" entry. This missing comma (,) is a JSON syntax error, and it is the root cause of the problem.
After the fix:
[root@m1 ~]# cat /etc/docker/daemon.json
{
  "registry-mirrors": ["https://w2kavmmf.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
In the editor, type :wq to save and exit.
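A missing comma like this is easy to overlook by eye, so it can help to run the file through a JSON parser before restarting the daemon. The sketch below is not from the original article: it assumes `python3` is installed on the host, and the helper name `validate_json` is hypothetical. It only checks JSON syntax, not whether the keys are valid Docker options.

```shell
# Validate a JSON file with Python's built-in json.tool module.
# Returns 0 when the file parses. On a syntax error it returns non-zero
# and json.tool prints the line and column of the problem to stderr.
validate_json() {
    python3 -m json.tool "$1" > /dev/null
}

# Example (on the affected host): only restart Docker if the file parses.
#   validate_json /etc/docker/daemon.json && systemctl restart docker
```

Chaining the restart with `&&` means a broken file never reaches dockerd, which avoids the restart loop seen in the earlier status output.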
4️⃣ Reload the systemd configuration and restart the Docker service.
systemctl daemon-reload
systemctl restart docker
systemctl status docker
5️⃣ Verify that docker version now prints both the client and the server information.
[root@m1 ~]# docker version
Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:03:11 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:29 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
[root@m1 ~]# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 20
  Running: 8
  Paused: 0
  Stopped: 12
 Images: 20
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.18.0-372.9.1.el8.x86_64
 Operating System: Rocky Linux 8.6 (Green Obsidian)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 9.711GiB
 Name: m1
 ID: 4YIS:FHSB:YXRI:CED5:PJSJ:EAS2:BCR3:GJJF:FDPK:EDJH:DVKU:AIYJ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://w2kavmmf.mirror.aliyuncs.com/
 Live Restore Enabled: false
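At this point the Docker service has restarted successfully, the Pod is back to normal, and the nginx service can be accessed again.
[root@m1 ~]# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-f89759699-cgjgp   1/1     Running   0          174m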
Checking the Pod details again shows that everything is back to normal.
[root@m1 ~]# kubectl describe pod nginx-f89759699-cgjgp
Name:             nginx-f89759699-cgjgp
Namespace:        default
Priority:         0
Service Account:  default
Node:             n1/192.168.200.84
Start Time:       Fri, 10 Mar 2023 08:40:33 +0800
Labels:           app=nginx
                  pod-template-hash=f89759699
Annotations:      <none>
Status:           Running
IP:               10.244.3.20
IPs:
  IP:           10.244.3.20
Controlled By:  ReplicaSet/nginx-f89759699
Containers:
  nginx:
    Container ID:   docker://88bdc2bfa592f60bf99bac2125b0adae005118ae8f2f271225245f20b7cfb3c8
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:aa0afebbb3cfa473099a62c4b32e9b3fb73ed23f2a75a65ce1d4b4f55a5c2ef2
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 10 Mar 2023 10:37:42 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zk8sj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-zk8sj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-zk8sj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age                   From     Message
  ----    ------   ----                  ----     -------
  Normal  BackOff  58m (x480 over 171m)  kubelet  Back-off pulling image "nginx"