Error "container runtime is down" and Nodes are in NotReady state

Article ID: 386710

Products

VMware Telco Cloud Automation

Issue/Introduction

This issue occurs after the Harbor password is updated to a value that contains special characters and has an additional line (newline) added. The workload cluster loses its VIP, the control plane and node pool VMs go into NotReady state, and the containerd service is down on all VMs. The state of containerd can be checked on each VM with:

```
systemctl status containerd
```

Environment

VMware Telco Cloud Automation 2.3

Cause

This issue is caused by the invalid password. The password written to "/etc/containerd/config.toml" contains a newline, which brings down the containerd service. Later versions of TCA add checks that block passwords containing a newline.
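To spot this condition on a node, one option is to look for a password entry in "/etc/containerd/config.toml" whose opening quote is not closed on the same line, since that is how an embedded newline appears in the file. The following is a minimal bash sketch, assuming the password is stored as a double-quoted password = "..." entry (as in containerd's registry auth settings); the function name find_split_password_lines is hypothetical and is not part of TCA or containerd.

```shell
# Hypothetical helper: report password = "..." entries in a containerd
# config whose double-quoted value is not closed on the same line.
# An embedded newline in the password shows up on disk this way.
# Assumption: the password is stored as a TOML basic (double-quoted) string.
# Returns 1 if a split value is found, 0 otherwise.
find_split_password_lines() {
  local config="${1:-/etc/containerd/config.toml}"
  awk '
    /^[ \t]*password[ \t]*=[ \t]*"/ {
      value = $0
      sub(/^[^"]*"/, "", value)   # drop everything up to the opening quote
      gsub(/\\\\/, "", value)     # drop escaped backslashes
      gsub(/\\"/, "", value)      # drop escaped quotes
      if (value !~ /"/) {         # no closing quote left on this line
        printf "%s:%d: password value is not closed on this line\n", FILENAME, NR
        found = 1
      }
    }
    END { exit (found ? 1 : 0) }
  ' "$config"
}
```

For example, running find_split_password_lines /etc/containerd/config.toml as root on an affected node prints the file name and line number of each split password value.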

Resolution

Re-register Harbor in Partner Systems with the correct password, then edit the Harbor add-on for the affected cluster and save the changes.
Note: Make sure that the password does not include any extra line (newline).
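Before re-registering, a quick way to make sure the password carries no extra line is to test it for line-feed or carriage-return characters. This is a minimal bash sketch; the function name password_has_line_break is hypothetical. If the password is read from a file, command substitution such as "$(cat file)" already strips trailing newlines, but it does not strip a trailing carriage return.

```shell
# Hypothetical pre-check: succeed (return 0) if the given password contains
# a line feed or carriage return, for example a trailing newline picked up
# when the value was copied from a file or terminal. Requires bash ($'...').
password_has_line_break() {
  case "$1" in
    *$'\n'* | *$'\r'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, password_has_line_break "$HARBOR_PASSWORD" && echo "remove the line break first" warns before the value is entered, where HARBOR_PASSWORD is an illustrative variable name.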
To restore the VIP on the control plane:

- Manually edit /etc/containerd/config.toml on the control plane node and set the correct password.

Note: The file may be overwritten by the nodeconfig-daemon process running on the node. Killing the process returned by "ps -ef | grep nodeconfig-daemon" may help, but the process may be started again automatically. You may need to repeat this a few times until containerd no longer crashes after the correct password is set on the workload cluster (WC).
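Because this fix may need to be repeated until containerd stays up, the check can be scripted. The following is a minimal bash sketch of a generic retry helper; retry_until is a hypothetical name, not a TCA or containerd tool, and the attempt count and delay are caller-chosen assumptions.

```shell
# Hypothetical retry helper: run CMD up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts, and succeed as soon as CMD succeeds.
# Usage: retry_until ATTEMPTS DELAY CMD [ARGS...]
retry_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, after correcting the file on a node, retry_until 5 30 systemctl is-active --quiet containerd checks up to five times, 30 seconds apart, whether containerd is running. If it keeps failing, the file has likely been overwritten again and needs another edit.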
For the worker nodes, run the following command to restart the nodeconfig-daemon pods and force them to rewrite the "/etc/containerd/config.toml" file:
```
# Restart every pod of the nodeconfig-daemon DaemonSet in the tca-system namespace
kubectl rollout restart daemonset nodeconfig-daemon -n tca-system
```
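Once the pods have restarted, recovery can be checked from the same kubectl context. "kubectl rollout status daemonset/nodeconfig-daemon -n tca-system" waits for the restart to complete; to see which nodes are still not Ready, a small bash filter can be applied to the node list. This is a sketch that assumes the default column layout of "kubectl get nodes" (NAME, STATUS, ROLES, AGE, VERSION); the function name not_ready_nodes is hypothetical.

```shell
# Hypothetical filter: read `kubectl get nodes --no-headers` output on stdin
# and print the name of every node whose STATUS column is not Ready.
# A cordoned node ("Ready,SchedulingDisabled") counts as Ready here.
not_ready_nodes() {
  awk '$2 != "Ready" && $2 !~ /^Ready,/ { print $1 }'
}
```

For example, kubectl get nodes --no-headers | not_ready_nodes prints nothing once every node is Ready.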
