The vSphere Kubernetes Service (VKS) 3.6 release introduced the capability to propagate certain node configurations in place, without triggering a node rollout for the Cluster. These include associating a new container registry with the Supervisor, creating a ClusterDomainResolutionEntry custom resource to ensure a DNS name is resolvable from the Cluster nodes, and updating the trusted certificate configuration for the Cluster via Cluster variables.
Depending on the trigger for the in-place update, this issue can be seen on a single Cluster or on all VKS Clusters managed by the Supervisor. For example, when a new container registry is associated with the Supervisor, the trust configuration is pushed to all VKS Clusters managed by the Supervisor, whereas updating a Cluster's trust configuration via the osConfiguration variable limits the update to that single Cluster.
Either of these operations may cause some of the nodes in the Cluster to move to a NotReady state.
Confirm whether the Cluster(s) are exhibiting the symptom by checking the following:
clusterregistryconfig, clusterdomainresolutionentry, registryconfig objects in the Supervisor API server.
kubectl get pods -A --field-selector spec.nodeName=<name-of-not-ready-node>
error bind: address already in use
vCenter version: 9.1.0
VKS versions: 3.6.0 and 3.6.1
When the node configuration is propagated in place (without triggering a rollout), the configuration of the node is hot-replaced by a pod running on the node. This causes the containerd systemd service on the node to restart. If containerd restarts without registering the completion of this pod, the pod ends up being rescheduled in a loop, which might cause the service to be restarted frequently.
As a side effect, the containerd process might lose track of the running containers, which could leave those containers orphaned.
This can prevent containerd from starting a new container, since the port is already in use by the orphaned container.
Another possible side effect is that this loop causes containerd to undergo multiple restarts within a short period of time. This can hit the systemd start limits on the node, causing the containerd service to become unmanaged.
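To tell the two failure modes apart on a node, it can help to look at what systemd reports for the unit. The following is a minimal sketch, not part of the documented procedure: `classify_unit` is a hypothetical helper name, and it assumes shell access to the node and a systemd version that exposes the `NRestarts` property. It reads `KEY=VALUE` lines as printed by `systemctl show` and reports whether the start limit was hit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a unit from `systemctl show` style KEY=VALUE
# lines read on stdin, so it can be tried without touching a live node.
classify_unit() {
  local active="" result="" restarts="0" key value
  while IFS='=' read -r key value; do
    case "$key" in
      ActiveState) active="$value" ;;
      Result)      result="$value" ;;
      NRestarts)   restarts="$value" ;;
    esac
  done
  if [ "$result" = "start-limit-hit" ]; then
    # systemd refused further starts because the unit restarted too often.
    echo "start-limit-hit restarts=$restarts"
  elif [ "$active" = "active" ]; then
    echo "active restarts=$restarts"
  else
    echo "inactive state=$active result=$result restarts=$restarts"
  fi
}

# On a node (assumption: run from a shell on the affected node):
#   systemctl show containerd -p ActiveState -p Result -p NRestarts | classify_unit
```

A `start-limit-hit` result points at the start-limit case described above, while an active service with a high restart count is more consistent with the restart loop.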
VKS Clusters have automatic remediation set up via machine health checks. Some instances of the issue might be resolved automatically by these remediations. For Clusters whose etcd quorum is broken (due to more than one failed control plane node), automatic remediation is not attempted, in order to preserve the integrity of the Cluster's etcd. Similarly, for Cluster nodes whose containerd service is not responding, an attempted remediation will be blocked because the node drain fails. Machine objects that correspond to nodes displaying any of the symptoms and that remain stuck in the Deleting state for more than an hour are an example of a remediation being blocked by an unresponsive container runtime.
Since the container runtime of the node is unresponsive, manual intervention is needed either to move the node back to the Ready state or to unblock an in-progress automatic remediation.
Check the state of the service: systemctl status containerd
Clear the failed state of the service: sudo systemctl reset-failed containerd
Start the service: sudo systemctl start containerd
Run systemctl status containerd to verify a successful restart of the service.
Since the pod cannot be started on the node due to an orphaned process already running, the process needs to be identified and killed manually to ensure containerd can successfully restart the pod.
Identify the failing container using the crictl ps -a command.
Review its logs: crictl logs <container-id>
Find the process holding the port: sudo ss -tulpn | grep :<port-number>
Confirm the process by running ps -fp <PID> and matching it against the failed pod.
Kill the process: sudo kill -9 <PID> (containerd should eventually be able to replace the pod with a healthy instance.)
The table below shows the ports in use by the system pods on a VKS cluster:
| Pod Name | Port(s) at Risk |
| --- | --- |
| kube-apiserver | 6443 |
| etcd | 2379, 2380, 2381 |
| kube-scheduler | 10259 |
| kube-controller-manager | 10257 |
| antrea-agent | 10350 |
| antrea-controller | 10349 |
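
The port lookup with ss can be repeated for every port in the table. The sketch below is one possible way to do that, under stated assumptions: `pids_on_port` and `SYSTEM_PORTS` are hypothetical names, and the parsing assumes the usual `ss -tulpn` column layout in which the fifth column is the local address and port. It only reports PIDs; confirming each process with ps -fp <PID> and killing it remain manual steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -tulpn` output on stdin and print the PID(s)
# of processes bound to the given local port, one per line.
pids_on_port() {
  local port="$1"
  awk -v port="$port" '
    # Column 5 is the local address:port (assumed ss layout).
    $5 ~ (":" port "$") {
      line = $0
      # A socket can list several owners, so collect every pid=N.
      while (match(line, /pid=[0-9]+/)) {
        print substr(line, RSTART + 4, RLENGTH - 4)
        line = substr(line, RSTART + RLENGTH)
      }
    }' | sort -u
}

# Ports taken from the table above.
SYSTEM_PORTS="6443 2379 2380 2381 10259 10257 10350 10349"

# On a node (assumption: run from a shell on the affected node):
#   for p in $SYSTEM_PORTS; do
#     echo "$p: $(sudo ss -tulpn | pids_on_port "$p" | tr '\n' ' ')"
#   done
```

Anchoring the match on the leading colon keeps port 2379 from matching an unrelated port such as 12379.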