By default, if no dedicated ephemeral mount point has been identified, containerd creates its overlay filesystem on the root volume. In this failure condition, containerd fails to restart after the ephemeral mount is completed, leaving it referencing a filesystem that no longer contains the container images it used prior to the kernel reboot.
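One way to confirm this state on an affected node is to check which filesystem backs containerd's data directory. This is a hedged sketch; it assumes the default containerd root of /var/lib/containerd, which may differ in your configuration:
findmnt /var/lib/containerd    # shows the source device and mount point backing containerd's root
df -h /var/lib/containerd      # shows whether that path sits on the root volume or on the ephemeral mount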
As a result, containerd cannot reference the sandbox location needed to start the kube-proxy service, on which the CNI and all other system containers depend. Because kubelet still starts, the worker node incorrectly "appears" healthy to the Control Plane nodes, leading to scheduling attempts that can never complete.
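A hedged way to observe this symptom from a workstation with cluster access (the kube-system namespace and kube-proxy pod naming are assumptions and can vary by distribution) is to compare the node's reported status with the state of its system pods:
kubectl get nodes                                             # the affected worker may still report Ready
kubectl get pods -n kube-system -o wide | grep kube-proxy     # the kube-proxy pod on the affected node remains stuck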
To apply the workaround:
----------------------------------------------------
Identify the Problem Node (pods that are not in a Running state point to the affected worker):
kubectl get pods -A -o wide | grep -v Run

Restart Containerd on the Problem Node (locate the node's VM, connect to it, for example over SSH, and restart the service; see the verification example after these steps):
kubectl get vm -o wide -n <affected cluster's namespace>
sudo su
crictl ps
systemctl restart containerd
crictl ps
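If the restart succeeds, the second crictl ps should again list the system containers. As a hedged follow-up check from a workstation with cluster access, the pods that were stuck should drop out of the non-Running list as they return to Running:
kubectl get pods -A -o wide | grep -v Run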