Multiple pods began crashing on specific node(s) because of liveness/readiness probe failures.
Issue details:
Namespace events may show "connection refused" errors for pods on a specific worker node:
$ kubectl get events
Warning Unhealthy pod/ Liveness probe failed: Get "http://:8096/healthz": dial tcp :8096: connect: connection refused
Warning BackOff pod/ Back-off restarting failed container
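To confirm that the failures cluster on particular node(s), one option is to sum container restarts per node. The following is a minimal sketch. It assumes the input is the parsed output of `kubectl get pods -A -o json`, and it treats `restartCount` as a rough proxy for "crashing". The function name `restarts_per_node` and the node names in the usage example are hypothetical.

```python
from collections import Counter


def restarts_per_node(pods_json: dict) -> Counter:
    """Sum container restart counts per node (hypothetical helper).

    Assumes the structure produced by `kubectl get pods -o json`:
    a List object whose "items" are Pod objects with spec.nodeName
    and status.containerStatuses[].restartCount.
    """
    totals: Counter = Counter()
    for pod in pods_json.get("items", []):
        # Pods not yet scheduled have no nodeName.
        node = pod.get("spec", {}).get("nodeName") or "<unscheduled>"
        statuses = pod.get("status", {}).get("containerStatuses") or []
        totals[node] += sum(cs.get("restartCount", 0) for cs in statuses)
    return totals
```

For example, save the output with `kubectl get pods -A -o json > pods.json`, then call `restarts_per_node(json.load(open("pods.json"))).most_common()`. A node whose total is far above the others is the likely candidate.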
The kubelet logs show "No such container" errors:
failed: rpc error: code = Unknown desc = Error: No such container:
If you are using Prometheus, you may also see errors such as:
Couldn't get containers: partial failures: ["/docker/": containerDataToContainerInfo: unable to find data in memory cache]
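To check a saved event dump or log excerpt for these symptoms, one approach is to match each line against loose patterns built from the messages quoted above. A systemd-managed kubelet's journal can be exported with `journalctl -u kubelet`. The following is a minimal sketch. The signature names and `match_signatures` are hypothetical. The patterns deliberately ignore the pod names, IPs and container IDs, which vary between clusters.

```python
import re

# Patterns derived from the symptom messages in this article.
# Pod names, IPs and container IDs are not matched because they vary.
SIGNATURES = {
    "probe_connection_refused": re.compile(
        r"(Liveness|Readiness) probe failed:.*connect: connection refused"
    ),
    "back_off_restart": re.compile(r"Back-off restarting failed container"),
    "no_such_container": re.compile(r"No such container"),
    "cadvisor_cache_miss": re.compile(
        r"containerDataToContainerInfo: unable to find data in memory cache"
    ),
}


def match_signatures(lines):
    """Count how many input lines match each known symptom signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Lines from one node that hit all four signatures point to the issue described here. Probe failures alone can have many unrelated causes.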