Numerous pods show Liveness and Readiness probe failed due to "connect: connection refused" and Kubelet showing "Error: No such container:"

Article ID: 298675

Updated On: 04-22-2024

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Multiple pods began to crash on specific node(s) due to liveness/readiness probe errors.

Issue details:
Namespace events may show "connection refused" errors for pods on a specific worker node:
$ kubectl get events 

Warning Unhealthy pod/ Liveness probe failed: Get "http://:8096/healthz": dial tcp :8096: connect: connection refused

Warning BackOff pod/ Back-off restarting failed container
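
When the event list is long, it can help to count how many events match the probe failure shown above. The following is a minimal sketch, not part of the product: count_refused_probes is a hypothetical helper name, and it assumes the event text arrives on standard input in the same form as the sample lines above.

```shell
#!/bin/sh
# Hypothetical helper (not a kubectl feature): reads event lines on stdin
# and prints how many of them report a probe failure caused by
# "connection refused".
count_refused_probes() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps that case from aborting a script
    # that runs under "set -e".
    grep -c 'probe failed.*connection refused' || true
}
```

For example, "kubectl get events -o wide | count_refused_probes" prints the number of matching events in the current namespace. With -o wide, the SOURCE column of each event names the reporting component and its node, which helps confirm whether the failures are confined to one worker.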



The kubelet reports "No such container":
failed: rpc error: code = Unknown desc = Error: No such container:

And if you are using Prometheus, you may see errors like:
Couldn't get containers: partial failures: ["/docker/": containerDataToContainerInfo: unable to find data in memory cache]


Environment

Product Version: 1.10

Resolution

Recommendation:
These errors may indicate that the node runs out of resources at certain times, causing these pods to crash. The pods themselves may also be a source of the resource bottleneck.
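
One way to check for resource exhaustion is to look at the node's status conditions and at the pods scheduled on it. The following is a sketch under stated assumptions rather than an official procedure: node_conditions and pods_on_node are hypothetical helper names, and the node name is whatever "kubectl get nodes" reports for the affected worker.

```shell
#!/bin/sh
# Hypothetical helper: print each status condition of a node as TYPE=STATUS,
# one per line (for example MemoryPressure, DiskPressure, PIDPressure, Ready).
# The node name is passed as the first argument.
node_conditions() {
    kubectl get node "$1" \
        -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Hypothetical helper: list the pods scheduled on a node, across all
# namespaces, so that heavy consumers on that node can be identified.
pods_on_node() {
    kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}
```

A pressure condition with status True suggests the node is short of that resource. "kubectl describe node <node-name>" shows the same conditions together with the node's allocated resources, and if a metrics server is installed in the cluster, "kubectl top node" and "kubectl top pod" report current CPU and memory usage.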