Contour supervisor service pods fail to come up and remain in a CrashLoopBackOff (CLBO) or Not Ready state on a VKS Supervisor cluster using NSX-T.
Describing the Contour pods shows readiness and liveness probe failures with connection refused errors on ports 8001 (readiness) and 8000 (liveness).
Example errors:
readiness probe failed for container contour:
dial tcp <pod-ip>:8001: connection refused
liveness probe failed for container contour:
GET http://<pod-ip>:8000/healthz: connection refused
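The two probe failures above can be reproduced by hand to confirm that the ports really refuse connections. The following is a minimal Python sketch, not part of the original article. It assumes it is run from a host that can route to the pod IP. The function names `check_tcp` and `check_http` are hypothetical helpers. The ports and the `/healthz` path are taken from the example errors.

```python
import socket
import urllib.error
import urllib.request


def check_tcp(host, port, timeout=2.0):
    """Attempt a plain TCP connect, similar to the readiness probe on port 8001."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def check_http(host, port, path="/healthz", timeout=2.0):
    """Send an HTTP GET, similar to the liveness probe on port 8000."""
    url = f"http://{host}:{port}{path}"
    # Bypass any proxy configured in the environment so the request
    # goes straight to the target address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"http {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return f"http {exc.code}"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ConnectionRefusedError):
            return "refused"
        return f"error: {exc.reason}"
    except OSError as exc:
        return f"error: {exc}"


# Example (replace the placeholder with the pod IP from the pod description):
#   print(check_tcp("<pod-ip>", 8001))
#   print(check_http("<pod-ip>", 8000))
```

A result of "refused" from both calls matches the probe errors shown above. Any other result suggests a different failure mode than the one described here.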
vSphere Kubernetes Service
NSX-T
The issue is caused by the NSX Distributed Load Balancer (DLB) being in a Degraded state.
The Contour supervisor service relies on NSX-T load balancer services to route traffic to the Contour pods on ports 8000 (liveness) and 8001 (readiness). When the Distributed Load Balancer is degraded:
Virtual servers backing the Contour service are not fully functional
Traffic is not correctly forwarded to the Contour pod IPs
Kubernetes readiness and liveness probes fail with connection refused
As a result, Contour pods never transition to a healthy Running state
This is not an issue with the Contour pod itself but with the underlying NSX-T load balancer infrastructure.
Resolve the degraded state of the NSX Distributed Load Balancer backing the Supervisor Services.
Log in to the NSX-T Manager UI.
Navigate to Networking > Load Balancers > Distributed Load Balancer.
Identify the DLB showing a Degraded alarm.
Investigate and remediate the underlying cause (for example, pool member issues, configuration errors, or service failures).
Once the DLB returns to a healthy state, the Contour pods will automatically pass their readiness and liveness probes and transition to the Running state.
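To confirm the recovery, pod readiness can be checked from the output of `kubectl get pods -o json`. The following is a minimal Python sketch, assuming `kubectl` is installed and already authenticated against the Supervisor. The helper names `summarize_pods` and `fetch_pods` are hypothetical. The namespace is a placeholder, because the article does not name it.

```python
import json
import subprocess


def summarize_pods(pod_list):
    """Return (pod name, all containers ready, waiting reasons) for each pod."""
    summary = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # A pod counts as ready only if it reports at least one container
        # and every container is ready.
        ready = bool(statuses) and all(s.get("ready", False) for s in statuses)
        # Collect waiting reasons such as CrashLoopBackOff.
        reasons = [
            s["state"]["waiting"].get("reason", "")
            for s in statuses
            if "waiting" in s.get("state", {})
        ]
        summary.append((name, ready, ",".join(r for r in reasons if r)))
    return summary


def fetch_pods(namespace):
    """Run kubectl and parse the pod list for the given namespace."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Example (the namespace is a placeholder, not taken from the article):
#   for name, ready, reason in summarize_pods(fetch_pods("<contour-namespace>")):
#       print(name, "Ready" if ready else f"NotReady {reason}")
```

Once the DLB is healthy, every Contour pod should be reported as ready with no waiting reason.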
For detailed steps to troubleshoot and resolve the Distributed Load Balancer degraded alarm, refer to the following KB article:
NSX Distributed Load Balancer shows Degraded alarm
https://knowledge.broadcom.com/external/article/420132/nsx-distributed-load-balancer-shows-degr.html