Harbor pods stuck in ProviderFailed state after ESXi host failure
book
Article ID: 441489
calendar_today
Updated On:
Products
VMware vSphere Kubernetes Service
Issue/Introduction
Harbor pods are stuck in an error state within the Supervisor cluster following a hardware failure on an ESXi host.
Affected pods remain in a ProviderFailed state
root@<Supervisor> [ ~ ]# k get pods -n svc-harbor-domain-c# -o wide | grep <hostname> harbor-core-#### 0/1 ProviderFailed 0 days <none> <hostname> <none> <none>
Environment
vSphere Kubernetes Service
Cause
The Supervisor cluster is experiencing a scheduling deadlock because the Harbor pod VMs are registered to the failed physical host.
When an ESXi host suffers a sudden hardware failure or disconnects unexpectedly, the control plane cannot cleanly migrate or terminate the pod VMs running on that specific hypervisor. Because vCenter still tracks these objects as registered to the disconnected host, the Supervisor cluster is unable to reschedule them. The pods remain pinned to the failed compute resource because the underlying infrastructure provider is unavailable to execute runtime operations.
Resolution
Disconnect and remove the failed ESXi host from the vSphere cluster.