Harbor pods stuck in ProviderFailed state after ESXi host failure

VMware vSphere Kubernetes Service

Harbor pods are stuck in an error state within the Supervisor cluster following a hardware failure on an ESXi host.
Affected pods remain in the ProviderFailed state, as shown in the following output:

root@<Supervisor> [ ~ ]# k get pods -n svc-harbor-domain-c# -o wide | grep <hostname>
harbor-core-#### 0/1 ProviderFailed 0 days <none> <hostname> <none> <none>

vSphere Kubernetes Service

The Supervisor cluster is experiencing a scheduling deadlock because the Harbor pod VMs are registered to the failed physical host.
When an ESXi host suffers a sudden hardware failure or disconnects unexpectedly, the control plane cannot cleanly migrate or terminate the pod VMs running on that specific hypervisor. Because vCenter still tracks these objects as registered to the disconnected host, the Supervisor cluster is unable to reschedule them. The pods remain pinned to the failed compute resource because the underlying infrastructure provider is unavailable to execute runtime operations.

1. Disconnect and remove the failed ESXi host from the vSphere cluster.
2. Manually unregister any orphaned or stale Harbor pod VMs from the vCenter inventory.
   Refer to: Manually removing stale or orphaned virtual machines from vCenter Server
3. Forcefully delete the stuck Harbor pods within the Supervisor cluster.
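
The forced deletion of the stuck pods can be sketched as the following shell helpers. This is a minimal sketch, not an official procedure: the function names provider_failed_pods and force_delete_provider_failed and the DRY_RUN variable are hypothetical and introduced only for illustration. It assumes kubectl is available on the Supervisor control plane and that the Harbor namespace name is passed in as an argument. The only kubectl calls used are `kubectl get pods --no-headers` and `kubectl delete pod ... --grace-period=0 --force`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of pods whose STATUS column (third field) is ProviderFailed.
provider_failed_pods() {
  awk '$3 == "ProviderFailed" { print $1 }'
}

# Hypothetical wrapper: force-delete every ProviderFailed pod in one namespace.
# With DRY_RUN=1 (the default) it only prints the commands it would run.
force_delete_provider_failed() {
  local ns="$1"
  local pod
  kubectl get pods -n "$ns" --no-headers | provider_failed_pods |
    while read -r pod; do
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "kubectl delete pod $pod -n $ns --grace-period=0 --force"
      else
        kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force
      fi
    done
}
```

Running `force_delete_provider_failed <namespace>` first with the default dry run lets you review the list of pods before setting DRY_RUN=0. Run it only after the failed host has been removed and the stale pod VMs have been unregistered, as described in the steps above.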
