When enabling Supervisor with AVI load balancer and NSX, the process gets stuck at "Configured Supervisor Control plane VM's Workload Network".
On the Supervisor nodes, the workload network interface (eth1) is not configured. As a result, one of the three CoreDNS pods is in CrashLoopBackOff and the other two are Pending.
# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-84787595bc-88gqg 0/1 CrashLoopBackOff 21 (2m55s ago) 71m
kube-system coredns-84787595bc-ntpqv 0/1 Pending 0 71m
kube-system coredns-84787595bc-wf7bt 0/1 Pending 0 71m
The READY column of the VirtualNetworkInterface resources in kube-system is empty.
# kubectl -n kube-system get virtualnetworkinterfaces
NAME READY AGE
vif-vm-1038 70m
vif-vm-1039 65m
vif-vm-1040 65m
One nsx-ncp pod is Running on the Supervisor node that holds the floating IP (FIP), but it does not produce any useful error messages.
# kubectl -n vmware-system-nsx get pods
NAME READY STATUS RESTARTS AGE
nsx-ncp-7b47d9cf67-cmfv9 2/2 Running 3 (93m ago) 106m
nsx-ncp-cb55775b5-ctjpl 0/2 Pending 0 3s
vSphere Kubernetes Service - vSphere 8U3
NSX Networking Stack with AVI LoadBalancer configured
The nsx-ncp pod entered Restore mode and did not return to Normal mode, which caused workload network segment creation to stop.
This issue occurs in environments that use both AVI and NSX, if the NSX Manager has been restored previously.
Switch the nsx-ncp pod from Restore mode to Normal mode.
1. Log in to the Supervisor node via SSH
Follow the KB: Troubleshooting vSphere Supervisor Control Plane VMs
2. Get restore_end_time from NSX Manager via API
NSX_FQDN=<NSX_MANAGER_FQDN>
NSX_PASS=<NSX_MANAGER_PASSWORD>
RESTORE_END_TIME=$(curl -ks -u admin:"${NSX_PASS}" -X GET "https://${NSX_FQDN}/api/v1/cluster/restore/status" | jq -r '.restore_end_time')
echo "${RESTORE_END_TIME}"
#> 1754697444409
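Before patching, it may be worth confirming that the value returned in step 2 really is a timestamp. `jq -r` prints the literal string `null` when the field is missing, and an authentication failure can leave the variable empty. The following is a minimal sketch only; the function name `is_epoch_millis` is hypothetical and not part of any VMware tooling, and the 13-digit check assumes the timestamp is in epoch milliseconds, as in the sample output above.

```shell
# Hypothetical helper (not part of NSX or vSphere tooling):
# succeed only if the argument looks like an epoch timestamp in milliseconds.
is_epoch_millis() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, "null", or any non-digit character
  esac
  [ "${#1}" -eq 13 ]           # 13 digits covers dates from 2001 to 2286
}

# Example use after step 2:
# is_epoch_millis "${RESTORE_END_TIME}" || echo "unexpected value: ${RESTORE_END_TIME}" >&2
```

If the check fails, re-run the curl command without `| jq` to inspect the raw API response before continuing.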
3. Patch nsx-restore-status
kubectl patch ncpconfig nsx-restore-status --type='merge' -p "{\"metadata\":{\"annotations\":{\"restore_end_time\":\"${RESTORE_END_TIME}\"}}}"
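The escaped quotes in the patch body are easy to mistype. As an illustration, the same JSON can be produced with `printf` and passed to `kubectl patch`. This is a sketch under assumptions: `build_restore_patch` is a hypothetical helper name, and the verification command in the comment assumes the annotation is readable with the same resource name used in step 3.

```shell
# Hypothetical helper: print the merge patch from step 3 for a given timestamp.
build_restore_patch() {
  printf '{"metadata":{"annotations":{"restore_end_time":"%s"}}}' "$1"
}

# Example use:
# kubectl patch ncpconfig nsx-restore-status --type='merge' -p "$(build_restore_patch "${RESTORE_END_TIME}")"
# Read the annotation back to confirm it was written:
# kubectl get ncpconfig nsx-restore-status -o jsonpath='{.metadata.annotations.restore_end_time}'
```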
4. Restart the nsx-ncp pod manually
kubectl -n vmware-system-nsx delete pod nsx-ncp-xxxxxxxxx
# Check
kubectl -n vmware-system-nsx get pods
#> NAME READY STATUS RESTARTS AGE
#> nsx-ncp-cb55775b5-np2tg 0/2 Pending 0 7s
#> nsx-ncp-cb55775b5-vxj9v 2/2 Running 0 44s
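The pod name in step 4 is a placeholder. One way to find the actual name is to filter the pod listing for the nsx-ncp pod in Running state. The sketch below is illustrative; `running_ncp_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods` column order (NAME, READY, STATUS).

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and
# print the names of nsx-ncp pods whose STATUS column is Running.
running_ncp_pods() {
  awk '$1 ~ /^nsx-ncp-/ && $3 == "Running" { print $1 }'
}

# Example use:
# kubectl -n vmware-system-nsx get pods --no-headers | running_ncp_pods
```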
5. Verify that the workload network interfaces are created successfully. The Supervisor enablement process then resumes.
kubectl get virtualnetworkinterfaces -n kube-system
#> NAME READY AGE
#> vif-vm-1038 True 179m
#> vif-vm-1039 True 175m
#> vif-vm-1040 True 175m
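The READY check can also be scripted, for example when polling until the interfaces come up. The following is a minimal sketch; `all_vifs_ready` is a hypothetical helper name, and it assumes the three-column output shown above, where an empty READY value shifts the AGE into the second field.

```shell
# Hypothetical helper: read `kubectl get virtualnetworkinterfaces` output on stdin
# (including the header row) and succeed only if at least one interface is listed
# and every interface reports READY as True.
all_vifs_ready() {
  awk 'NR > 1 { rows++; if ($2 != "True") bad++ }
       END { exit (rows > 0 && bad == 0) ? 0 : 1 }'
}

# Example use:
# kubectl get virtualnetworkinterfaces -n kube-system | all_vifs_ready && echo "all interfaces ready"
```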