When new Workload Cluster nodes are created in a vSphere Supervisor environment with NSX-T, the new nodes never reach the Running state.
This can occur during rolling redeployments or Workload Cluster upgrades, which also follow rolling redeployment logic: a new node is created first, and the older node is cleaned up afterward.
While connected to the Supervisor Cluster context, symptoms can be observed in the output of the following commands:

Machine objects for the Workload Cluster:
kubectl get machine -n <workload cluster namespace>

NCP pods on the Supervisor:
kubectl get pods -n vmware-system-nsx
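If deeper inspection of NCP is needed, its logs can be reviewed from the Supervisor Cluster context. This is a hedged example; the actual NCP pod name comes from the output of the previous command:

kubectl logs -n vmware-system-nsx <ncp pod name>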
While connected to the Workload Cluster context, the following symptoms are observed:
"node.kubernetes.io/network-unavailable"
There are no alarms in the NSX-T web UI regarding NCP health.
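As a hedged example, the taint noted above can be confirmed from the Workload Cluster context by describing one of the new nodes and reviewing its Taints and Conditions (the node name is a placeholder):

kubectl describe node <new node name>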
vSphere Supervisor 7.X
vSphere Supervisor 8.X
After a rolling redeployment of nodes in a Workload Cluster within a vSphere Supervisor environment using NSX-T, one or more stale subnets are left over in the cluster's IPPool object.
A fix for the vSphere CPI issue, in which deletion requests to remove a node's subnet from the IPPool object are not properly sent, will be available in an upcoming version of the VKS (vSphere Kubernetes Service) Supervisor Service.
As a workaround, the stale subnet entries can be manually cleaned up from the IPPool object in the Supervisor Cluster context.
Once the entries are removed from the IPPool object, the cleanup propagates to the NSX side.
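Before editing, it may be prudent to save a copy of the current IPPool object so the original subnet list can be restored if needed (a hedged precaution; the output file name is arbitrary):

kubectl get ippool -n <workload cluster namespace> <ippool name> -o yaml > ippool-backup.yaml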
From the Supervisor Cluster context, list the IPPool objects in the Workload Cluster's namespace:
kubectl get ippool -n <workload cluster namespace>

Describe the IPPool object to view its subnet entries:
kubectl describe ippool -n <workload cluster namespace> <ippool name>

List the machines in the namespace and compare them against the subnet entries; any subnet named after a node that no longer has a corresponding machine is stale:
kubectl get machines -n <workload cluster namespace>

Edit the IPPool object and remove the stale subnet entries from the spec.subnets list:
kubectl edit ippool -n <workload cluster namespace> <ippool name>
spec:
  subnets:
  # Stale entry: this node no longer appears in "kubectl get machines" output; remove this block.
  - ipFamily: ipv4
    name: <missing node name>
    prefixLength: 24
  # Valid entry: this node still has a corresponding machine; keep this block.
  - ipFamily: ipv4
    name: <existing node name>
    prefixLength: 24
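Alternatively, a stale entry can be removed non-interactively with a JSON Patch. This is a hedged sketch that assumes the stale subnet is the first entry in the spec.subnets list (index 0); adjust the index to match its actual position:

kubectl patch ippool -n <workload cluster namespace> <ippool name> --type=json -p '[{"op":"remove","path":"/spec/subnets/0"}]'

After the edit, the IPPool can be described again to confirm that only subnets for existing machines remain:

kubectl describe ippool -n <workload cluster namespace> <ippool name>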