TKG VMs Not Provisioned in vSphere - status.ready not found vSphereVM



Article ID: 342246






Symptoms

  • When provisioning workload clusters in TKG, the cluster never reaches a ready status.
  • No VMs are created in vSphere after initiating a workload cluster deployment or after scaling out the workload cluster.
  • TKG VMs in existing clusters are not deleted or replaced with new VMs.
  • No tasks for the creation of TKG VMs or template clones appear in vCenter.
  • vSphereVM and Machine objects in the management cluster context for the workload cluster never receive a provider ID.
  • Creation of workload cluster VMs is stuck indefinitely.
  • The cluster is stuck in Provisioning.
  • CAPV controller logs show messages similar to the following, repeating continuously:

I0924 21:50:02.070872    1 vimmachine.go:147] "capv-controller-manager/vspheremachine-controller/<CLUSTER-NAMESPACE>/<CLUSTER-NAME>-control-plane-gnx88-htrkg: waiting for ready state" 

I0924 21:50:02.071795    1 vimmachine.go:432] "capv-controller-manager/vspheremachine-controller/tkg-system/<MGMT-CLUSTER-NAME>-md-1-infra-g5jrq-txq7c: updated vm" vm="tkg-system/<MGMT-CLUSTER-NAME>-md-1-9fcbn-65kv57b5b-bdbbf" 

I0924 21:50:02.071883    1 vimmachine.go:432] "capv-controller-manager/vspheremachine-controller/<CLUSTER-NAMESPACE>/<CLUSTER-NAME>-md-1-infra-kgl22-5w2wm: updated vm" vm="<CLUSTER-NAMESPACE>/<CLUSTER-NAME>-md-1-g96v4-998dfc6c7f9-mqh9k" 
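To confirm the symptom, the CAPV controller logs can be inspected directly. A minimal sketch, assuming the controller runs as the capv-controller-manager deployment in the vmware-system-capv namespace (on some TKG versions the namespace is capv-system instead):

```shell
# Tail the CAPV controller logs and count the repeating "waiting for ready state"
# messages. Adjust the namespace (-n) to match your TKG version.
kubectl logs -n vmware-system-capv deployment/capv-controller-manager --tail=200 \
  | grep -c "waiting for ready state"
```

A steadily growing count across repeated runs indicates the controller is stuck in the reconcile loop described above.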


Environment

VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid 2.2.0
VMware Tanzu Kubernetes Grid 2.1.0


Cause

  • This issue can be caused by intermittent network disconnects or temporary unavailability of vCenter to the TKG management cluster. This is a known issue with upstream Cluster API Provider vSphere (CAPV).


Resolution

  • This issue is resolved in versions of TKG that include a cluster-api-provider-vsphere release containing the fix - Public upstream issue
  • The fix is included in CAPV v1.5.6, v1.7.1, and v1.8.0.


Workaround

Restart the CAPV controller. There should be no impact on existing clusters.

In the TKG management cluster context, run the following to list the CAPV controller deployment, and record its name and namespace:


kubectl get deployments -A | grep capv


Restart the CAPV controller with the command that matches the namespace recorded above (vmware-system-capv or capv-system, depending on the TKG version):


kubectl rollout restart deployment -n vmware-system-capv capv-controller-manager 

kubectl rollout restart deployment -n capv-system capv-controller-manager
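The discovery and restart steps above can be combined into one sketch that finds the CAPV namespace automatically, assuming kubectl is pointed at the management cluster context:

```shell
# Find the namespace hosting the CAPV controller (it differs between TKG versions),
# then restart the deployment and wait for the rollout to complete.
ns=$(kubectl get deployments -A | awk '/capv-controller-manager/ {print $1; exit}')
kubectl rollout restart deployment -n "$ns" capv-controller-manager
kubectl rollout status deployment -n "$ns" capv-controller-manager
```

`kubectl rollout status` blocks until the new controller pods are available, so a clean exit here also covers the readiness check below.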


Validate that the CAPV controller pods are back up in a ready state using the following command:


kubectl get pods -A | grep capv


Validate in the vCenter UI that VMs are now being provisioned, or that existing VMs are being removed and replaced.
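Progress can also be checked from the management cluster context instead of the vCenter UI. A sketch using kubectl custom columns (kubectl prints `<none>` for Machines whose providerID has not been set yet):

```shell
# List Machines and their provider IDs; rows showing <none> are still waiting
# on CAPV to provision the backing vSphere VM.
kubectl get machines -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,PROVIDERID:.spec.providerID' \
  | awk 'NR==1 || $3=="<none>"'
```

After a successful restart, the `<none>` rows should shrink as each Machine receives a `vsphere://` provider ID.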

Additional Information - Public upstream issue

  • Unable to provision new workload clusters or new VMs for existing clusters