One of the three VMware Identity Broker (vIDB) nodes was deleted during an outage. VMware Services Platform (VMSP) should automatically redeploy the lost node, but the redeployment does not complete.
Output of kubectl get pod -n vidb-external shows the pod status as Pending:
| root@[<VMSP_Node>] # kubectl get pod -n vidb-external | ||||
| NAME | READY | STATUS | RESTARTS | AGE |
| vidb-postgres-instance-0 | 2/2 | Running | 14 (5d ago) | 51d |
| vidb-postgres-instance-1 | 0/2 | Pending | 0 | 17s |
| vidb-postgres-instance-2 | 2/2 | Running | 12 (4d10h ago) | 32d |
| vidb-service-pod1 | 0/1 | Pending | 0 | 30d |
| vidb-service-pod2 | 1/1 | Running | 12 (4d10h ago) | 32d |
| vidb-service-pod3 | 1/1 | Running | 20 (16h ago) | 37d |
Output of kubectl get nodes shows only two nodes:
| root@[<VMSP_Node>] # kubectl get nodes | ||||
| NAME | STATUS | ROLES | AGE | VERSION |
| <VMSP_Node1> | Ready | control-plane | 51d | v1.32.0+vmware.1-fips |
| <VMSP_Node2> | Ready | control-plane | 51d | v1.32.0+vmware.1-fips |
Output of kubectl get machines,vspheremachines -A shows the node as Provisioning:
| root@[<VMSP_Node>] # kubectl get machines,vspheremachines -A | |||||||
| NAMESPACE | VERSION | NAME | CLUSTER | NODENAME | PROVIDERID | PHASE | AGE |
| vmsp-platform | v1.32.0 | machine.cluster.### | vcf-mgmt-### | <Node1> | vsphere://### | Running | 51d |
| vmsp-platform | v1.32.0 | machine.cluster.### | vcf-mgmt-### | <Node2> | vsphere://### | Provisioning | 9m1sec |
| vmsp-platform | v1.32.0 | machine.cluster.### | vcf-mgmt-### | <Node3> | vsphere://### | Running | 51d |
Output of kubectl logs capv-controller-manager-xxxx -n vmsp-platform | vim - shows that the VM template is missing from the datastore when VMSP tries to clone the VM:
E0401 <TimeStamp> l controller.go:316] "Reconciler error" err="failed to reconcile VM: unable to find template by name \"/<Datastore_Name>/vm/vcf-services-platform-template-9.0.1.0.24940697\": vm '/<Datastore_Name>/vm/vcf-services-platform-template-9.0.1.0.24940697' not found" controller="vspherevm" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="VSphereVM" VSphereVM="vmsp-platform/<Node_Name>" namespace="vmsp-platform" name="<Node_Name>" reconcileID="########-####-####-####-###########"
VCF Operations 9.0.x
The VM template is missing from the datastore, which prevents the new node from being provisioned.
To resolve this issue, clone the template from a datastore where it is available to the datastore where it is required (the location indicated in the logs). Once the template is in place, the new node gets provisioned.
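Before cloning, it can help to confirm the exact template path that the controller is looking for. The following is a minimal sketch, not an official procedure: the helper name extract_missing_templates is hypothetical, and it assumes the log lines carry the same "unable to find template by name" wording as the error shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads CAPV controller log text on stdin and prints
# each distinct template path reported as missing.
# Assumes the error wording matches the log line quoted in this article,
# where the path is wrapped in escaped quotes (\"...\").
extract_missing_templates() {
  sed -n 's/.*unable to find template by name \\"\([^\\"]*\)\\".*/\1/p' | sort -u
}
```

A possible way to use it is to pipe the controller log into the function, for example kubectl logs capv-controller-manager-xxxx -n vmsp-platform | extract_missing_templates, replacing the pod name with the one in your environment. After the template has been cloned to the reported location, re-running kubectl get machines,vspheremachines -A and kubectl get nodes should show whether the node has progressed past Provisioning.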