Remediating state following a storage or network outage.
status.conditions shows WorkersAvailable as False and ControlPlaneMachinesReady as False.
A Machine is stuck in the Deleting phase.
"Reconciler error" err="failed to remove etcd member for deleting Machine ... cluster has fewer than 2 control plane nodes; removing an etcd member is not supported"
VMware vSphere Kubernetes Service
The KubeadmControlPlane (KCP) controller includes a safety guardrail that refuses to remove an etcd member when the cluster has fewer than 2 control plane nodes. If multiple control plane nodes fail simultaneously (for example, during an infrastructure outage) and are then manually repaired or replaced, KCP detects a mismatch between its Machine inventory and the nodes that actually exist. It refuses to delete the "phantom" Machine object because it cannot safely run the etcd member removal against a quorum that no longer exists, which leaves the controller in a reconciliation loop.
This is a known issue. A permanent fix is scheduled for VKS 3.7.
Workaround: To break the reconciliation loop, you must trick KCP into believing the node exists so it can proceed with the deletion logic.
Identify the Node Name: On the Supervisor Cluster, identify the nodeRef for the Machine stuck in Deleting status:
kubectl get machine <machine-name> -n <namespace> -o jsonpath='{.status.nodeRef.name}'
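If the name of the stuck Machine is not already known, the lookup above can be preceded by a filter on the Machine phase. The following is a minimal sketch, not part of the documented procedure: the function names list_deleting_machines and get_node_ref_name are hypothetical, and it assumes the Machine objects expose their phase at .status.phase and that kubectl is pointed at the Supervisor Cluster.

```shell
# Hypothetical helper: print the names of Machine objects in a namespace
# whose phase is Deleting. Assumes the phase is exposed at .status.phase.
list_deleting_machines() {
  local namespace="$1"
  kubectl get machine -n "$namespace" --no-headers \
    -o custom-columns='NAME:.metadata.name,PHASE:.status.phase' |
    awk '$2 == "Deleting" { print $1 }'
}

# Hypothetical helper: print the nodeRef name recorded on one Machine
# (the same jsonpath query as the command in this step).
get_node_ref_name() {
  local machine="$1" namespace="$2"
  kubectl get machine "$machine" -n "$namespace" \
    -o jsonpath='{.status.nodeRef.name}'
}

# Example (Supervisor Cluster context):
#   list_deleting_machines <namespace>
#   get_node_ref_name <machine-name> <namespace>
```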
Create a Dummy Node Object: On the Guest Cluster, create a temporary local Node object using the name retrieved in Step 1.
apiVersion: v1
kind: Node
metadata:
  labels:
    node-role.kubernetes.io/control-plane: ""
  name: <node-name-from-step-1>
spec: {}
Save the manifest as dummy-node.yaml and apply it on the Guest Cluster with kubectl apply -f dummy-node.yaml.
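To avoid hand-editing the node name, the same manifest can be generated from the value returned in Step 1. This is a sketch only: render_dummy_node is a hypothetical helper name, and the output is meant to match the manifest shown in this step.

```shell
# Hypothetical helper: print a placeholder control plane Node manifest
# for the node name passed as the first argument.
render_dummy_node() {
  local node_name="$1"
  cat <<EOF
apiVersion: v1
kind: Node
metadata:
  labels:
    node-role.kubernetes.io/control-plane: ""
  name: ${node_name}
spec: {}
EOF
}

# Example (Guest Cluster context):
#   render_dummy_node "<node-name-from-step-1>" > dummy-node.yaml
#   kubectl apply -f dummy-node.yaml
```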
Monitor Deletion: After the dummy node is created, the KCP controller reconciles again and attempts the Machine object deletion and etcd member removal. When the Machine object is gone from the Supervisor, KCP automatically scales up new control plane nodes to meet the desired replica count.
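One way to watch for the Machine to disappear is a small polling loop on the Supervisor Cluster. The sketch below is illustrative rather than part of the documented workaround: wait_for_machine_deletion is a hypothetical helper, the default of 60 checks at 10-second intervals is an arbitrary assumption, and it assumes kubectl reports a missing object with an error containing "NotFound".

```shell
# Hypothetical helper: poll until the Machine can no longer be found,
# or give up after a fixed number of checks.
# Usage: wait_for_machine_deletion <machine-name> <namespace> [attempts] [interval-seconds]
wait_for_machine_deletion() {
  local machine="$1" namespace="$2"
  local attempts="${3:-60}" interval="${4:-10}"
  local i=0 err
  while [ "$i" -lt "$attempts" ]; do
    # Capture stderr only; a failed lookup mentioning NotFound is treated
    # as "Machine deleted". Other errors (for example, connectivity) are retried.
    if ! err="$(kubectl get machine "$machine" -n "$namespace" 2>&1 >/dev/null)"; then
      case "$err" in
        *NotFound*)
          echo "Machine $machine is gone"
          return 0
          ;;
      esac
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "Machine $machine still present after $attempts checks" >&2
  return 1
}
```

Once the function returns successfully, kubectl get machine -n <namespace> on the Supervisor can be used to confirm that replacement control plane Machines are being created.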