Ready,SchedulingDisabled state indefinitely.
During graceful shutdown, the internal cluster management service may mark certain nodes for deletion. After power-on, these nodes remain cordoned, which prevents the automatic recovery service from completing. The recovery process waits for all nodes to be usable before it scales services back up, so cluster bring-up deadlocks.
Step 1: Power on the cluster VMs
Note: For Automation clusters, every node VM is both a control plane and a worker node, so power-on order does not matter. Skip to Step 5.
Step 2: Wait for automatic recovery
Allow 15-20 minutes for the automatic recovery process to complete. The platform includes a systemd service that automatically scales services back to their original replica counts.
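The waiting period can also be monitored rather than timed by hand. The following is a minimal sketch, assuming kubectl access to the cluster with a valid KUBECONFIG; the function names wait_until, all_ready, and nodes_ready are hypothetical helpers and not part of the platform.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not shipped with the platform.

# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TIMEOUT_SECONDS have elapsed.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  while true; do
    if "$@"; then
      return 0
    fi
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
}

# all_ready reads `kubectl get nodes --no-headers` output on stdin and
# succeeds only if at least one node is listed and every STATUS is exactly
# "Ready" (so "Ready,SchedulingDisabled" and "NotReady" both fail).
all_ready() {
  awk '$2 != "Ready" { bad = 1 } END { exit (NR == 0 || bad) }'
}

# nodes_ready wires the two together against the live cluster.
nodes_ready() {
  kubectl get nodes --no-headers | all_ready
}

# Example: poll every 30 seconds for up to 20 minutes.
#   wait_until 1200 30 nodes_ready || echo "nodes not ready; see Step 4"
```

If the poll times out, the nodes are likely still cordoned or not ready, which is the situation Step 4 addresses.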
Step 3: Validate cluster recovery
Verify the cluster is operational by checking:
Step 4: Troubleshoot if automatic recovery fails
If services do not recover automatically after 20 minutes, manual intervention is required.
Access the cluster using breakglass
vmware-system-user
Log in with the breakglass password
sudo -i
export KUBECONFIG=/etc/kubernetes/admin.conf
Validate pod status and service status
kubectl get nodes
kubectl get configmap power-off-marker -n vmsp-platform
kubectl get pods -A | grep -v Running
Manual recovery if in bad state
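The node and ConfigMap checks above can be combined into a single test for the stuck condition. This is a sketch only: cordoned_nodes and needs_manual_recovery are hypothetical helper names, and the ConfigMap name and namespace are taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not shipped with the platform.

# cordoned_nodes reads `kubectl get nodes --no-headers` output on stdin
# and prints the name of every node whose STATUS contains SchedulingDisabled.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# needs_manual_recovery succeeds only when at least one node is cordoned
# AND the power-off-marker ConfigMap still exists in vmsp-platform.
needs_manual_recovery() {
  local stuck
  stuck="$(kubectl get nodes --no-headers | cordoned_nodes)"
  [ -n "$stuck" ] &&
    kubectl get configmap power-off-marker -n vmsp-platform >/dev/null 2>&1
}

# Example:
#   needs_manual_recovery && echo "manual recovery applies"
```

Running `kubectl get nodes --no-headers | cordoned_nodes` on its own lists the affected nodes by name.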
If nodes are stuck in the Ready,SchedulingDisabled state and the power-off-marker ConfigMap exists, run the following script on the same node, with the KUBECONFIG variable set, to recover manually:
"cluster-manual-recovery.sh"
Step 5: Validate final state
After manual recovery, verify:
kubectl get pods -A
Fleet Lifecycle Manager UI is accessible (for management clusters)
VCF Automation services are accessible (for automation clusters)
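For the pod check, a filter that drops the header line and pods that finished normally can make leftover problems easier to spot. A minimal sketch, assuming the default `kubectl get pods -A` column layout (NAMESPACE, NAME, READY, STATUS, ...); unhealthy_pods is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not shipped with the platform.

# unhealthy_pods reads `kubectl get pods -A --no-headers` output on stdin
# and prints "namespace/name STATUS" for every pod whose STATUS is neither
# Running nor Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Example:
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Empty output means every pod is Running or Completed. The filter looks only at the STATUS column, so a Running pod whose containers are not all ready still needs a look at the READY column.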