Deleting legacy stemcells fails during environment cleanup or "Applying Changes" with an "Invalid virtual machine state" CPI error when the BOSH Director's database is out of sync with vCenter.
Product: VMware Tanzu Platform Core (formerly TAS/Tanzu Application Service)
Component: BOSH Director, vSphere CPI
Infrastructure: VMware vSphere / ESXi
Stemcell Version: 1.906, 1.999 (Ubuntu Jammy)
Product Release: 3.1.66
The issue is typically caused by a mismatch between the BOSH Director's state database and the actual state of the virtual machine (stemcell) in vCenter. This can occur if a stemcell becomes corrupted, has missing disks in the underlying infrastructure, or is in an "Invalid virtual machine state" that prevents the Cloud Provider Interface (CPI) from executing standard deletion tasks.
To resolve this issue, follow these steps to force the deletion of the stemcell and manually clean up the BOSH state if necessary.
1: Force Delete the Stemcell via BOSH CLI
Force the deletion of the problematic stemcell so that BOSH ignores errors returned by the CPI while removing it.
bosh -e <environment_name> delete-stemcell <stemcell_name>/<version> --force
Example: bosh -e my-env delete-stemcell bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.906 --force
2: Manually Clean Up BOSH State File (If "Applying Changes" Continues to Fail)
If the "Applying Changes" process in Ops Manager fails during the "Cleaning up BOSH director" step with a CPI error, you may need to manually remove the stale stemcell reference from the BOSH state file.
1. SSH into the Ops Manager VM.
2. Back up the current BOSH state file:
cp /var/tempest/workspaces/default/deployments/bosh-state.json ~/bosh-state.json.bak
3. Edit the state file using a text editor (e.g., vim):
vim /var/tempest/workspaces/default/deployments/bosh-state.json
4. Locate the section containing the problematic stemcell CID and remove the entry. Make sure the file is still valid JSON afterwards (for example, no trailing comma left behind).
5. Save the file and exit.
6. Return to the Ops Manager Installation Dashboard and click Apply Changes.
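The backup and locate steps above can be sketched as two small shell helpers. This is a minimal sketch, not an official procedure: the function names backup_state_file and locate_stemcell_cid and the timestamped backup name are illustrative assumptions, and only the state file path comes from the steps above. Depending on file ownership on the Ops Manager VM, the commands may need sudo.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the backup naming scheme are assumptions.
# The default path is the state file location used in the steps above.
STATE_FILE="${STATE_FILE:-/var/tempest/workspaces/default/deployments/bosh-state.json}"

# Copy the state file to a timestamped backup, confirm the copy is identical,
# and print the backup path so it can be used for a later restore.
backup_state_file() {
  local src="$1" dest_dir="${2:-$HOME}"
  local backup="$dest_dir/bosh-state.json.$(date +%Y%m%d%H%M%S).bak"
  cp -p "$src" "$backup" && cmp -s "$src" "$backup" && printf '%s\n' "$backup"
}

# Print the numbered lines of the state file that mention the given stemcell CID,
# which shows where the entry to remove is located.
locate_stemcell_cid() {
  local src="$1" cid="$2"
  grep -n -F -- "$cid" "$src"
}
```

A typical call would be backup_state_file "$STATE_FILE" followed by locate_stemcell_cid "$STATE_FILE" <stemcell_cid>. If the edited file turns out to be broken, copying the printed backup path back over the state file restores the previous contents.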
Ensure that no active deployments are using the stemcell version you are attempting to delete.
Verify in vCenter that the stemcell template does not have "orphaned" or "inaccessible" disk warnings.
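The first check above, that no deployment still uses the stemcell, can be sketched with the BOSH CLI. This assumes that "bosh stemcells" prints the name and version as the first two columns and appends an asterisk to versions that are currently deployed; the function name stemcell_in_use is illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: relies on the assumption that `bosh stemcells` marks deployed
# versions with a trailing "*" in the second column.
# Returns 0 if the stemcell version is in use, 1 if it is not,
# and 2 if the BOSH CLI call itself failed.
stemcell_in_use() {
  local env="$1" name="$2" version="$3" listing
  listing=$(bosh -e "$env" stemcells) || return 2
  awk -v n="$name" -v v="$version" \
    '$1 == n && $2 == (v "*") { found = 1 } END { exit !found }' <<<"$listing"
}
```

For example, stemcell_in_use my-env bosh-vsphere-esxi-ubuntu-jammy-go_agent 1.906 returning 0 would suggest that the stemcell should not be force deleted until the deployments using it have moved to another version.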