bosh deployment fails with cpi error Could not find VM for stemcell

Article ID: 407052

Products

VMware Tanzu Kubernetes Grid Integrated Edition
VMware Tanzu Platform
VMware Tanzu Platform - Cloud Foundry

Issue/Introduction

The BOSH Director fails to create a VM because the vSphere stemcell VM it needs is missing. When a stemcell is first uploaded to the BOSH Director, the Director creates a VM in vSphere from it. This VM has a name like 'sc-GUID', is powered off, and has a single initial snapshot. There is one stemcell VM for each AZ or isolated datastore, depending on the configuration of the BOSH Director tile. In some cases, one or more of these stemcell VMs are deleted or become corrupted, and a deployment then fails with an error like the following:

pivotal-container-service-GUID  service-instance_GUID  run errand apply-addons from deployment service-instance_GUID  Unknown CPI error 'Unknown' with message 'Could not find VM for stemcell 'sc-GUID''
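
The 'sc-GUID' value quoted in the error is the name of the stemcell VM to look for in vSphere. As a minimal sketch, assuming the error text has been saved or is piped in from a task log, the hypothetical helper below (extract_stemcell_cid is an illustrative name, not part of the bosh CLI) pulls that value out of the message:

```shell
# Hypothetical helper: print the stemcell CID quoted in a
# "Could not find VM for stemcell" CPI error message.
# Reads the error text on stdin; prints one CID per matching line.
extract_stemcell_cid() {
  sed -n "s/.*Could not find VM for stemcell '\([^']*\)'.*/\1/p"
}
```

Piping the error line above through extract_stemcell_cid prints sc-GUID. Lines that do not contain the error produce no output.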

 

Note: If you see this error when Operations Manager fails to create the BOSH Director VM, follow KB https://knowledge.broadcom.com/external/article?articleNumber=293857 instead of this article.

 

Resolution

To work around this, upload the stemcell again using the "bosh" CLI from the Operations Manager VM.

In this example, we recover from a failed attempt to create a VM with the BOSH instance name "worker/GUID". First, identify which stemcell version to recover.

Run the bosh vms command:

Instance     Process State  AZ   IPs      VM CID   VM Type     Active  Stemcell
worker/GUID  failing        az1  x.x.x.x  vm-GUID  medium.mem  true    bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.844
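
When a deployment has many instances, the affected stemcell can be picked out of the table with a small filter. The sketch below is an illustration only: failing_stemcells is a hypothetical helper, and it assumes the column layout shown above (instance name first, process state second, stemcell last) with no empty columns in between.

```shell
# Hypothetical helper: from `bosh vms` table text on stdin, print
# "<instance> <stemcell>" for every row whose process state is not "running".
# Rows whose first field has no "/" (such as the header) are skipped.
failing_stemcells() {
  awk '$1 ~ /\// && $2 != "running" { print $1, $NF }'
}

# Example usage (the deployment name is a placeholder):
#   bosh -d service-instance_GUID vms | failing_stemcells
```

For the row shown above, this prints the instance name followed by bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.844.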

The output above shows that the worker instance uses jammy stemcell 1.844. Re-upload that stemcell by first copying it from its saved folder to /tmp/. By default, Operations Manager restricts read access to the saved stemcell (note the -rw------- permissions in the listing below), so copy it with sudo, set the permissions you need on the copy, and then re-upload the stemcell manually. If you cannot find the stemcell on the Operations Manager VM, download it from https://support.broadcom.com

# ls -l /var/tempest/stemcells/
total 2657540
-rw-rw-r-- 1 tempest-web tempest-web 1356535631 Apr  7 06:54 bosh-vsphere-esxi-ubuntu-jammy-go_agent-1.808.tgz
-rw------- 1 tempest-web tempest-web 1364773656 Jul 14 20:39 bosh-vsphere-esxi-ubuntu-jammy-go_agent-1.844.tgz
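
The listing suggests that the saved file name is the stemcell name and version from the Stemcell column joined with a hyphen, plus a .tgz extension. As a rough sketch under that assumption, the hypothetical helper stemcell_tarball_path (not an Operations Manager or bosh command) builds the expected path from the "name/version" string:

```shell
# Hypothetical helper: map "<name>/<version>" from the Stemcell column
# to the expected tarball path.
# $1: stemcell string, e.g. bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.844
# $2: optional directory; defaults to the folder used in this article
stemcell_tarball_path() {
  # ${1%/*} is the part before the last "/", ${1##*/} the part after it.
  printf '%s/%s-%s.tgz\n' "${2:-/var/tempest/stemcells}" "${1%/*}" "${1##*/}"
}
```

Confirm that the resulting path exists in the listing before copying it, since the naming pattern is inferred from the example above and not from documented behavior.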


sudo cp /var/tempest/stemcells/bosh-vsphere-esxi-ubuntu-jammy-go_agent-1.844.tgz /tmp/
sudo chmod 644 /tmp/bosh-vsphere-esxi-ubuntu-jammy-go_agent-1.844.tgz


bosh upload-stemcell /tmp/bosh-vsphere-esxi-ubuntu-jammy-go_agent-1.844.tgz --fix

Once the stemcell upload with --fix completes, the error should be resolved. Click "Apply Changes" in the Ops Manager web UI to continue deploying your changes.