BOSH Recreate Tasks Queueing and Failing with Snapshot Missing Errors

Article ID: 293521

Updated On:

Products

Operations Manager

Issue/Introduction

Symptoms:

BOSH recreate or resurrection tasks keep queuing up. BOSH tries to recreate the missing VM and fails with stemcell snapshot missing errors.

This error can be seen in the bosh task <#> --debug output:

E, [2018-02-02 11:00:09 #10383] [task:32342] ERROR -- DirectorJobRunner: Error resolving problem '3876':
 Unknown CPI error 'Unknown' with message 'The object[s] '<[Vim.VirtualMachine] vm-7068>'
 should have the following properties: ["snapshot"]
, but they were missing these: #<Set: {"snapshot"}>
.' in 'create_vm' CPI method
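
When the debug output is long, it can be easier to search it for this signature programmatically. The following is a minimal Python sketch, not part of the original procedure: the function name find_missing_snapshot_vms is hypothetical, and the pattern is derived only from the single sample above, so it may need adjusting for other CPI versions or log layouts.

```python
import re

# Pattern derived from the one sample error above (an assumption, not an
# official log format): a Vim.VirtualMachine reference followed directly by
# the 'should have the following properties: ["snapshot"]' message.
_SNAPSHOT_ERROR = re.compile(
    r"\[Vim\.VirtualMachine\]\s+(?P<vm>vm-\d+)>'"
    r'\s+should have the following properties:\s*\["snapshot"\]'
)


def find_missing_snapshot_vms(debug_log: str) -> list[str]:
    """Return VM references named in missing-snapshot errors.

    References are returned in order of first appearance, without duplicates.
    """
    seen: list[str] = []
    for match in _SNAPSHOT_ERROR.finditer(debug_log):
        vm = match.group("vm")
        if vm not in seen:
            seen.append(vm)
    return seen
```

To use it, save the output of bosh task <#> --debug to a file, read the file into a string, and pass that string to the function. An empty result means this particular signature was not found, which does not rule out other CPI errors.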

Environment


Cause

A BOSH environment can have multiple stemcell VMs. In a vSphere environment, BOSH creates a vSphere snapshot of the stemcell VM named "initial" and then uses this snapshot to clone new VMs for a deployment.

If the snapshot on the stemcell is deleted or missing, BOSH cannot create any new VMs from that stemcell. When BOSH is trying to resurrect a VM, this can cause a backlog of BOSH tasks: each task takes a few minutes to fail or time out, while other tasks keep queuing up behind it.

Resolution

  • Check the number of BOSH tasks: bosh tasks --no-filter
  • Find the task that is currently processing and view its debug output to confirm the error: bosh task <#> --debug
  • Check the IaaS console for the stemcell VM used here.
  • For VMware, use the vCenter web client. Open the PCF Templates folder to find the stemcell VM.
  • Run bosh stemcells to identify the stemcell that is having issues, then locate it in the PCF Templates folder. Click the stemcell VM and open the Snapshots tab. A snapshot named "initial" should be listed for the stemcell VM.
  • If the snapshot is not present, follow the steps in the KB article "BOSH Director fails when trying to create a VM because the snapshot is missing". However, if too many tasks are queued, it may not be possible to run any more tasks.

    Workaround

    In the vCenter console, create the snapshot manually and name it "initial".

    • Log in to vCenter and find the stemcell VM under the PCF Templates folder.
    • Right-click the stemcell VM and create a new snapshot.
    • Enter initial as the snapshot name.
    • Re-run Apply Changes to recreate the VM.
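
    After the workaround, it can help to confirm that the task backlog is draining. The following Python sketch tallies tasks by state. It assumes a BOSH CLI version that can emit the task list as JSON in a layout of the form {"Tables": [{"Rows": [{"state": ...}]}]}; both that layout and the function name count_task_states are assumptions for illustration and should be checked against the output of your own CLI.

```python
import json
from collections import Counter


def count_task_states(tasks_json: str) -> Counter:
    """Count tasks per state.

    Assumes JSON shaped like {"Tables": [{"Rows": [{"state": "queued"}, ...]}]}.
    Rows without a "state" key are counted under "unknown".
    """
    doc = json.loads(tasks_json)
    states: Counter = Counter()
    for table in doc.get("Tables") or []:
        for row in table.get("Rows") or []:
            states[row.get("state", "unknown")] += 1
    return states
```

    Running this a few minutes apart and comparing the "queued" count gives a rough indication of whether tasks are being processed again.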