Bosh resurrector not working after deleting a resurrection config
search cancel

Bosh resurrector not working after deleting a resurrection config

book

Article ID: 313132

calendar_today

Updated On:

Products

VMware VMware vSphere with Tanzu

Issue/Introduction

How to enable/disable resurrection to specific deployment.

Symptoms:
When disabling resurrection to a specific deployment on BOSH director using a resurrection config file as per below example. 
bosh update-config --type resurrection --name service-instance_e6cf411c-1446-45de-ad8b-8173979e4377 /home/ubuntu/resurrection.yml
Using environment '172.30.0.11' as client 'ops_manager'
+ rules:
+ - enabled: false
+   include:
+     deployments:
+     - service-instance_e6cf411c-1446-45de-ad8b-8173979e4377

Resurrection will not kick in after deleting the resurrection config file as long as there is only 1 resurrection config left, even if global resurrection is enabled as well as the Resurrection plugin is turned ON on the BOSH director tile.

Environment

VMware Tanzu Kubernetes Grid Integrated Edition 1.x
Tanzu Kubernetes Grid Integrated Edition 1.1.13.4

Cause

When deleting the last resurrection config file, so basically going from 1 resurrection config to 0, then the config resurrection file is deleted, however the array length is not updated as per below. 
 
number of bosh configs of type resurrection == 1
parsed rules array length == 1

run bosh delete-config

number of bosh configs of type resurrection == 0
parsed rules array length == 1

Effectively this issue will cache the contents of the last rule. In a scenario where the global config is ON and the contents of the last rule were used to override that value for deployment XYZ, even after deleting the rule, deployment XYZ would have the resurrection switched off because the rule in health manager was never deleted even if the bosh config did not exists anymore.

More details at --> https://github.com/cloudfoundry/bosh/pull/2393

Resolution

A fix is in progress for the upcoming BOSH director version.

Workaround:
Workaround is to create one bosh config that targets a non existing deployment.
 
rules:
- enabled: false
   include:
      deployments:
      - "NOT EXISTING"

That will make sure that all the other rules are cleaned up properly.

Additional Information

Bosh resurrector plugin is a tool used to recreate the vm if it is not in running state. More details available at https://bosh.io/docs/resurrector/

Bosh has an agent running on each vm, if the agent does not respond to a heart beat every minute, bosh will mark vm unresponsive
And will try to recreate it. If one of vm processes is failing, the vm will be marked as falling state and resurrector will try to fix it by recreating the vm. 


Impact/Risks:
Resurrection not recreating any VMs for the particular deployment that had the resurrection config applied.