VIO Remove stale service records from old pods

Article ID: 321793

Products

VMware Integrated OpenStack

Issue/Introduction

Symptoms:
  • In some scenarios, such as scaling out, a service's pods are deleted and new pods are created. The service records from the old pods then become stale, remaining in the service list in a "down" state:

[root@vioadmin1-vioshim-79984c8557-fmtdg /]# openstack compute service list
+----+------------------+-----------------------------------+------------+---------+-------+----------------------------+
| ID | Binary           | Host                              | Zone       | Status  | State | Updated At                 |
+----+------------------+-----------------------------------+------------+---------+-------+----------------------------+
|  2 | nova-consoleauth | nova-consoleauth-67b45d6797-7xz2p | internal   | enabled | down  | 2020-01-10T01:27:17.000000 |
|  3 | nova-conductor   | nova-conductor-868f8946fc-7mj4d   | internal   | enabled | down  | 2020-01-10T01:27:16.000000 |
|  6 | nova-scheduler   | nova-scheduler-6d768857fc-znvrv   | internal   | enabled | down  | 2020-01-10T01:27:17.000000 |
| 10 | nova-compute     | compute-0c6d3d92-c51              | nova       | enabled | up    | 2020-01-10T03:08:02.000000 |
| 11 | nova-compute     | compute-82f9a047-c381682          | nova-1     | enabled | up    | 2020-01-10T03:07:58.000000 |
| 20 | nova-scheduler   | nova-scheduler-6d768857fc-w2hbh   | internal   | enabled | up    | 2020-01-10T03:07:59.000000 |
| 28 | nova-consoleauth | nova-consoleauth-67b45d6797-xnp8l | internal   | enabled | up    | 2020-01-10T03:08:01.000000 |
| 29 | nova-conductor   | nova-conductor-868f8946fc-vcjnz   | internal   | enabled | up    | 2020-01-10T03:07:58.000000 |
| 37 | nova-compute     | compute-82f9a047-c381684          | nova-sriov | enabled | up    | 2020-01-10T03:07:58.000000 |
+----+------------------+-----------------------------------+------------+---------+-------+----------------------------+

  • The stale nova-scheduler record is reported as down.
The nova-osapi logs also report the service as down, for example in nova-osapi/0.log:
2020-01-23T13:02:05Z 2020-01-23 13:02:05.429 18 DEBUG nova.servicegroup.drivers.db [req-b383f119-952d-4927-a5e7-36963063502f 5eb3ec71a78a43578502c34b92e2f7bb 431e8499f53444689b10190789682fbd - default default] Seems service nova-scheduler on host nova-scheduler-78bc5d98d6-8xwkk is down. Last heartbeat was 2019-11-28 11:13:32. Elapsed time is 4844913.42971 is_up /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:80
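As an immediate, one-off alternative to the cron job described in the Resolution below, stale records can also be removed by hand with the OpenStack client. This is a minimal sketch; the service ID used here (6, the old nova-scheduler entry in the listing above) is illustrative and must be taken from your own output:

# Show only the service records still marked down
# (-f value is the standard cliff "value" output formatter)
openstack compute service list -f value | grep -w down

# Delete a stale record by its ID, using the IDs from your own listing
openstack compute service delete 6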


Environment

VMware Integrated OpenStack 7.x

Resolution

Set up a cron job that runs hourly to check whether stale nova services exist:
  1. SSH into the manager node.
  2. Get the list of resources.
viocli get nova

root@oms [ /var/log ]#  viocli get nova
NAME    CREATION DATE         VALIDATION
nova1   2020-02-20 18:31:05   Success
  3. Update the resource with the name from above and set the manifest parameter "cron_job_service_cleaner" to true:
viocli update nova nova1
 
conf:
  nova:
    neutron:
      metadata_proxy_shared_secret: .Secret:managedencryptedpasswords:data.metadata_proxy_shared_secret
    vmware:
      passthrough: "false"
      tenant_vdc: "false"
manifests: <-----
  cron_job_service_cleaner: true <-----
  4. Save the file.
Note: After approximately one hour, the cron job removes all service records that are in a "down" state.
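To confirm that the cleanup ran, the CronJob can be inspected from the manager node. This is a sketch only: the openstack namespace and the service-cleaner job name are assumptions and may differ in your deployment:

# Check that the service-cleaner CronJob exists and has scheduled jobs
kubectl -n openstack get cronjob | grep service-cleaner
kubectl -n openstack get job | grep service-cleaner

Afterwards, re-run openstack compute service list (from a pod with the OpenStack client, as in the Symptoms section above) and verify that the stale "down" records are gone.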

Additional Information

Impact/Risks:
This change should not impact functionality.