BOSH Director fails with non-running job during upgrade to Ops Manager 2.10

Article ID: 293819



Products

Operations Manager

Issue/Introduction

In Ops Manager 2.10, the metrics-server job is enabled on the BOSH Director. If any cloud config does not contain a networks property, the metrics-server job fails to start because of a validation error. This happens because metrics-server expects a networks property to exist in every cloud config so that it can emit all network-related metrics.

For Apply Changes to complete successfully on the BOSH Director, all jobs must be in a running state. If any job is still in a failing state after 5 minutes, Apply Changes times out.


Example Error Messages

Apply Changes error:
Updating instance 'bosh/0'... Finished (00:01:45)
Waiting for instance 'bosh/0' to be running... Failed (00:05:04)
Failed deploying (00:14:26)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)


Deploying:
Received non-running job state: 'failing'
Exit code 1

Error returned in metrics-server log:
/var/vcap/sys/log/director/metrics_server.stderr.log

D, [2020-08-12T20:48:22.010702 #26225] [] DEBUG -- Director: (0.000475s) (conn: 46940614517920) SELECT * FROM "configs" WHERE ("id" IN (SELECT max("id") FROM "configs" WHERE ("type" = 'cloud') GROUP BY "name"))
Traceback (most recent call last):
	10: from /var/vcap/packages/director/bin/bosh-director-metrics-server:29:in `<main>'
	 9: from /var/vcap/packages/director/bin/bosh-director-metrics-server:29:in `load'
	 8: from /var/vcap/data/packages/director/9183d12c7c614c9758fd7a0b37e1b1f2ba68fbf1/gem_home/ruby/2.6.0/gems/bosh-director-0.0.0/bin/bosh-director-metrics-server:26:in `<top (required)>'
	 7: from /var/vcap/data/packages/director/9183d12c7c614c9758fd7a0b37e1b1f2ba68fbf1/gem_home/ruby/2.6.0/gems/bosh-director-0.0.0/lib/bosh/director/metrics_collector.rb:51:in `start'
	 6: from /var/vcap/data/packages/director/9183d12c7c614c9758fd7a0b37e1b1f2ba68fbf1/gem_home/ruby/2.6.0/gems/bosh-director-0.0.0/lib/bosh/director/metrics_collector.rb:105:in `populate_metrics'
	 5: from /var/vcap/data/packages/director/9183d12c7c614c9758fd7a0b37e1b1f2ba68fbf1/gem_home/ruby/2.6.0/gems/bosh-director-0.0.0/lib/bosh/director/metrics_collector.rb:126:in `populate_network_metrics'
	 4: from /var/vcap/data/packages/director/9183d12c7c614c9758fd7a0b37e1b1f2ba68fbf1/gem_home/ruby/2.6.0/gems/bosh-director-0.0.0/lib/bosh/director/metrics_collector.rb:126:in `map'
	 3: from /var/vcap/data/packages/director/9183d12c7c614c9758fd7a0b37e1b1f2ba68fbf1/gem_home/ruby/2.6.0/gems/bosh-director-0.0.0/lib/bosh/director/metrics_collector.rb:127:in `block in populate_network_metrics'
	 2: from /var/vcap/data/packages/director/9183d12c7c614c9758fd7a0b37e1b1f2ba68fbf1/gem_home/ruby/2.6.0/gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/cloud_manifest_parser.rb:14:in `parse'
	 1: from /var/vcap/data/packages/director/9183d12c7c614c9758fd7a0b37e1b1f2ba68fbf1/gem_home/ruby/2.6.0/gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/cloud_manifest_parser.rb:48:in `parse_networks'
/var/vcap/data/packages/director/9183d12c7c614c9758fd7a0b37e1b1f2ba68fbf1/gem_home/ruby/2.6.0/gems/bosh-director-0.0.0/lib/bosh/director/validation_helper.rb:16:in `safe_property': Required property 'networks' was not specified in object ({"vm_extensions"=>[{"cloud_properties"=>{"vmx_options"=>{"disk.enableUUID"=>"1"}}, "name"=>"disk_enable_uuid"}, {"cloud_properties"=>{"upgrade_hw_version"=>true}, "name"=>"set_version_hardware"}]}) (Bosh::Director::ValidationMissingField)

Validation Error returned by metrics-server:
Required property 'networks' was not specified in object ({"vm_extensions"=>[{"cloud_properties"=>{"vmx_options"=>{"disk.enableUUID"=>"1"}}, "name"=>"disk_enable_uuid"}, {"cloud_properties"=>{"upgrade_hw_version"=>true}, "name"=>"set_version_hardware"}]}) (Bosh::Director::ValidationMissingField)
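
To confirm that a failed upgrade matches this article, you can search the metrics-server log on the BOSH Director VM for the validation error shown above. The following is a minimal sketch: the helper name log_has_missing_networks_error and the METRICS_LOG variable are hypothetical, and the log path is the one quoted in this article.

```shell
# Log path taken from this article; override METRICS_LOG if your layout differs.
METRICS_LOG="${METRICS_LOG:-/var/vcap/sys/log/director/metrics_server.stderr.log}"

# Hypothetical helper: succeeds (exit 0) if the given log file contains the
# validation error raised when a cloud config has no 'networks' property.
# A missing or unreadable file is treated as "error not found".
log_has_missing_networks_error() {
  grep -Fq "Required property 'networks' was not specified" "$1" 2>/dev/null
}

# Report only when the error is present; prints nothing otherwise.
if log_has_missing_networks_error "$METRICS_LOG"; then
  echo "metrics-server is failing on a cloud config without 'networks'"
fi
```

The check uses a fixed-string match (grep -F), so the quotes and parentheses in the message need no escaping.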


Environment

Product Version: 2.10

Resolution

Prior to upgrading to Ops Manager 2.10, review your foundation's cloud-config files with bosh configs.

Some product tiles, such as the Tanzu Kubernetes Grid Integrated Edition (TKGI) tile, require a cloud config that does not define a networks property. If any cloud config lacks a networks property, the upgrade fails unless the metrics-server feature is disabled.

Note: It is best to postpone the upgrade until a version of Ops Manager is released that includes BOSH Director version 271.2.0.

How to list and inspect configs:
1. bosh configs # lists all configs uploaded to the BOSH Director
2. bosh -e my-env config --type=my-type --name=my-name # shows the content of the named config
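
The review can be partly automated by checking each cloud config for a top-level networks key. This is a rough sketch, not an official tool: has_networks_key is a hypothetical helper, and it assumes the config is plain YAML with top-level keys starting in the first column.

```shell
# Hypothetical helper: reads a cloud config (YAML) on stdin and succeeds
# (exit 0) only if a line begins with the top-level key 'networks:'.
# Indented (nested) 'networks:' keys are deliberately not matched.
has_networks_key() {
  grep -Eq '^networks:'
}
```

For example, assuming my-env and my-name are placeholders for your environment alias and config name:

bosh -e my-env config --type=cloud --name=my-name | has_networks_key || echo "my-name has no networks property"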


Workaround

If the upgrade is already in progress, you can disable the metrics-server feature by setting the director_configuration.director_metrics_server_enabled property to false through the Ops Manager API endpoint /api/v0/staged/director/properties. Because this endpoint changes staged settings only, run Apply Changes again afterward.

Example request to disable metrics-server:
curl -k https://<OPS_MANAGER_FQDN>/api/v0/staged/director/properties \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $token" \
-X PUT \
-d '{"director_configuration": {"director_metrics_server_enabled": "false"}}'
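
After sending the request, it can be useful to confirm that the staged value changed before running Apply Changes again. The sketch below assumes that a GET request to the same endpoint returns the staged properties as JSON and that the response includes director_metrics_server_enabled; metrics_server_disabled is a hypothetical helper, not part of any CLI.

```shell
# Hypothetical helper: reads JSON on stdin and succeeds (exit 0) only if
# director_metrics_server_enabled is false, as a boolean or a quoted string.
# Whitespace is stripped first so the match does not depend on formatting.
metrics_server_disabled() {
  tr -d ' \n\t' | grep -Eq '"director_metrics_server_enabled":"?false"?'
}
```

For example, reusing the placeholders from the request above:

curl -k -s https://<OPS_MANAGER_FQDN>/api/v0/staged/director/properties -H "Authorization: Bearer $token" | metrics_server_disabled && echo "metrics-server is staged as disabled"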


References

  • Cloud Config: https://bosh.io/docs/cli-v2/#configs
  • Ops Manager API Endpoint: https://docs.pivotal.io/platform/2-10/opsman-api/#tag/Properties/paths/~1api~1v0~1staged~1director~1properties/put