During an OpsMan Apply Changes, the Bosh Director upgrade fails with the error:
"Waiting for instance 'bosh/0' to be running... Failed"
After logging in to the Bosh Director VM, monit summary shows several processes stuck in "initializing", "not monitored" or "Execution failed" status:
bosh/0:~# monit summary
The Monit daemon 5.2.5 uptime: 45d 18h 39m
Process 'nats' running
Process 'bosh_nats_sync' running
Process 'postgres' running
Process 'director' initializing
Process 'worker_1' initializing
Process 'worker_2' initializing
Process 'worker_3' initializing
Process 'worker_4' not monitored
Process 'worker_5' not monitored
Process 'director_scheduler' running
Process 'metrics_server' Execution failed
Process 'director_sync_dns' Execution failed
Process 'director_nginx' running
Process 'health_monitor' running
Process 'uaa' running
Process 'credhub' running
Process 'system-metrics-agent' running
Process 'blobstore_nginx' running
Process 'count-cores' running
In particular, metrics_server and director_sync_dns show "Execution failed" status.
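To pick out the unhealthy processes quickly, the summary can be filtered for any process line that does not end in "running". This is a minimal sketch: the function name not_running is hypothetical, and it assumes monit prints one "Process '<name>' <status>" entry per line.

```shell
#!/usr/bin/env bash
# Print only the monit process lines whose status is not "running".
# Reads `monit summary` output on stdin. The function name is illustrative.
not_running() {
  grep '^Process ' | grep -v '[[:space:]]running[[:space:]]*$'
}

# Usage on the Bosh Director VM (as root):
#   monit summary | not_running
```

An empty result means every monitored process reports "running".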
/var/vcap/sys/log/director/sync_dns.stderr.log and /var/vcap/sys/log/director/metrics_server.stderr.log show errors as follows:
/var/vcap/data/packages/director/<id>/gem_home/ruby/3.2.0/gems/sequel-5.29.0/lib/sequel/extensions/migration.rb:722:in 'get_applied_migrations': Applied migration files not in file system: 20240319204601_remove_dns_records_from_instances.rb (Sequel::Migrator::Error)
The Ruby script 20240319204601_remove_dns_records_from_instances.rb is missing from the directory /var/vcap/data/packages/director/<id>/gem_home/ruby/3.2.0/gems/bosh-director-0.0.0/db/migrations/director/
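To confirm exactly which migration file the Director reports as missing, the file names can be pulled out of the stderr logs. The following is a sketch only: the function name missing_migrations is hypothetical, and the pattern assumes migration files are named with a 14-digit timestamp followed by an underscore, as in the error above.

```shell
#!/usr/bin/env bash
# List the migration files that Sequel reports as applied but absent on disk.
# Arguments: one or more log files. The function name is illustrative.
missing_migrations() {
  grep -h 'Applied migration files not in file system:' "$@" \
    | grep -oE '[0-9]{14}_[A-Za-z0-9_]+\.rb' \
    | sort -u
}

# Usage on the Bosh Director VM:
#   missing_migrations /var/vcap/sys/log/director/sync_dns.stderr.log \
#                      /var/vcap/sys/log/director/metrics_server.stderr.log
```

Each distinct file name is printed once, however many times the error repeats in the logs.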
It's unclear what may cause the file to be missing there.
During Apply Changes, once the new Bosh Director VM has been recreated and powered on, manually recreate the missing Ruby file on the Bosh Director VM. This must be done before the "Waiting for instance 'bosh/0' to be running" task times out (5 min).
# ssh vcap@<Bosh-Director-VM_IP>
# sudo -i
# vim /var/vcap/data/packages/director/<id>/gem_home/ruby/3.2.0/gems/bosh-director-0.0.0/db/migrations/director/20240319204601_remove_dns_records_from_instances.rb
Sequel.migration do
  down do
    alter_table(:instances) do
      add_column :dns_records, String, text: true
    end
  end

  up do
    alter_table(:instances) do
      drop_column :dns_records
    end
  end
end
Note: The file contents above are valid only if the missing file named in the /var/vcap/sys/log/director/sync_dns.stderr.log and /var/vcap/sys/log/director/metrics_server.stderr.log errors is exactly 20240319204601_remove_dns_records_from_instances.rb. If the name differs, the file version may have changed and the contents may not be valid. Contact Support in this case.
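Because the file has to be in place within the 5 minute window, it may be quicker to write it non-interactively than to type it into an editor. The sketch below writes the same contents with a here-document; the function name write_dns_migration is hypothetical, and the migrations directory is passed as an argument because the <id> portion of the path differs per VM.

```shell
#!/usr/bin/env bash
# Write the migration file from this article into a given migrations directory.
# Argument 1: the .../db/migrations/director directory on the Director VM.
# The function name is illustrative.
write_dns_migration() {
  local dir="$1"
  local file="$dir/20240319204601_remove_dns_records_from_instances.rb"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  if [ -e "$file" ]; then
    # Leave an existing file untouched.
    echo "already present: $file" >&2
    return 0
  fi
  cat > "$file" <<'EOF'
Sequel.migration do
  down do
    alter_table(:instances) do
      add_column :dns_records, String, text: true
    end
  end

  up do
    alter_table(:instances) do
      drop_column :dns_records
    end
  end
end
EOF
}

# Usage (as root), with <id> replaced by the directory name found on the VM:
#   write_dns_migration /var/vcap/data/packages/director/<id>/gem_home/ruby/3.2.0/gems/bosh-director-0.0.0/db/migrations/director
```

The function refuses to run against a directory that does not exist and does not overwrite a file that is already there, so a mistyped path or a repeated run does no harm.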
Once the file is in place, check the process status again:
# monit summary