Bosh Director upgrade fails with "Waiting for instance 'bosh/0' to be running" error due to missing remove_dns_records_from_instances.rb file

Article ID: 404705

Products

VMware Tanzu Application Service

Issue/Introduction

During an Apply Changes run in Ops Manager (OpsMan), the Bosh Director upgrade fails with the error:

"Waiting for instance 'bosh/0' to be running... Failed"

After logging in to the Bosh Director VM, monit summary shows several processes stuck in "initializing", "not monitored" or "Execution failed" status:

bosh/0:~# monit summary
The Monit daemon 5.2.5 uptime: 45d 18h 39m

Process 'nats'                      running
Process 'bosh_nats_sync'            running
Process 'postgres'                  running
Process 'director'                  initializing
Process 'worker_1'                  initializing
Process 'worker_2'                  initializing
Process 'worker_3'                  initializing
Process 'worker_4'                  not monitored
Process 'worker_5'                  not monitored
Process 'director_scheduler'        running
Process 'metrics_server'            Execution failed
Process 'director_sync_dns'         Execution failed
Process 'director_nginx'            running
Process 'health_monitor'            running
Process 'uaa'                       running
Process 'credhub'                   running
Process 'system-metrics-agent'      running
Process 'blobstore_nginx'           running
Process 'count-cores'               running

In particular, metrics_server and director_sync_dns show "Execution failed" status.
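
To spot the unhealthy processes quickly, the monit summary output can be filtered. The following is a minimal sketch, not part of the original procedure. The helper name not_running is hypothetical, and the sketch assumes the output format shown above (Process '<name>' followed by the status).

```shell
# Hypothetical helper: print every monit process whose status is not "running".
# Reads `monit summary` output on stdin.
not_running() {
  awk -F"'" '/^Process / {
    status = $3
    sub(/^[ \t]+/, "", status)   # trim padding before the status text
    sub(/[ \t]+$/, "", status)   # trim trailing whitespace
    if (status != "running") print $2 ": " status
  }'
}

# Usage on the Bosh Director VM (as root):
#   monit summary | not_running
```

For the output shown above, this would list entries such as "director: initializing" and "metrics_server: Execution failed".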

The logs /var/vcap/sys/log/director/sync_dns.stderr.log and /var/vcap/sys/log/director/metrics_server.stderr.log contain the following error:

/var/vcap/data/packages/director/<id>/gem_home/ruby/3.2.0/gems/sequel-5.29.0/lib/sequel/extensions/migration.rb:722:in 'get_applied_migrations': Applied migration files not in file system: 20240319204601_remove_dns_records_from_instances.rb (Sequel::Migrator::Error)
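
The name of the missing migration file can be pulled out of the logs directly. The sketch below assumes the error line has the format shown above. The helper name missing_migrations is hypothetical, and the comma splitting is an assumption about how several missing files would be reported.

```shell
# Hypothetical helper: print the unique migration file names that the
# Sequel error reports as applied but not present in the file system.
# Arguments: one or more log files to scan.
missing_migrations() {
  grep -h 'Applied migration files not in file system:' "$@" \
    | sed -e 's/.*Applied migration files not in file system: //' \
          -e 's/ (Sequel::Migrator::Error).*//' \
    | tr ',' '\n' \
    | sed 's/^ *//' \
    | sort -u
  # The tr step assumes multiple names are comma-separated on one line.
}

# Usage on the Bosh Director VM (as root):
#   missing_migrations /var/vcap/sys/log/director/sync_dns.stderr.log \
#                      /var/vcap/sys/log/director/metrics_server.stderr.log
```

If the name printed is anything other than 20240319204601_remove_dns_records_from_instances.rb, the file contents given in the Resolution section may not apply.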

Cause

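The Ruby migration file 20240319204601_remove_dns_records_from_instances.rb is missing from the directory /var/vcap/data/packages/director/<id>/gem_home/ruby/3.2.0/gems/bosh-director-0.0.0/db/migrations/director/. As the error message indicates, the migration is recorded as already applied, and Sequel raises an error when an applied migration has no matching file in the file system.

The reason the file is missing has not been determined.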
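
To confirm the cause, check whether the file exists in the migrations directory. The sketch below is an illustration only. The helper name check_migration is hypothetical, and the glob stands in for the <id> path component, which varies per installation.

```shell
# Hypothetical helper: report whether a migration file exists under each
# installed director package.
# $1: migration file name
# $2: packages root (optional; defaults to the path used in this article)
check_migration() {
  mig_name=$1
  pkg_root=${2:-/var/vcap/data/packages/director}
  # The * matches the <id> directory, which is not known in advance.
  for mig_dir in "$pkg_root"/*/gem_home/ruby/3.2.0/gems/bosh-director-0.0.0/db/migrations/director; do
    [ -d "$mig_dir" ] || continue
    if [ -f "$mig_dir/$mig_name" ]; then
      echo "present: $mig_dir/$mig_name"
    else
      echo "MISSING: $mig_dir/$mig_name"
    fi
  done
}

# Usage on the Bosh Director VM (as root):
#   check_migration 20240319204601_remove_dns_records_from_instances.rb
```

A MISSING line also prints the full directory path, including the actual <id>, which is needed when recreating the file.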

Resolution

During Apply Changes, once the new Bosh Director VM has been recreated and powered on, manually recreate the missing Ruby file on that VM before the "Waiting for instance 'bosh/0' to be running" task times out (5 minutes).

  1. Log in to the new Bosh Director VM as the vcap user:
    # ssh vcap@<Bosh-Director-VM_IP>
  2. Switch to the root user:
    # sudo -i
  3. Recreate the missing file with the following contents:
    # vim /var/vcap/data/packages/director/<id>/gem_home/ruby/3.2.0/gems/bosh-director-0.0.0/db/migrations/director/20240319204601_remove_dns_records_from_instances.rb

    Sequel.migration do
      down do
        alter_table(:instances) do
          add_column :dns_records, String, text: true
        end
      end

      up do
        alter_table(:instances) do
          drop_column :dns_records
        end
      end
    end

    Note: The contents above apply only to 20240319204601_remove_dns_records_from_instances.rb. If the missing file named in the /var/vcap/sys/log/director/sync_dns.stderr.log and /var/vcap/sys/log/director/metrics_server.stderr.log errors has a different name, the file version may have changed and these contents may not be valid. Contact Support in that case.

  4. Wait about a minute, then check that all processes are running:
    # monit summary

  5. Verify that Apply Changes in OpsMan succeeds.
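
Because the file must be in place within the 5-minute window, step 3 could also be done without an interactive editor. The following is a minimal sketch under that assumption, not a supported tool. The helper name write_migration is hypothetical, and the target directory (which contains <id>) must be supplied by the operator.

```shell
# Hypothetical helper: write the migration file from step 3 non-interactively.
# $1: the migrations directory, i.e. the full
#     .../bosh-director-0.0.0/db/migrations/director path including <id>.
write_migration() {
  dir=$1
  file="$dir/20240319204601_remove_dns_records_from_instances.rb"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  if [ -e "$file" ]; then
    # Never overwrite an existing migration file.
    echo "already exists: $file" >&2
    return 0
  fi
  # Quoted delimiter: the contents are written verbatim, with no expansion.
  cat > "$file" <<'EOF'
Sequel.migration do
  down do
    alter_table(:instances) do
      add_column :dns_records, String, text: true
    end
  end

  up do
    alter_table(:instances) do
      drop_column :dns_records
    end
  end
end
EOF
}

# Usage on the Bosh Director VM (as root), with the real directory path:
#   write_migration "<migrations directory>" && monit summary
```

The function can be prepared ahead of time and pasted into the root shell as soon as the new VM accepts SSH connections. The same caveat as in the note for step 3 applies: use it only when the missing file name matches exactly.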