Apply Changes on the BOSH Director fails with "Waiting for instance 'bosh/0' to be running… Failed"

Article ID: 293404

Products

Operations Manager

Issue/Introduction

Apply Changes on the BOSH Director fails with the following:
Waiting for instance 'bosh/0' to be running… Failed
In the BOSH Director, monit summary shows the following: 
Process 'uaa' running
Process 'credhub' Execution failed
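On a Director with many processes, it can help to filter the monit summary output down to the processes that are not running. The following is a minimal sketch, not part of the original procedure: the function name not_running is hypothetical, and it assumes each process line has the form Process 'name' status, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print every monit process line whose status is not
# exactly "running". Assumes lines look like:  Process 'name' status text
not_running() {
  awk '$1 == "Process" && !($3 == "running" && NF == 3) { print }'
}

# Usage on the BOSH Director VM (run as root):
#   monit summary | not_running
```

With the output shown above, only the credhub line would be printed.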
Looking in /var/vcap/sys/log, we can see the following:
credhub.stdout.log
[2019-10-29 11:26:24+0000] Could not reach the UAA server
Reviewing the code shows that this message comes from the health check that CredHub sends to the UAA using the Director's name/IP: https://github.com/pivotal/credhub-release/blob/d132fff123b823c3214f3106b3d222648dfd957c/jobs/credhub/templates/wait_for_uaa.erb#L7

Resolution

To confirm the issue, query the UAA health endpoint on both localhost and the IP address of the BOSH Director. This shows whether the problem lies with UAA / CredHub or outside the BOSH Director VM. Use the following commands, which are adapted for this test from the code snippet mentioned above:
curl --max-time 5 --connect-timeout 2 https://localhost:8443/healthz -k
curl --max-time 5 --connect-timeout 2 https://IP:8443/healthz -k
If the commands produce output similar to the following, the issue is outside the BOSH Director:
curl --max-time 5 --connect-timeout 2 https://localhost:8443/healthz -k
ok
curl --max-time 5 --connect-timeout 2 https://IP:8443/healthz -k
curl: (28) Operation timed out after 5001 milliseconds with 0 bytes received
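The comparison of the two results can also be scripted. The following is a minimal sketch under stated assumptions: the function names check_healthz and diagnose are hypothetical, DIRECTOR_IP is a placeholder for the BOSH Director's IP address, and a healthy endpoint is assumed to return the body ok as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: query the UAA health endpoint on host $1 with the same
# curl flags as above. --silent suppresses curl's progress and error output,
# so a timeout yields an empty response.
check_healthz() {
  curl --silent --max-time 5 --connect-timeout 2 -k "https://$1:8443/healthz"
}

# Hypothetical helper: interpret the two responses.
# $1 = response via localhost, $2 = response via the Director IP
diagnose() {
  if [ "$1" = "ok" ] && [ "$2" = "ok" ]; then
    echo "UAA reachable via localhost and IP"
  elif [ "$1" = "ok" ]; then
    echo "UAA healthy locally but unreachable via IP: issue is outside the BOSH Director"
  else
    echo "UAA not healthy on localhost: investigate UAA on the BOSH Director VM"
  fi
}

# Usage (DIRECTOR_IP is a placeholder):
#   diagnose "$(check_healthz localhost)" "$(check_healthz "$DIRECTOR_IP")"
```

Keeping the interpretation in a separate function means the decision logic can be checked without network access.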
To narrow down the issue further, you can also perform the following test:
  • Add a test entry such as <ip bosh director> test to the local /etc/hosts file of the BOSH Director. If CredHub then starts along with all other processes on the BOSH Director, this confirms a DNS issue. The curl request above to the IP address of the BOSH Director will also succeed.
    curl --max-time 5 --connect-timeout 2 https://IP:8443/healthz -k
    ok
  • If you remove the entry from the local file again, so that resolution falls back to network DNS, the issue returns.
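The temporary hosts-file test above can be sketched as a pair of shell functions. This is an illustrative sketch only: the function names add_test_entry and remove_test_entry are hypothetical, DIRECTOR_IP is a placeholder, and the optional third argument exists so the functions can be tried on a copy of the file before touching /etc/hosts.

```shell
#!/bin/sh
# Hypothetical helpers for the temporary hosts-file test.
# $1 = BOSH Director IP, $2 = test hostname, $3 = hosts file (default /etc/hosts)
add_test_entry() {
  printf '%s %s\n' "$1" "$2" >> "${3:-/etc/hosts}"
}

# Remove exactly the line written by add_test_entry; other entries are kept.
remove_test_entry() {
  file="${3:-/etc/hosts}"
  tmp="$(mktemp)" || return 1
  # -F -x matches the whole line literally; grep exits 1 when no lines
  # remain, which is not an error here.
  grep -v -x -F -- "$1 $2" "$file" > "$tmp" || true
  cat "$tmp" > "$file"   # overwrite in place to keep ownership and mode
  rm -f "$tmp"
}

# Usage on the BOSH Director VM (run as root; DIRECTOR_IP is a placeholder):
#   add_test_entry "$DIRECTOR_IP" test
#   monit summary        # check whether credhub now starts
#   remove_test_entry "$DIRECTOR_IP" test
```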
A possible cause is a misconfigured or missing reverse DNS zone in your environment. To verify this, run nslookup against the IP address of the BOSH Director from another machine that uses the same DNS server. You will probably observe a timeout or an error.
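A reverse lookup for an IPv4 address queries a PTR record under the in-addr.arpa domain, with the octets of the address reversed. The following minimal sketch shows how that name is formed and how the lookup could be run; the function name reverse_name is hypothetical, DIRECTOR_IP is a placeholder, and only IPv4 addresses are handled.

```shell
#!/bin/sh
# Hypothetical helper: build the in-addr.arpa name that a reverse (PTR) lookup
# queries for an IPv4 address, e.g. 10.0.0.5 -> 5.0.0.10.in-addr.arpa
reverse_name() {
  echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Usage from another machine that uses the same DNS server
# (DIRECTOR_IP is a placeholder):
#   nslookup "$DIRECTOR_IP"                               # reverse lookup
#   nslookup -type=PTR "$(reverse_name "$DIRECTOR_IP")"   # explicit PTR query
```

If the reverse zone is missing or unreachable, both queries would be expected to time out or return an error.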

Based on the results of the troubleshooting steps above, the misconfiguration in the environment must be resolved.

Possible causes include:
  • Wrong DNS server
  • Firewall blocking the communication between BOSH Director and DNS server
  • No routing between BOSH Director and the DNS server
  • Missing Reverse DNS zone in the DNS server configuration
  • Other networking issue preventing correct DNS functionality