Symptoms:
- You see that BOSH cluster creation tasks fail with pre-start script error messages similar to the following:
Task 1622 | 17:15:17 | Updating instance master: master/61c97315-7d6a-40fb-9ff9-69cb76bb776e (0) (canary) (00:02:10)
L Error: Action Failed get_task: Task d22e5fef-ee12-4951-58fa-bdbf205a42e2 result: 2 of 7 pre-start scripts failed. Failed Jobs: pks-nsx-t-prepare-master-vm, pks-nsx-t-ncp. Successful Jobs: etcd, bpm, bosh-dns, syslog_forwarder, ncp.
Task 1622 | 17:17:27 | Error: Action Failed get_task: Task d22e5fef-ee12-4951-58fa-bdbf205a42e2 result: 2 of 7 pre-start scripts failed. Failed Jobs: pks-nsx-t-prepare-master-vm, pks-nsx-t-ncp. Successful Jobs: etcd, bpm, bosh-dns, syslog_forwarder, ncp.
- While accessing the cluster, you see that a master or worker node has failed to start.
- After gathering the BOSH deployment logs, you see messages similar to the following in /<service-instanceID>/pks-nsx-t-prepare-master-vm/pre-start.stdout.log:
Registering client certificate Get https://<NSX-MANAGER-FQDN>/api/v1/trust-management/principal-identities: dial tcp: lookup <NSX-MANAGER-FQDN> on <DNS-SERVER-IP>:53: read udp <BOSH-VM-IP>:59503-><DNS-SERVER-IP>:53: i/o timeout
- You see messages similar to the following in the deployment logs under /<service-instanceID>/bosh-dns/bosh_dns.stdout.log:
[FailoverRecursor] 2020/08/21 19:19:27 INFO - shifting recursor preference: <DNS-SERVER-IP>
[ForwardHandler] 2020/08/21 19:19:27 DEBUG - error recursing to "<DNS-SERVER-IP>:53": read udp <BOSH-VM-IP>:59882-><DNS-SERVER-IP>:53: i/o timeout
[FailoverRecursor] 2020/08/21 19:19:27 INFO - shifting recursor preference: <DNS-SERVER-IP>:53
[ForwardHandler] 2020/08/21 19:19:29 DEBUG - error recursing to "<DNS-SERVER-IP>": read udp <BOSH-VM-IP>:54326-><DNS-SERVER-IP>: i/o timeout
[ForwardHandler] 2020/08/21 19:19:29 INFO - handlers.ForwardHandler Request [1] [<DNS-SERVER-FQDN>.] 2 [no response from recursors] 4000914000ns
[ForwardHandler] 2020/08/21 19:19:29 DEBUG - error recursing to "<DNS-SERVER-IP>": read udp <BOSH-VM-IP>:43965-><DNS-SERVER-IP>: i/o timeout
[ForwardHandler] 2020/08/21 19:19:29 INFO - handlers.ForwardHandler Request [28] [<DNS-SERVER-FQDN>.] 2 [no response from recursors] 4001378000ns
Note: The preceding log excerpts are only examples. Dates, times, task IDs, and other environment-specific values vary depending on your environment.
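To check whether a gathered bosh_dns.stdout.log shows the recursor timeouts listed above, the log lines can be scanned for the ForwardHandler "error recursing to ... i/o timeout" pattern. The following is a minimal Python sketch, not a supported tool. The function name count_recursor_timeouts and the regular expression are illustrative assumptions derived only from the excerpt above, and the log format may differ between bosh-dns versions.

```python
import re

# Assumed pattern, based only on the bosh-dns excerpt in this article:
# a ForwardHandler line reporting an i/o timeout while recursing upstream.
TIMEOUT_RE = re.compile(
    r'\[ForwardHandler\].*error recursing to "(?P<recursor>[^"]+)".*i/o timeout'
)


def count_recursor_timeouts(lines):
    """Return a dict mapping each recursor to its number of timeout lines."""
    counts = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            recursor = match.group("recursor")
            counts[recursor] = counts.get(recursor, 0) + 1
    return counts


# Hypothetical sample lines; the IP addresses are placeholders.
sample = [
    '[ForwardHandler] 2020/08/21 19:19:27 DEBUG - error recursing to '
    '"192.0.2.53:53": read udp 192.0.2.10:59882->192.0.2.53:53: i/o timeout',
    '[FailoverRecursor] 2020/08/21 19:19:27 INFO - shifting recursor '
    'preference: 192.0.2.53:53',
]
for recursor, hits in sorted(count_recursor_timeouts(sample).items()):
    print(f"{recursor}: {hits} timeout line(s)")
```

To scan a real log, pass an open file object instead of the sample list, because the function accepts any iterable of lines. A non-empty result for your DNS server is consistent with the symptoms described above, but it does not by itself identify the cause.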