Symptoms:
Cluster creation fails with the following message in the BOSH task logs:
Task 332 | 18:58:48 | Updating instance worker: worker/########-cf83-####-a9cc-############ (0) (canary) (00:02:23)
L Error: Action Failed get_task: Task f2b11c6c-ab26-4dbc-7832-dff69df0a84d result: 2 of 6 pre-start scripts failed. Failed Jobs: ncp, kubelet. Successful Jobs: bosh-dns-enable, syslog_forwarder, bosh-dns, nsx-pod-networking.
Task 332 | 19:01:11 | Error: Action Failed get_task: Task f2b11c6c-ab26-4dbc-7832-dff69df0a84d result: 2 of 6 pre-start scripts failed. Failed Jobs: ncp, kubelet. Successful Jobs: bosh-dns-enable, syslog_forwarder, bosh-dns, nsx-pod-networking.
Task 332 Started Fri Apr 6 18:54:29 UTC 2018
Task 332 Finished Fri Apr 6 19:01:11 UTC 2018
Task 332 Duration 00:06:42
Task 332 error
Capturing task '332' output:
Expected task '332' to succeed but state is 'error'
Exit code 1
The TKGI ncp job fails with the following error in /var/vcap/sys/log/ncp/pre-start.stderr.log:
cat pre-start.stderr.log
curl: (6) Could not resolve host: test-mgr.domain.com
Cause:
When the T1 logical router for the TKGI service VMs is configured, SNAT rules are configured as described here. These SNAT rules allow the Kubernetes VMs to communicate with the NSX-T Manager and with other infrastructure services such as DNS and NTP. In this failure scenario the SNAT rules were missing, so the Kubernetes VMs could not reach the DNS server, and host lookups for the NSX-T Manager failed.
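To check whether a failed VM shows this DNS symptom, the log can be scanned for curl's "Could not resolve host" error (exit code 6) and each reported host name retried from the VM. The following is a minimal sketch, not part of the product: the function names unresolved_hosts and check_resolution are hypothetical helpers, and it assumes bash, sed, and getent are available on the VM.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print each host name that curl reported as
# unresolvable ("curl: (6) Could not resolve host: <name>") in a log file.
unresolved_hosts() {
  local log_file="$1"
  sed -n 's/.*curl: (6) Could not resolve host: \([^ ]*\).*/\1/p' "$log_file" | sort -u
}

# Hypothetical helper: report whether a host name currently resolves
# from this VM, using the system resolver via getent.
check_resolution() {
  local host="$1"
  if getent hosts "$host" >/dev/null 2>&1; then
    echo "resolves: $host"
  else
    echo "does not resolve: $host"
  fi
}
```

For example, on the affected worker VM: for h in $(unresolved_hosts /var/vcap/sys/log/ncp/pre-start.stderr.log); do check_resolution "$h"; done. If the NSX-T Manager host name still does not resolve, the VM most likely cannot reach the DNS server, which is consistent with missing SNAT rules but does not prove it.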
Resolution:
Configure the T1 logical router for the TKGI service VMs with SNAT rules as described here.