Symptoms:
tkgi create-cluster one_worker --external-hostname one_worker --plan small -n 1

Name: one_worker
Plan Name: small
UUID: ########-57e0-####-99a5-############
Last Action: CREATE
Last Action State: in progress
Last Action Description: Creating cluster
Kubernetes Master Host: one_worker
Kubernetes Master Port: 8443
Worker Nodes: 1
Kubernetes Master IP(s): In Progress

Use 'tkgi cluster one_worker' to monitor the state of your cluster
tkgi cluster one_worker

Name: one_worker
Plan Name: small
UUID: 3c679b4f-57e0-4490-99a5-f9a9d97e3bc1
Last Action: CREATE
Last Action State: failed
Last Action Description: Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3c679b4f-57e0-4490-99a5-f9a9d97e3bc1, broker-request-id: 9cdc363a-ce90-4927-bcbe-030609e236da, task-id: 1667, operation: create
Kubernetes Master Host: one_worker
Kubernetes Master Port: 8443
Worker Nodes: 1
Kubernetes Master IP(s): In Progress
bosh task 1667 --debug
{"time":1531003250,"stage":"Fetching logs for apply-addons/6435229d-9d59-4bd9-8fe7-ddd7bc98a796 (0)","tags":[],"total":1,"task":"Finding and packing log files","index":1,"state":"finished","progress":100}
', "result_output" = '{"instance":{"group":"apply-addons","id":"6435229d-9d59-4bd9-8fe7-ddd7bc98a796"},"errand_name":"apply-addons","exit_code":1,"stdout":"Deploying /var/vcap/jobs/apply-specs/specs/kube-dns.yml\nservice \"kube-dns\" created\nserviceaccount \"kube-dns\" created\nconfigmap \"kube-dns-auth\" created\nconfigmap \"kube-dns\" created\ndeployment.extensions \"kube-dns\" created\nWaiting for rollout to finish: 0 of 1 updated replicas are available...\nfailed to start all system specs after 1200 with exit code 1\n","stderr":"error: deployment \"kube-dns\" exceeded its progress deadline\n","logs":{"blobstore_id":"########-dc63-####-76de-############","sha1":"########e2013131e1########24b490########"}}
On the worker node, the pod logs are located under /var/log/pods. The nsx-kube-proxy logs contain the following message:
bosh ssh -d service-instance_########-57e0-####-99a5-############ worker
sudo su -
cd /var/log/pods/<random-container-id>/nsx-kube-proxy
{"log":"1 2018-07-07T22:45:30.617Z 1dcf46c6-44ca-46b5-8a35-4a86a4ba7ca9 NSX 8 - [nsx@6876 comp=\"nsx-container-node\" subcomp=\"nsx_kube_proxy\" level=\"CRITICAL\"] nsx-container-node Unhandled error: IDNAError: The label one_worker is not a valid A-label\n","stream":"stderr","time":"2018-07-07T22:45:30.642372391Z"}
{"log":"2018-07-07T22:45:30.617 8 ERROR nsx-container-node Traceback (most recent call last):\n","stream":"stderr","time":"2018-07-07T22:45:30.642393499Z"}
When a cluster is created in TKGI, the tkgi CLI currently does not validate whether the hostname conforms to RFC 1123. A hostname that contains an underscore therefore passes the CLI's initial validation, but during cluster creation nsx-kube-proxy validates the hostname against RFC 1123. Because this validation fails, the nsx-kube-proxy pods cannot start, which causes the cluster creation to fail.
Underscores ('_') are not allowed in a valid hostname. Remove the underscore from the hostname and recreate the cluster.
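
Because the CLI does not perform this check, it can help to validate a hostname before passing it to tkgi create-cluster. The following is a minimal sketch, assuming the common reading of the RFC 1123 rules (labels of 1 to 63 letters, digits, or hyphens, with no leading or trailing hyphen, and a total length of at most 253 characters). The function name is_rfc1123_hostname is hypothetical and is not part of tkgi or NSX; the check that nsx-kube-proxy actually performs may differ in detail.

```python
import re

# One hostname label: 1-63 characters, letters, digits, or hyphens,
# and it must not start or end with a hyphen. Underscores are rejected.
_LABEL_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc1123_hostname(hostname: str) -> bool:
    """Return True if hostname passes this sketch of the RFC 1123 rules."""
    # Accept a single trailing dot (fully qualified form).
    if hostname.endswith("."):
        hostname = hostname[:-1]
    # Reject empty names and names longer than 253 characters.
    if not hostname or len(hostname) > 253:
        return False
    # Every dot-separated label must match the label pattern in full.
    return all(_LABEL_RE.fullmatch(label) for label in hostname.split("."))


if __name__ == "__main__":
    # Hostname from the failing example, and a corrected alternative.
    for name in ("one_worker", "one-worker"):
        print(name, is_rfc1123_hostname(name))
```

With this sketch, one_worker is rejected and one-worker is accepted, so replacing the underscore with a hyphen is one way to obtain a hostname that passes the check.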