TKGI cluster creation in NSX-T environment fails if an underscore is used in --external-hostname

Article ID: 298534

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Symptoms:

  • Cluster creation is attempted with an underscore in the hostname:
    tkgi create-cluster one_worker --external-hostname one_worker --plan small -n 1    
    
    Name:                     one_worker
    Plan Name:                small
    UUID:                     ########-57e0-####-99a5-############
    Last Action:              CREATE
    Last Action State:        in progress
    Last Action Description:  Creating cluster
    Kubernetes Master Host:   one_worker
    Kubernetes Master Port:   8443
    Worker Nodes:             1
    Kubernetes Master IP(s):  In Progress
    
    Use 'tkgi cluster one_worker' to monitor the state of your cluster
  • Cluster creation fails with the following error:
    tkgi cluster one_worker
    
    Name:                     one_worker
    Plan Name:                small
    UUID:                     3c679b4f-57e0-4490-99a5-f9a9d97e3bc1
    Last Action:              CREATE
    Last Action State:        failed
    Last Action Description:  Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3c679b4f-57e0-4490-99a5-f9a9d97e3bc1, broker-request-id: 9cdc363a-ce90-4927-bcbe-030609e236da, task-id: 1667, operation: create
    Kubernetes Master Host:   one_worker
    Kubernetes Master Port:   8443
    Worker Nodes:             1
    Kubernetes Master IP(s):  In Progress
  • The BOSH task fails with "failed to start all system specs after 1200 with exit code 1":
    bosh task 1667 --debug
    
    {"time":1531003250,"stage":"Fetching logs for apply-addons/6435229d-9d59-4bd9-8fe7-ddd7bc98a796 (0)","tags":[],"total":1,"task":"Finding and packing log files","index":1,"state":"finished","progress":100}
    ', "result_output" = '{"instance":{"group":"apply-addons","id":"6435229d-9d59-4bd9-8fe7-ddd7bc98a796"},"errand_name":"apply-addons","exit_code":1,"stdout":"Deploying /var/vcap/jobs/apply-specs/specs/kube-dns.yml\nservice \"kube-dns\" created\nserviceaccount \"kube-dns\" created\nconfigmap \"kube-dns-auth\" created\nconfigmap \"kube-dns\" created\ndeployment.extensions \"kube-dns\" created\nWaiting for rollout to finish: 0 of 1 updated replicas are available...\nfailed to start all system specs after 1200 with exit code 1\n","stderr":"error: deployment \"kube-dns\" exceeded its progress deadline\n","logs":{"blobstore_id":"########-dc63-####-76de-############","sha1":"########e2013131e1########24b490########"}}
  • The above error is generic and can have many causes. To trace the error, SSH into any one of the Kubernetes worker VMs and look for the nsx-kube-proxy logs under /var/log/pods. The following message appears in the nsx-kube-proxy logs:
    bosh ssh -d service-instance_########-57e0-####-99a5-############ worker
    sudo su -
    cd /var/log/pods/<random-container-id>/nsx-kube-proxy
    
    
    {"log":"1 2018-07-07T22:45:30.617Z 1dcf46c6-44ca-46b5-8a35-4a86a4ba7ca9 NSX 8 - [nsx@6876 comp=\"nsx-container-node\" subcomp=\"nsx_kube_proxy\" level=\"CRITICAL\"] nsx-container-node Unhandled error: IDNAError: The label one_worker is not a valid A-label\n","stream":"stderr","time":"2018-07-07T22:45:30.642372391Z"}
    {"log":"2018-07-07T22:45:30.617 8 ERROR nsx-container-node Traceback (most recent call last):\n","stream":"stderr","time":"2018-07-07T22:45:30.642393499Z"}
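
Instead of opening each container log by hand, the message can be searched for across all pod logs on the worker. The following is a minimal sketch, assuming a grep that supports recursive search (GNU or BSD) is available on the worker VM. The function name find_idna_errors is hypothetical; the default directory is the one shown in the step above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the log files under a directory that mention IDNAError.
find_idna_errors() {
  local log_dir="${1:-/var/log/pods}"
  # -r: search recursively, -l: print only the names of matching files,
  # -s: stay quiet about files that are missing or unreadable
  grep -rls "IDNAError" "$log_dir"
}
```

Running find_idna_errors /var/log/pods as root on the worker prints the path of each log file that contains the error, which narrows down which container log to read.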

     

Environment


Cause

When creating a cluster, the tkgi CLI does not currently validate whether the hostname conforms to RFC 1123. A hostname containing an underscore therefore passes the CLI's initial validation, but during cluster creation nsx-kube-proxy validates the hostname against RFC 1123. Because this validation fails, the nsx-kube-proxy pods cannot start, which causes cluster creation to fail.
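
Because the CLI does not perform this check, a hostname can be checked locally before it is passed to --external-hostname. The following is a minimal sketch of an RFC 1123 style check in bash. The function name is_rfc1123_hostname is hypothetical and is not part of the tkgi CLI, and the pattern is an approximation of the rules (letters, digits, and hyphens per label, labels of 1 to 63 characters that do not start or end with a hyphen, 253 characters overall), not the exact validation that nsx-kube-proxy runs.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check, not part of the tkgi CLI.
# Returns 0 if the argument looks like an RFC 1123 hostname, 1 otherwise.
is_rfc1123_hostname() {
  local host="$1"
  local LC_ALL=C   # keep character ranges such as A-Z limited to ASCII
  # One label: starts and ends with a letter or digit, hyphens allowed inside, max 63 chars.
  local label='[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?'
  # Whole name: one or more labels separated by dots.
  local re="^${label}(\.${label})*\$"
  (( ${#host} <= 253 )) || return 1
  [[ "$host" =~ $re ]]
}

for name in one_worker one-worker; do
  if is_rfc1123_hostname "$name"; then
    echo "$name: valid"
  else
    echo "$name: invalid"
  fi
done
# prints:
#   one_worker: invalid
#   one-worker: valid
```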

Resolution

Underscores ('_') are not allowed in a valid hostname. Remove the underscore from the hostname and recreate the cluster.
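
As an illustration, the sketch below derives a replacement hostname by swapping underscores for hyphens and prints the commands to run. The helper name sanitize_hostname is hypothetical. The sketch assumes the failed cluster has to be deleted with tkgi delete-cluster before its name can be reused, and it changes only the --external-hostname value; the plan and node count are the ones from the example above. The commands are printed rather than executed so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every underscore in a hostname with a hyphen.
sanitize_hostname() {
  printf '%s\n' "${1//_/-}"
}

cluster_name="one_worker"
new_hostname="$(sanitize_hostname "one_worker")"   # one-worker

# Print the commands for review instead of running them.
echo "tkgi delete-cluster ${cluster_name}"
echo "tkgi create-cluster ${cluster_name} --external-hostname ${new_hostname} --plan small -n 1"
```

A name that begins or ends with an underscore would still be invalid after this substitution, so the result should be checked against the RFC 1123 rules before use.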