PKS cluster creation in NSX-T fails as all available IPs in external SNAT IP pools are exhausted
Article ID: 298582
Products
VMware Tanzu Kubernetes Grid Integrated Edition
Issue/Introduction
Symptoms:
Cluster creation attempted using PKS CLI
pks create-cluster one_worker --external-hostname oneworker --plan small -n 1
Name: one_worker
Plan Name: small
UUID: 3c679b4f-57e0-4490-99a5-f9a9d97e3bc1
Last Action: CREATE
Last Action State: in progress
Last Action Description: Creating cluster
Kubernetes Master Host: oneworker
Kubernetes Master Port: 8443
Worker Nodes: 1
Kubernetes Master IP(s): In Progress
Use 'pks cluster one_worker' to monitor the state of your cluster
Cluster creation fails with the error below:
pks cluster one_worker
Name: one_worker
Plan Name: small
UUID: 3c679b4f-57e0-4490-99a5-f9a9d97e3bc1
Last Action: CREATE
Last Action State: failed
Last Action Description: Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3c679b4f-57e0-4490-99a5-f9a9d97e3bc1, broker-request-id: 9cdc363a-ce90-4927-bcbe-030609e236da, task-id: 1667, operation: create
Kubernetes Master Host: one_worker
Kubernetes Master Port: 8443
Worker Nodes: 1
Kubernetes Master IP(s): In Progress
The BOSH task fails with "failed to start all system specs after 1200 with exit code 1":
bosh task 1667 --debug
{"time":1531003250,"stage":"Fetching logs for apply-addons/6435229d-9d59-4bd9-8fe7-ddd7bc98a796 (0)","tags":[],"total":1,"task":"Finding and packing log files","index":1,"state":"finished","progress":100}
', "result_output" = '{"instance":{"group":"apply-addons","id":"6435229d-9d59-4bd9-8fe7-ddd7bc98a796"},"errand_name":"apply-addons","exit_code":1,"stdout":"Deploying /var/vcap/jobs/apply-specs/specs/kube-dns.yml\nservice \"kube-dns\" created\nserviceaccount \"kube-dns\" created\nconfigmap \"kube-dns-auth\" created\nconfigmap \"kube-dns\" created\ndeployment.extensions \"kube-dns\" created\nWaiting for rollout to finish: 0 of 1 updated replicas are available...\nfailed to start all system specs after 1200 with exit code 1\n","stderr":"error: deployment \"kube-dns\" exceeded its progress deadline\n","logs":{"blobstore_id":"e95fcfb4-dc63-4d8a-76de-06cddba6a148","sha1":"d994360ee2013131e14ba4507e24b490dd141bf5"}}
To trace the error, SSH into any one of the Kubernetes worker VMs and look for the nsx-ncp logs under /var/log/pods. The nsx-ncp logs contain the following messages:
{"log":"1 2018-07-05T23:40:27.891Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"WARNING\"] vmware_nsxlib.v3.client The HTTP request returned error code 409, whereas 201/200 response codes were expected. Response body {u'error_code': 5109, u'error_message': u'Insufficient free IP addresses to allocate from the pool.', u'httpStatus': u'CONFLICT', u'module_name': u'id-allocation service'}\n","stream":"stderr","time":"2018-07-05T23:40:27.896823025Z"}
{"log":"1 2018-07-05T23:40:27.892Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"WARNING\"] nsx_ujo.ncp.nsxapi Unable to allocate IP address from pool 14413ce9-86b5-4346-9e63-a49046011f08: Unexpected error from backend manager (['nsxmgr-01.haas-134.pez.pivotal.io']) for POST api/v1/pools/ip-pools/14413ce9-86b5-4346-9e63-a49046011f08?action=ALLOCATE : Insufficient free IP addresses to allocate from the pool.\n","stream":"stderr","time":"2018-07-05T23:40:27.896850209Z"}
{"log":"1 2018-07-05T23:40:27.892Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"ERROR\" errorCode=\"NCP00015\"] nsx_ujo.ncp.nsxapi IP pool 14413ce9-86b5-4346-9e63-a49046011f08 is exhausted to allocate IP address\n","stream":"stderr","time":"2018-07-05T23:40:27.896855635Z"}
{"log":"1 2018-07-05T23:40:27.900Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"INFO\"] nsx_ujo.ncp.nsxapi Reset external ip pools for cluster pks-6d64addd-2217-46a8-8255-edf4747c8ae3\n","stream":"stderr","time":"2018-07-05T23:40:27.906869019Z"}
{"log":"1 2018-07-05T23:40:27.901Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"INFO\"] nsx_ujo.common.utils Failed to execute function _allocate_external_ip: Unexpected error from backend manager (['nsxmgr-01.haas-134.pez.pivotal.io']) for External IP allocation : Unable to allocate external IP, will retry after 1 seconds\n","stream":"stderr","time":"2018-07-05T23:40:27.906889162Z"}
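When the nsx-ncp logs are large, the exhausted pool can be picked out with a short script. The following is a minimal Python sketch, not part of any PKS or NCP tooling. It assumes each log line is a JSON object carrying the message in a "log" field, as in the excerpt above, and it falls back to searching the raw text when a line is not valid JSON. The function name find_exhausted_pools is hypothetical.

```python
import json
import re

# Matches the NCP00015 message seen in the excerpt and captures the pool UUID.
POOL_RE = re.compile(
    r"IP pool ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) is exhausted"
)

# One line copied from the log excerpt in this article.
SAMPLE_LINE = (
    r'{"log":"1 2018-07-05T23:40:27.892Z db2a2dbb-1880-46c1-ae32-31407ea2d45e '
    r'NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" '
    r'level=\"ERROR\" errorCode=\"NCP00015\"] nsx_ujo.ncp.nsxapi IP pool '
    r'14413ce9-86b5-4346-9e63-a49046011f08 is exhausted to allocate IP address\n",'
    r'"stream":"stderr","time":"2018-07-05T23:40:27.896855635Z"}'
)


def find_exhausted_pools(lines):
    """Return the unique pool UUIDs reported as exhausted, in first-seen order."""
    pools = []
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            record = None
        if isinstance(record, dict):
            message = str(record.get("log", ""))
        else:
            # Not a JSON object (for example a wrapped line): search the raw text.
            message = line
        match = POOL_RE.search(message)
        if match and match.group(1) not in pools:
            pools.append(match.group(1))
    return pools


print(find_exhausted_pools([SAMPLE_LINE]))
```

In practice the lines would come from the nsx-ncp log file on the worker VM, for example by passing an open file object to find_exhausted_pools.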
Environment
Cause
PKS allocates IP addresses from the floating IP pool to the load balancers created for each cluster. The load balancer routes API requests to the master nodes and the data plane. The pool is specified in the Floating IP Pool ID field under Networking in the PKS tile, and is created under Inventory > IP Pools in NSX-T Manager. During cluster creation, the NSX-T Container Plug-in (NCP) tries to allocate IP addresses from this pool. If the pool does not have enough free addresses, cluster creation fails. The exhausted pool can be identified by the UUID printed in the NCP error messages above (14413ce9-86b5-4346-9e63-a49046011f08 in this example).
Resolution
Increase the size of the floating IP pool identified above so that it has enough free IP addresses for cluster creation to complete.
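When planning the new pool size, it can help to check how many addresses a set of ranges actually provides. The sketch below is illustrative only: the function names and the example range are hypothetical, the ranges are assumed to be inclusive start/end pairs that do not overlap, and the number of addresses already allocated has to be read from NSX-T Manager.

```python
import ipaddress


def pool_capacity(ranges):
    """Count the addresses in a list of inclusive (start, end) IP ranges."""
    total = 0
    for start, end in ranges:
        first = int(ipaddress.ip_address(start))
        last = int(ipaddress.ip_address(end))
        if last < first:
            raise ValueError(f"range end {end} is before range start {start}")
        total += last - first + 1
    return total


def free_addresses(ranges, allocated):
    """Addresses still available, given how many are already allocated."""
    return pool_capacity(ranges) - allocated


# Hypothetical range, used only to illustrate the arithmetic.
EXAMPLE_RANGES = [("10.40.14.34", "10.40.14.62")]

print(pool_capacity(EXAMPLE_RANGES))       # 29 addresses in the range
print(free_addresses(EXAMPLE_RANGES, 29))  # 0, so the pool is exhausted
```

A result of zero or less from free_addresses corresponds to the "Insufficient free IP addresses" condition reported by NSX-T.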