PKS cluster creation in NSX-T fails as all available IPs in external SNAT IP pools are exhausted


Article ID: 298582



Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Symptoms:
  • Cluster creation is attempted using the PKS CLI:
    pks create-cluster one_worker --external-hostname oneworker --plan small -n 1    
    
    Name:                     one_worker
    Plan Name:                small
    UUID:                     3c679b4f-57e0-4490-99a5-f9a9d97e3bc1
    Last Action:              CREATE
    Last Action State:        in progress
    Last Action Description:  Creating cluster
    Kubernetes Master Host:   oneworker
    Kubernetes Master Port:   8443
    Worker Nodes:             1
    Kubernetes Master IP(s):  In Progress
    
    Use 'pks cluster one_worker' to monitor the state of your cluster
  • Cluster creation fails with the following error:
    pks cluster one_worker
    
    Name:                     one_worker
    Plan Name:                small
    UUID:                     3c679b4f-57e0-4490-99a5-f9a9d97e3bc1
    Last Action:              CREATE
    Last Action State:        failed
    Last Action Description:  Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3c679b4f-57e0-4490-99a5-f9a9d97e3bc1, broker-request-id: 9cdc363a-ce90-4927-bcbe-030609e236da, task-id: 1667, operation: create
    Kubernetes Master Host:   one_worker
    Kubernetes Master Port:   8443
    Worker Nodes:             1
    Kubernetes Master IP(s):  In Progress
  • The BOSH task fails with "failed to start all system specs after 1200 with exit code 1":
    bosh task 1667 --debug
    
    {"time":1531003250,"stage":"Fetching logs for apply-addons/6435229d-9d59-4bd9-8fe7-ddd7bc98a796 (0)","tags":[],"total":1,"task":"Finding and packing log files","index":1,"state":"finished","progress":100}
    ', "result_output" = '{"instance":{"group":"apply-addons","id":"6435229d-9d59-4bd9-8fe7-ddd7bc98a796"},"errand_name":"apply-addons","exit_code":1,"stdout":"Deploying /var/vcap/jobs/apply-specs/specs/kube-dns.yml\nservice \"kube-dns\" created\nserviceaccount \"kube-dns\" created\nconfigmap \"kube-dns-auth\" created\nconfigmap \"kube-dns\" created\ndeployment.extensions \"kube-dns\" created\nWaiting for rollout to finish: 0 of 1 updated replicas are available...\nfailed to start all system specs after 1200 with exit code 1\n","stderr":"error: deployment \"kube-dns\" exceeded its progress deadline\n","logs":{"blobstore_id":"e95fcfb4-dc63-4d8a-76de-06cddba6a148","sha1":"d994360ee2013131e14ba4507e24b490dd141bf5"}}
  • To trace the error, SSH into any one of the Kubernetes worker VMs and look for the nsx-ncp logs under /var/log/pods. The nsx-ncp logs contain the following messages:
    {"log":"1 2018-07-05T23:40:27.891Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"WARNING\"] vmware_nsxlib.v3.client The HTTP request returned error code 409, whereas 201/200 response codes were expected. Response body {u'error_code': 5109, u'error_message': u'Insufficient free IP addresses to allocate from the pool.', u'httpStatus': u'CONFLICT', u'module_name': u'id-allocation service'}\n","stream":"stderr","time":"2018-07-05T23:40:27.896823025Z"}
    {"log":"1 2018-07-05T23:40:27.892Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"WARNING\"] nsx_ujo.ncp.nsxapi Unable to allocate IP address from pool 14413ce9-86b5-4346-9e63-a49046011f08: Unexpected error from backend manager (['nsxmgr-01.haas-134.pez.pivotal.io']) for POST api/v1/pools/ip-pools/14413ce9-86b5-4346-9e63-a49046011f08?action=ALLOCATE : Insufficient free IP addresses to allocate from the pool.\n","stream":"stderr","time":"2018-07-05T23:40:27.896850209Z"}
    {"log":"1 2018-07-05T23:40:27.892Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"ERROR\" errorCode=\"NCP00015\"] nsx_ujo.ncp.nsxapi IP pool 14413ce9-86b5-4346-9e63-a49046011f08 is exhausted to allocate IP address\n","stream":"stderr","time":"2018-07-05T23:40:27.896855635Z"}
    {"log":"1 2018-07-05T23:40:27.900Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"INFO\"] nsx_ujo.ncp.nsxapi Reset external ip pools for cluster pks-6d64addd-2217-46a8-8255-edf4747c8ae3\n","stream":"stderr","time":"2018-07-05T23:40:27.906869019Z"}
    {"log":"1 2018-07-05T23:40:27.901Z db2a2dbb-1880-46c1-ae32-31407ea2d45e NSX 8 - [nsx@6876 comp=\"nsx-container-ncp\" subcomp=\"ncp\" level=\"INFO\"] nsx_ujo.common.utils Failed to execute function _allocate_external_ip: Unexpected error from backend manager (['nsxmgr-01.haas-134.pez.pivotal.io']) for External IP allocation : Unable to allocate external IP, will retry after 1 seconds\n","stream":"stderr","time":"2018-07-05T23:40:27.906889162Z"}
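When the nsx-ncp logs are long, the exhausted pool can be picked out programmatically. The following is a minimal sketch, not part of PKS or NCP: it assumes the log lines are JSON objects with a "log" field, as in the excerpt above, and that the NCP00015 message keeps the wording "IP pool <UUID> is exhausted". The function name exhausted_pool_ids is hypothetical.

```python
import json
import re

# Matches the NCP00015 message wording seen in the excerpt above (assumed stable).
EXHAUSTED_RE = re.compile(
    r"IP pool ([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) is exhausted"
)


def exhausted_pool_ids(lines):
    """Return the set of IP pool UUIDs that the given log lines report as exhausted."""
    pool_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            # Container log entries wrap the NCP message in a "log" field.
            message = str(json.loads(line)["log"])
        except (ValueError, KeyError, TypeError):
            # Not a JSON log entry: fall back to scanning the raw text.
            message = line
        match = EXHAUSTED_RE.search(message)
        if match:
            pool_ids.add(match.group(1))
    return pool_ids
```

The function accepts any iterable of lines, so an open file handle for a log file copied from the worker VM can be passed in directly. For the excerpt above it would report the pool 14413ce9-86b5-4346-9e63-a49046011f08.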


Environment


Cause

PKS uses the floating IP pool to allocate IP addresses to the load balancers created for each cluster. The load balancer routes API requests to the master nodes and the data plane. This IP pool is specified in the Floating IP Pool ID field under Networking in the PKS tile, and it is created under IP Pools, under Inventory, in NSX-T Manager. The NSX-T Container Plug-in (NCP) tries to allocate IP addresses from this pool during cluster creation. If the pool does not have enough free IP addresses, cluster creation fails. The pool can be identified by the UUID printed in the error messages above (14413ce9-86b5-4346-9e63-a49046011f08 in this example).

Resolution

Increase the size of the IP pool so that cluster creation can complete.
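When planning the new pool size, it can help to count how many addresses the configured ranges provide and how many remain free. The following is a minimal sketch of that arithmetic only. The helper names range_size and free_addresses are hypothetical, the sketch does not query NSX-T, and the range boundaries and allocation count must be read from the pool in NSX-T Manager.

```python
import ipaddress


def range_size(start, end):
    """Number of addresses in an inclusive start-end range."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    if last < first:
        raise ValueError(f"range end {end} is before range start {start}")
    return int(last) - int(first) + 1


def free_addresses(ranges, allocated):
    """Addresses still available, given (start, end) ranges and a count of allocated IPs."""
    total = sum(range_size(start, end) for start, end in ranges)
    return total - allocated
```

For example, with an illustrative range of 192.0.2.10 to 192.0.2.19 and 10 addresses already allocated, free_addresses([("192.0.2.10", "192.0.2.19")], 10) evaluates to 0, which corresponds to the exhausted state described in this article.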