PKS cluster creation in NSX-T fails with conflicting CIDR in IP pools under NSX-T

Article ID: 298499

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Symptoms:
  • Cluster creation is attempted with the PKS CLI as follows:
    pks create-cluster one_worker --external-hostname oneworker --plan small -n 1    
    
    Name:                     one_worker
    Plan Name:                small
    UUID:                     3c679b4f-####-4490-99a5-f9a9d97e3bc1
    Last Action:              CREATE
    Last Action State:        in progress
    Last Action Description:  Creating cluster
    Kubernetes Master Host:   oneworker
    Kubernetes Master Port:   8443
    Worker Nodes:             1
    Kubernetes Master IP(s):  In Progress
    
    Use 'pks cluster one_worker' to monitor the state of your cluster
  • Cluster creation fails with the following error:
    pks cluster one_worker
    
    Name:                     one_worker
    Plan Name:                small
    UUID:                     3c679b4f-####-####-99a5-f9a9d97e3bc1
    Last Action:              CREATE
    Last Action State:        failed
    Last Action Description:  Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3c679b4f-####-4490-99a5-f9a9d97e3bc1, broker-request-id: 9cdc363a-ce90-4927-bcbe-030609e236da, task-id: 1667, operation: create
    Kubernetes Master Host:   one_worker
    Kubernetes Master Port:   8443
    Worker Nodes:             1
    Kubernetes Master IP(s):  In Progress
  • The BOSH task fails with "failed to start all system specs after 1200 with exit code":
    bosh task 1667 --debug
    
    {"time":1531003250,"stage":"Fetching logs for apply-addons/6435229d-####-####-8fe7-ddd7bc98a796 (0)","tags":[],"total":1,"task":"Finding and packing log files","index":1,"state":"finished","progress":100}
    ', "result_output" = '{"instance":{"group":"apply-addons","id":"6435229d-####-####-8fe7-ddd7bc98a796"},"errand_name":"apply-addons","exit_code":1,"stdout":"Deploying /var/vcap/jobs/apply-specs/specs/kube-dns.yml\nservice \"kube-dns\" created\nserviceaccount \"kube-dns\" created\nconfigmap \"kube-dns-auth\" created\nconfigmap \"kube-dns\" created\ndeployment.extensions \"kube-dns\" created\nWaiting for rollout to finish: 0 of 1 updated replicas are available...\nfailed to start all system specs after 1200 with exit code 1\n","stderr":"error: deployment \"kube-dns\" exceeded its progress deadline\n","logs":{"blobstore_id":"e95fcfb4-####-####-76de-06cddba6a148","sha1":"d994360ee2013131e14ba4507e24b490dd141bf5"}}
  • The error above is generic and can have many causes. To trace it, SSH into any one of the Kubernetes worker VMs and look at the nsx-ncp logs under /var/log/pods. The nsx-ncp logs contain the following messages:
    bosh ssh -d service-instance_3c679b4f-57e0-4490-99a5-f9a9d97e3bc1 worker
    sudo su -
    cd /var/log/pods/<random-container-id>/nsx-ncp
    
    1 2018-07-05T23:28:08.579Z c471cff7-####-####-####-73f3759fcd03 NSX 9 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="ERROR" errorCode="NCP00007"] nsx_ujo.common.utils NSX configuration error: [u'IP space 14413ce9-####-####-####-a49046011f08 overlaps with IP space 7cb6e1ed-c19e-####-####-20c1fa0d1575']
    1 2018-07-05T23:28:08.580Z c471cff7-####-####-903b-73f3759fcd03 NSX 9 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="CRITICAL" security="True" errorCode="NCP00001"] nsx_ujo.k8s.adaptor NCP configuration validation failed
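The two IP space UUIDs in the NCP00007 error are the values to look up in NSX-T Manager. The following minimal Python sketch pulls them out of nsx-ncp log lines. It assumes the message format shown above, for example run against a copy of the log file. The function names `find_overlapping_ip_spaces` and `scan_log_file` are hypothetical helpers, not part of NCP or PKS.

```python
import re
from typing import List, Optional, Tuple

# Matches the NCP00007 message format seen in the nsx-ncp log above.
# '#' is allowed so that masked UUIDs like the ones in this article also match;
# the closing quote after the second UUID is not in the class, so it is excluded.
_OVERLAP_RE = re.compile(
    r"IP space ([0-9A-Fa-f#-]+) overlaps with IP space ([0-9A-Fa-f#-]+)"
)


def find_overlapping_ip_spaces(log_line: str) -> Optional[Tuple[str, str]]:
    """Return the two overlapping IP space UUIDs, or None if the line has none."""
    match = _OVERLAP_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), match.group(2)


def scan_log_file(path: str) -> List[Tuple[str, str]]:
    """Collect every overlapping UUID pair reported in an nsx-ncp log file."""
    pairs = []
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            pair = find_overlapping_ip_spaces(line)
            if pair is not None:
                pairs.append(pair)
    return pairs
```

Each returned pair names the two IP pool objects to compare in NSX-T Manager.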

Environment


Cause

This happens because two IP spaces overlap, as the NCP00007 message above shows: [u'IP space 14413ce9-86b5-4346-9e63-a49046011f08 overlaps with IP space 7cb6e1ed-c19e-4a45-a77a-20c1fa0d1575']. In NSX-T Manager, go to Inventory > IP Pools and use these IP space UUIDs to identify the objects involved. These objects conflict because they define the same CIDR range and carry the same tags that the NSX-T Container Plugin (NCP) uses.
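The condition described above is a CIDR overlap between IP pools that carry the same NCP tags. The following Python sketch shows that check with the standard `ipaddress` module. The `IpPool` record and any tags or CIDRs you feed it are hypothetical stand-ins for what you would read from Inventory > IP Pools. Treating "shares at least one tag" as the matching rule is an assumption, and NCP's own validation logic may differ.

```python
import ipaddress
from dataclasses import dataclass, field
from itertools import combinations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class IpPool:
    # Hypothetical record of one IP pool as seen under Inventory > IP Pools.
    pool_id: str
    cidr: str
    # Tags as (scope, tag) pairs; assumption: pools "match" if they share any pair.
    tags: FrozenSet[Tuple[str, str]] = field(default_factory=frozenset)


def find_conflicts(pools: List[IpPool]) -> List[Tuple[str, str]]:
    """Return (pool_id, pool_id) pairs whose CIDRs overlap and that share a tag."""
    conflicts = []
    for a, b in combinations(pools, 2):
        shares_tag = bool(a.tags & b.tags)
        net_a = ipaddress.ip_network(a.cidr)
        net_b = ipaddress.ip_network(b.cidr)
        if shares_tag and net_a.overlaps(net_b):
            conflicts.append((a.pool_id, b.pool_id))
    return conflicts
```

Overlapping pools with disjoint tags are not reported, which mirrors the article's point that the conflict arises only when the same tags are involved.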

Resolution

NCP expects that no two objects under Inventory > IP Pools in NSX-T Manager that carry the same NSX-T tags have conflicting CIDRs. Resolve this conflict for cluster creation to succeed.
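If you resolve the conflict by moving one pool to a different range, it can help to first confirm that the replacement range is free. The following is a minimal Python sketch. It assumes you have already collected the CIDRs in use by the tagged pools. The function name `overlapping_ranges` and the example ranges are hypothetical.

```python
import ipaddress
from typing import Iterable, List


def overlapping_ranges(candidate: str, in_use: Iterable[str]) -> List[str]:
    """Return every in-use CIDR that overlaps the candidate (empty list means it is free)."""
    candidate_net = ipaddress.ip_network(candidate)
    return [cidr for cidr in in_use if candidate_net.overlaps(ipaddress.ip_network(cidr))]


# Example: the candidate collides with the first range but not the second.
print(overlapping_ranges("10.20.0.0/24", ["10.20.0.0/16", "10.30.0.0/16"]))
# ['10.20.0.0/16']
```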