Cluster expansion in SDDC fails at Validate NSX-T Transport Node Cluster does not use Static IP Pools

Article ID: 395818

Products

VMware SDDC Manager VMware Cloud Foundation Subscription

Issue/Introduction

  • Error in the SDDC Manager UI
    Description: Validate NSX-T Transport Node Cluster does not use Static IP Pools
    Progress Messages: Expanding L3 based Cluster is not supported since the cluster is using NSX-T overlay static IP pool.
    Error
    
    Message: Expanding L3 based Cluster is not supported since the cluster is using NSX-T overlay static IP pool.
    Remediation Message:
    Reference Token: #####
    Cause: Host esxi01.example.com in cluster has static IP pool [6d86####-####-####-####-########b801] defined. Cannot continue with workflow.

     

  • Error in /var/log/vmware/vcf/domainmanager/domainmanager.log
    ERROR [vcf_dm,##########,408e] [c.v.e.s.o.model.error.ErrorFactory,dm-exec-18]  [#####] NSXT_VALIDATE_L3_CLUSTER_WITH_STATIC_IP_POOL_FAILED Expanding L3 based Cluster is not supported since the cluster is using NSX-T overlay static IP pool.
    com.vmware.evo.sddc.orchestrator.exceptions.OrchTaskException: Expanding L3 based Cluster is not supported since the cluster is using NSX-T overlay static IP pool.
            at com.vmware.vcf.common.fsm.plugins.nsxt.action.ValidateNsxtOverlayIpAssignmentBaseAction.execute(ValidateNsxtOverlayIpAssignmentBaseAction.java:145)
            at com.vmware.vcf.common.fsm.plugins.nsxt.action.ValidateNsxtOverlayIpAssignmentAction.execute(ValidateNsxtOverlayIpAssignmentAction.java:29)
            at com.vmware.vcf.common.fsm.plugins.nsxt.action.ValidateNsxtOverlayIpAssignmentAction.execute(ValidateNsxtOverlayIpAssignmentAction.java:12)
            at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.invoke(FsmActionState.java:62)
            at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:159)
            at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:144)
            at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.invokeMethod(ProcessingTaskSubscriber.java:400)
            at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.processTask(ProcessingTaskSubscriber.java:520)
            at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.accept(ProcessingTaskSubscriber.java:124)
            at jdk.internal.reflect.GeneratedMethodAccessor469.invoke(Unknown Source)
            at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.base/java.lang.reflect.Method.invoke(Method.java:566)
            at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:88)
            at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:73)
            at org.springframework.cloud.sleuth.instrument.async.TraceRunnable.run(TraceRunnable.java:64)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.lang.RuntimeException: Host esxi01.example.com in cluster has static IP pool [6d86####-####-####-####-########b801] defined. Cannot continue with workflow.

     

  • Additional logging in /var/log/vmware/vcf/domainmanager/domainmanager.log
    WARN  [vcf_dm,##########,6206] [c.v.v.h.HostManagerEventHandler,dm-exec-3]  Could not collect persisted hosts in cluster.
    java.lang.NullPointerException: null
            at com.vmware.vcf.hostmanager.service.model.AddHostInternalModel.getClusterId(AddHostInternalModel.java:120)
    INFO  [vcf_dm,##########,265b] [c.v.e.s.c.s.a.w.o.WorkflowOptionsAdapterUtil,http-nio-127.0.0.1-7200-exec-10]  Checking if network pool is same for cluster 201c####-####-####-####-########5807 after adding hosts [a245####-####-####-####-########b99b, e754####-####-####-####-########c45f].
    INFO  [vcf_dm,##########,265b] [c.v.e.s.c.s.a.w.o.WorkflowOptionsAdapterUtil,http-nio-127.0.0.1-7200-exec-10]  Is network pool match: false 
    INFO  [vcf_dm,##########,5433] [c.v.v.c.f.p.n.a.ValidateNsxtOverlayIpAssignmentBaseAction,dm-exec-9]  Found Transport Node
    
    {
       .........
                   "ip_assignment_spec": {
                      "fields": {
                        "ip_pool_id": {
                          "value": "6d86####-####-####-####-########b801"
                        },
                        "resource_type": {
                          "value": "StaticIpPoolSpec" <=== static ip pool
                        }
                      },
                      "name": "struct"
                    },
       ..........
    }
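To confirm that a failed expansion matches this article, the log can be searched for the error code and the "Caused by" line quoted above. The following is a minimal sketch: the function names are hypothetical, and the default path is the domainmanager.log location shown in this article.

```shell
# Sketch only: function names are illustrative, not part of SDDC Manager.
# The default log path is the one quoted in this article.
DM_LOG="/var/log/vmware/vcf/domainmanager/domainmanager.log"

# find_static_pool_failures [LOGFILE]
# Prints (with line numbers) every line carrying the validation error code
# or the "has static IP pool [" cause message.
find_static_pool_failures() {
    log_file="${1:-$DM_LOG}"
    grep -n -E 'NSXT_VALIDATE_L3_CLUSTER_WITH_STATIC_IP_POOL_FAILED|has static IP pool \[' "$log_file"
}

# hosts_with_static_pool [LOGFILE]
# Prints the unique host names reported in lines of the form
# "Host <name> in cluster has static IP pool ...".
hosts_with_static_pool() {
    log_file="${1:-$DM_LOG}"
    sed -n 's/.*Host \([^ ]*\) in cluster has static IP pool.*/\1/p' "$log_file" | sort -u
}
```

Run as root on the SDDC Manager appliance, `find_static_pool_failures` with no argument reads the default log, and `hosts_with_static_pool` lists the hosts named in the failure.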

Environment

VMware Cloud Foundation

Cause

  • In VCF 4.5.2, all hosts of the cluster belong to one L2 domain.
  • The existing hosts in the cluster and the new hosts to be added are in different network pools in SDDC Manager.
  • Because the new hosts are in a different network pool, SDDC Manager treats them as belonging to a different L2 domain, which causes the validation to fail.

Resolution

Validate that the ESXi hosts are in different network pools in SDDC Manager

  • Follow the steps below.
  • SSH to SDDC Manager as the vcf user, then su to root.
    1. Get the host IDs from the SDDC Manager platform database
      psql -h localhost -U postgres -d platform -c "select id,hostname from host where hostname='esxi01.example.com'"   # existing host in the cluster

      Sample output

                        id                  |       hostname
      --------------------------------------+----------------------
       6d86####-####-####-####-########b801 | esxi01.example.com
      (1 row)
      psql -h localhost -U postgres -d platform -c "select id,hostname from host where hostname='esxi05.example.com'"   # new host to be added to the cluster

      Sample output

                        id                  |       hostname
      --------------------------------------+----------------------
       a245####-####-####-####-########b99b | esxi05.example.com
      (1 row)
    2. Get the associated network pool ID for each host
      psql -h localhost -U postgres -d platform -c "select * from host_and_network_pool where host_id='6d86####-####-####-####-########b801'"   # existing host in the cluster

      Sample output

      id |               host_id                |           network_pool_id
      ----+--------------------------------------+--------------------------------------
        2 | 6d86####-####-####-####-########b801 | b146###-####-####-####-########6d4c
      (1 row)

       

      psql -h localhost -U postgres -d platform -c "select * from host_and_network_pool where host_id='a245####-####-####-####-########b99b'"   # new host to be added to the cluster

      Sample output

      id |               host_id                |           network_pool_id
      ----+--------------------------------------+--------------------------------------
       13 | a245####-####-####-####-########b99b | 17b7###-####-####-####-########e659
      (1 row)

      Note: Do not modify the host_and_network_pool table to make the IDs match. Changing the network pool ID in the SDDC Manager inventory can cause the workflow to fail at a later stage, when the VMkernel adapters (VMKs) for vMotion, vSAN, and so on are created, and can lead to network connectivity issues.
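The two lookups above can be combined into one read-only check. This is a minimal sketch, not a supported tool: the function names are hypothetical, and it assumes that the hosts table in the platform database is named host and that host.id joins directly to host_and_network_pool.host_id. It only reads from the database.

```shell
# Sketch only: function names are illustrative. Assumes the hosts table is
# named "host" and that host.id matches host_and_network_pool.host_id.
# Read-only: nothing in the database is modified.

# network_pool_for_host FQDN
# Prints the network_pool_id recorded for the host, or nothing if not found.
# Pass plain host names only; the value is placed directly into the query.
network_pool_for_host() {
    # -t prints tuples only, -A disables alignment, so output is the bare ID
    psql -h localhost -U postgres -d platform -t -A -c \
        "select p.network_pool_id from host h join host_and_network_pool p on p.host_id = h.id where h.hostname = '$1'"
}

# same_network_pool EXISTING_FQDN NEW_FQDN
# Returns 0 if both hosts share a network pool, 1 if they differ,
# and 2 if either lookup returned nothing.
same_network_pool() {
    existing_pool=$(network_pool_for_host "$1")
    new_pool=$(network_pool_for_host "$2")
    if [ -z "$existing_pool" ] || [ -z "$new_pool" ]; then
        echo "lookup failed for one or both hosts" >&2
        return 2
    fi
    if [ "$existing_pool" = "$new_pool" ]; then
        echo "match: both hosts use network pool $existing_pool"
    else
        echo "mismatch: $1 -> $existing_pool, $2 -> $new_pool"
        return 1
    fi
}
```

For the hosts in this article, `same_network_pool esxi01.example.com esxi05.example.com` would be expected to report a mismatch, which corresponds to the "Is network pool match: false" log entry.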

To resolve the issue, upgrade VCF to 5.1 or later.

 

Workaround:

Decommission the new hosts, then recommission them into the same network pool as the existing hosts in the cluster.

Refer:

Decommission Hosts 

Commission Hosts