vSphere Supervisor with NSX-T NCP Load Balancer Service Created but External-IP <pending> due to Load Balancer Member Limit Reached - LBS exceeded limit



Article ID: 394546


Products

VMware vSphere Kubernetes Service
Tanzu Kubernetes Runtime
VMware vSphere 7.0 with Tanzu
VMware NSX for vSphere
vSphere with Tanzu

Issue/Introduction

In a vSphere Supervisor cluster environment that uses NSX-T and the NSX-NCP pod for load balancing, a new service of type LoadBalancer shows the following symptoms:

  • The new LoadBalancer service does not receive an external IP address and shows as <pending>:
    • kubectl get svc -n <namespace> | grep pending

      NAMESPACE     NAME                          TYPE           CLUSTER-IP      EXTERNAL-IP
      <namespace>   <loadbalancer service name>   LoadBalancer   <internal IP>   <pending>
  • When describing the new LoadBalancer service stuck in <pending>, error messages similar to the following are present, where values enclosed in angle brackets <> vary by environment:
    • In the error below, <LB limit> is determined by the size of the load balancer associated with the namespace.
    • kubectl describe svc -n <namespace> <loadbalancer service name>

      nsx-container-ncp  LB Service <load balancer> limit exceeded: Unable to attach new resource <new member> to lbs <load balancer>: LBS exceeded limit of <LB limit>.

    • This service will also have the following annotation:
      • Annotations:     ncp/error.loadbalancer: LBS_LIMIT_EXCEEDED
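
Because NCP stamps affected services with this annotation, every impacted service can be listed at once. The following is a minimal sketch, assuming jq is available on the machine running kubectl:

  • kubectl get svc -A -o json | jq -r '.items[] | select(.metadata.annotations["ncp/error.loadbalancer"] == "LBS_LIMIT_EXCEEDED") | "\(.metadata.namespace)/\(.metadata.name)"'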

While connected to the Supervisor cluster context, the NSX-NCP pod log may show errors similar to the following:

  • kubectl get pods -n nsx-ncp
  • kubectl logs -n nsx-ncp <nsx ncp pod>

    The maximum size of pool members for <load balancer SIZE> load balancer service form factor is <load balancer size limit>, current size of pool members is <greater than or equal to the load balancer size limit>
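
To narrow the log output to the relevant message, the same log command can be piped through grep, for example:

  • kubectl logs -n nsx-ncp <nsx ncp pod> | grep -i "pool members"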

Environment

vSphere with Tanzu 7.0

vSphere with Tanzu 8.0

NSX 4.X

NCP (NSX-T Container Plugin) 4.X

Cause

The noted load balancer has reached its pool member limit, or the new service would push the member count beyond the load balancer's limit.

 

  • Load balancer autoscaling applies only to the number of virtual servers, not to the pool member count.

  • This is expected behavior by design, as relocating services can require downtime.

  • The default load balancer size is SMALL, which has a default limit of 300 pool members in NSX-T. However, NSX-NCP overrides this limit in vSphere Supervisor environments that use NSX-T.

  • NSX-NCP overrides the default load balancer pool member limit:
    • Small Pool Member Limit: 2,000
    • Medium Pool Member Limit: 2,000
    • Large Pool Member Limit: 6,000

 

In an NSX-T load balancer, pool members are created to distribute traffic among the backends. Each pool member is an object representing a unique pool-member IP + port combination; for example, a service that exposes 3 ports across 10 endpoint addresses consumes 30 pool members.
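
To estimate how many pool members each service consumes, the address-and-port combinations can be counted from the cluster's Endpoints objects. The following is a minimal sketch, assuming jq is installed and kubectl is connected to the affected cluster's context; it approximates the member count as (addresses x ports) per Endpoints object and lists the ten largest consumers. Because it counts the endpoints of every service, treat the result as an upper-bound approximation:

  • kubectl get endpoints -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name): \([.subsets[]? | ((.addresses // []) | length) * ((.ports // []) | length)] | add // 0)"' | sort -t: -k2 -rn | head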

Resolution

vSphere Supervisor currently does not support changing the size of an existing load balancer. The only solutions are the following:

  • Modify the existing workloads for the namespace and/or cluster to reduce the member count on the affected load balancer (the endpoint-count sketch in the Cause section above can help identify the heaviest services).

  • Create a new workload cluster with a larger load balancer size and migrate workloads to that new cluster.
    • NSX-NCP overrides the default load balancer pool member limit in vSphere Supervisor using NSX-T.
      • Small Pool Member Limit: 2,000
      • Medium Pool Member Limit: 2,000
      • Large Pool Member Limit: 6,000

  • Create multiple namespaces with workload clusters to disperse the migrated workloads across multiple load balancers.
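
For any of these approaches, the current LoadBalancer services can be listed across namespaces to plan which workloads to move, for example:

  • kubectl get svc -A | grep LoadBalancer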


WARNING:
Editing the nsx-ncp-config configmap in a vSphere Supervisor environment is not supported.

Changes made to the nsx-ncp-config configmap will be reverted when the Supervisor control plane nodes are recreated, such as during a Supervisor cluster upgrade.
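
The current NCP settings can still be reviewed read-only. A minimal sketch, assuming the configmap resides in the nsx-ncp namespace referenced above:

  • kubectl get configmap nsx-ncp-config -n nsx-ncp -o yaml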

 

Load Balancer Member Count Check

The following steps can be run from a machine that can reach the NSX Manager to confirm the status of the affected load balancer (a jq-based variant combining Steps 1 and 2 is sketched after these steps):

  1. Retrieve the affected NSX load balancer's ID, replacing <load balancer> with the load balancer name from the error message:

    • curl -k -u admin:'<password>' "https://<NSX_MANAGER>/api/v1/loadbalancer/services/" | grep -A4 "<load balancer>"
    • This returns output similar to the following:
    • "resource_type" : "LbService",
      "id" : "<load balancer ID>",
      "display_name" : "<load balancer>",
  2. Run the following API call to retrieve usage information for the affected load balancer, using the <load balancer ID> from the previous step:
    • curl -k -u admin:'<PASSWORD>' "https://<NSX_MANAGER>/api/v1/loadbalancer/services/<load balancer ID>/usage"
    • The output will be similar to the following, where values vary by environment; here the load balancer shows severity RED due to high or 100% usage:
    • {
        "service_id" : "<load balancer id>",
        "service_size" : "<load balancer SIZE>",
        "virtual_server_capacity" : ##,
        "pool_capacity" : ##,
        "pool_member_capacity" : <load balancer size limit>,
        "current_virtual_server_count" : ##,
        "current_pool_count" : ###,
        "current_pool_member_count" : ###,
        "usage_percentage" : ##.#,
        "severity" : "RED"
      }
    • Note: NSX-NCP overrides the default load balancer pool member limit in vSphere Supervisor.
      • This override is not reflected in the NSX API output above, which shows only the NSX-T defaults.

  3. With the load balancer ID from Step 1, the following API call retrieves the realtime status of the load balancer, including its pool members:
    • curl -k -u admin:'<password>' "https://<NSX_MANAGER>/api/v1/loadbalancer/services/<load balancer ID>/status?source=realtime"
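
If jq is available, Steps 1 and 2 can be combined into a single sequence. This is a sketch under the same assumptions as above (admin credentials and a reachable NSX Manager); the contains() match on display_name is illustrative and may need adjusting to the exact name from the error message:

  • LB_ID=$(curl -sk -u admin:'<password>' "https://<NSX_MANAGER>/api/v1/loadbalancer/services" | jq -r '.results[] | select(.display_name | contains("<load balancer>")) | .id')
  • curl -sk -u admin:'<password>' "https://<NSX_MANAGER>/api/v1/loadbalancer/services/${LB_ID}/usage" | jq '{service_size, pool_member_capacity, current_pool_member_count, usage_percentage, severity}'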