LB status showing as degraded when using Distributed Load Balancer while AVI is also used in the environment



Article ID: 417411


Products

VMware NSX
VMware Avi Load Balancer

Issue/Introduction

  • The Distributed Load Balancer (DLB) is used for the TKG cluster(s).
  • AVI is also present in the environment, but is not used for the cluster.
  • A Degraded alarm is raised, but traffic is passing as expected.
  • One or more hosts show Logical Switch Ports (LSP) in a "Not ready" state:
    esxi> get load-balancer 39f9####-1e38-####-abec-5ae9####2241 status
    Load Balancer
    UUID : 39f9####-1e38-####-abec-5ae9####2241
    Display-Name : mydlb1
    Status : partially ready
    Ready LSP Count : 1
    Not Ready LSP Count: 2
    Conflict LSP Count : 0
    Ready LSP : 90cc####-7602-####-a27d-a45e####180d
    Not Ready LSP : 8c25####-64a8-####-b323-181f####1b45
                         01c0####-61be-####-9f20-5306####3531
    Conflict LSP :
    Warning : LSP below is not ready as DFW Exclusion List
                         8c25####-64a8-####-b323-181f####1b45
                         01c0####-61be-####-9f20-5306####3531

Environment

VMware NSX

VMware AVI Load Balancer

Cause

The issue occurs because of the way the DLB is applied within NSX when created via NCP. The DLB service scope is set to include all NCP-related segments, including the AVI SE segment. Because the DLB relies on the presence of the DFW on all logical switch ports for its functionality, its health check verifies that every port within its scope is in a ready state with the DFW enabled. The AVI SE is automatically placed on the DFW exclusion list within NSX, which prevents the DFW from being applied on its LSP. This fails the DLB health check, producing the warning "LSP below is not ready as DFW Exclusion List" in the status output.

This issue does not impact the datapath and is purely cosmetic, provided all of the LSPs listed in the error belong to AVI SEs.

Resolution

Currently, there is no fix for this issue. You may work around it with the following procedure.

  1. Create a security group to exclude the AVI service engine VMs.
    1. Navigate to the Inventory->Groups section of the NSX UI.
    2. Click the Add Group button.
    3. Add the group criteria.
      1. NSX Segment Tag Equals <cluster tag name>
        • The segment tag is the cluster-level tag applied to the AVI SE segment holding the problematic LSP. It will be in a format like the following example: <domain-c10:ab1c2345-def6-7890-123g-4h5ij6k7l8m>.
        • To find it, take one of the LSP IDs listed in the error, search for it in the NSX UI by clicking the magnifying glass, and identify the segment it belongs to; it should be an AVI segment. Then check the tags applied to that segment for the domain-X tag with a scope of ncp/cluster.
      2. Scope Equals ncp/cluster. If ncp/cluster does not show up in the selection list, you may enter it manually.
      3. Click the + button at the end of the first line to add another criterion. Ensure that the operator is set to AND.
      4. NSX Segment Tag Not Equals AVI. If AVI does not appear in the selection list, you may enter it manually.
      5. Scope Equals ncp/created_for.
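
For reference, the two criteria above map onto an NSX Policy group expression similar to the sketch below. This is an illustrative fragment, not output from the product, and assumes the scope|tag convention used by Policy API tag conditions; the domain tag value is the placeholder example from above.

```json
{
  "expression": [
    {
      "resource_type": "Condition",
      "member_type": "Segment",
      "key": "Tag",
      "operator": "EQUALS",
      "value": "ncp/cluster|domain-c10:ab1c2345-def6-7890-123g-4h5ij6k7l8m"
    },
    {
      "resource_type": "ConjunctionOperator",
      "conjunction_operator": "AND"
    },
    {
      "resource_type": "Condition",
      "member_type": "Segment",
      "key": "Tag",
      "operator": "NOTEQUALS",
      "value": "ncp/created_for|AVI"
    }
  ]
}
```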
  2. Verify that the AVI service engine VMs do NOT show up in this group.
    1. Navigate to the Inventory->Groups section of the NSX UI.
    2. Find your new group.
    3. Select the View Members link.
    4. Verify that no AVI service engine VMs are listed.
  3. Get the path to the new security group.
    1. Navigate to the Inventory->Groups section of the NSX UI.
    2. Find your new group.
    3. Select the ellipsis (three dots) on the left side of the group and select the copy path to clipboard option.
    4. Save this path in a text file.
  4. Get the Load Balancer domain ID. This may be done from an NSX Manager node via SSH as root or from any command line that supports the curl command.
    1. curl -k -u 'admin' 'https://<NSX manager IP>/policy/api/v1/infra/lb-services/'
    2. Find the 'id' section of the output that also contains the 'display name' that matches your load balancer.
    3. Save this id in a text file.
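
To pick the right id out of the step 4 output, a quick text filter can help. The sketch below operates on a saved, hypothetical, trimmed copy of the /policy/api/v1/infra/lb-services/ response; the id value, display name, and file name are illustrative.

```shell
# Hypothetical, trimmed stand-in for the /policy/api/v1/infra/lb-services/
# response, saved locally. In practice, redirect the curl output to this file.
cat > lb-list.json <<'EOF'
{
  "results": [
    {
      "resource_type": "LBService",
      "id": "mydlb1-id",
      "display_name": "mydlb1"
    }
  ]
}
EOF
# Print the "id" value of each entry; match it to your DLB's display_name.
sed -n 's/.*"id": "\([^"]*\)".*/\1/p' lb-list.json
```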
  5. Output the distributed load balancer service JSON data to a file.  This may be done from an NSX Manager node via SSH as root or from any command line that supports the CURL command. 
    1. curl -k -u 'admin' 'https://<NSX manager IP>/policy/api/v1/infra/lb-services/<load balancer domain ID>' > data.json
      1. The load balancer domain ID is the id from step 4.
  6. Edit the data.json file and change the connectivity_path section to use the security group path from Step 3.
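
As a sketch, after the edit the relevant portion of data.json should look like the following, where connectivity_path is the group path saved in step 3. The group name shown is a hypothetical example, and other fields in the file are omitted here and should be left unchanged.

```json
{
  "id": "<load balancer domain ID>",
  "resource_type": "LBService",
  "connectivity_path": "/infra/domains/default/groups/avi-se-exclusion-group"
}
```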
  7. Update the dlb domain using the edited data.json file.
    1. curl -k -u 'admin' 'https://<NSX manager IP>/policy/api/v1/infra/lb-services/<cluster_ip_domain>' -X PATCH -d @data.json -H 'accept: application/json' -H 'Content-type: application/json' -H 'X-Allow-Overwrite: true'
  8. After 5 minutes, verify that the load balancer no longer shows the alarm.
    1. curl -k -u 'admin' 'https://<NSX manager IP>/policy/api/v1/infra/lb-services/<cluster_ip_domain>/detailed-status?source=realtime&enforcement_point_path=/infra/sites/default/enforcement-points/default'
    2. The NOT_READY instance number should be 0 for all hosts.
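
If you prefer to check the step 8 response non-interactively, a simple text scan works. The payload below is a hypothetical, trimmed stand-in for the detailed-status response; the field names are illustrative, and the check relies only on the NOT_READY keyword no longer appearing in the raw JSON.

```shell
# Hypothetical, trimmed stand-in for the detailed-status response. In
# practice, redirect the curl output from step 8 to this file.
cat > status.json <<'EOF'
{
  "service_status": "UP",
  "instance_status": "READY"
}
EOF
# Report whether any NOT_READY marker remains in the response.
if grep -q 'NOT_READY' status.json; then
  echo "some instances are still NOT_READY"
else
  echo "all instances ready"
fi
```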