Troubleshooting NSX Native Load Balancer

Article ID: 376344

Products

VMware NSX Networking

Issue/Introduction

This article provides basic troubleshooting information for the NSX Native Load Balancer and lists the data required when opening a support request with Broadcom.

Environment

VMware NSX

Cause

Common Issues:

Check LB logs for 502 Bad Gateway issue

1.  Enable DEBUG logging on the Load Balancer page and Access Logs on the Virtual Server page in the NSX-T UI.

2.  SSH into the Edge node where the LB is active and change to /var/log/lb/<LB ID>/logs:

      zgrep -a "502" error.log.*.gz

      error.log.11.gz:2019/09/25 11:54:16 [debug] 32409#0: *11258672 HTTP/1.1 502 Bad Gateway

      error.log.11.gz:2019/09/25 11:54:24 [debug] 32406#0: *11292889 HTTP/1.1 502 Bad Gateway
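
The search in step 2 can also be scripted so that the live error.log and its rotated .gz copies are covered in one pass. The following is a minimal Python sketch, not a Broadcom-provided tool. The function name find_502_lines is hypothetical, and it assumes the log files are named error.log* with rotated copies ending in .gz, as in the example output above.

```python
import gzip
from pathlib import Path


def find_502_lines(log_dir):
    """Return (file name, line) pairs for error-log lines that mention a 502.

    Assumption: log files in log_dir are named error.log* and rotated
    copies are gzip-compressed with a .gz suffix.
    """
    matches = []
    for path in sorted(Path(log_dir).glob("error.log*")):
        # Rotated logs are gzip-compressed; the live log is plain text.
        opener = gzip.open if path.suffix == ".gz" else open
        # errors="replace" keeps the scan going past undecodable bytes.
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                # Match the status code as a separate token, not any "502".
                if " 502 " in line:
                    matches.append((path.name, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    # Placeholder path; substitute the real load balancer ID.
    for name, line in find_502_lines("/var/log/lb/<LB ID>/logs"):
        print(f"{name}: {line}")
```

Matching on " 502 " is narrower than the plain grep pattern, so it skips lines where 502 only appears inside a process ID or connection number.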

Note: A 502 Bad Gateway error can be caused by header issues in either the request or the response.

In /var/log/lb/<LoadBalancer-UUID>/logs/error.log, look for the 502 error. When header size is the cause, the log entry should clearly indicate that the header is too big.

Note: An oversized header is usually caused by a large cookie.

If this is the case, the header size can be changed in the GUI.

The Manager interface supports changing only the request header size, while the Policy interface supports changing both the request and response header sizes.

If only a request header size change is required, a new Policy profile can be created and attached to the Manager load balancer.

Make sure to revert debug logging when done.

Note: Debug logs are deleted the moment DEBUG is turned off, so always gather logs from the Edge BEFORE disabling debug logging.

 

Check nginx.conf file generation issues

1.  Check lbconf_gen.log for errors:

/var/log/lb/<loadbalancer-UUID>/lbconf_gen.log
2020-08-03 09:01:35,301 204 lb ERROR failed to build nginx config
2020-08-03 09:01:35,301 204 lb ERROR 'ascii' codec can't encode character '\u2013' in position 6950: ordinal not in range(128)

2.  Check /var/log/syslog for errors:

<5>1 2020-08-03T09:26:27.355660+00:00 w1-dmz-edge04 kernel - - - [ 3789.995524] grsec: [e01e2f34####] denied RWX mmap of <anonymous mapping> by /opt/vmware/nsx-edge/bin/lbconf_gen.py[lbconf_gen.py:12455] uid/euid:134/134 gid/egid:140/140, parent /opt/vmware/nsx-edge/bin/nginx[nginx:8167] uid/euid:134/134 gid/egid:140/140

(This entry was captured at a different timestamp.)

<25>1 2020-08-03T09:01:35.370199+00:00 e01e2f34#### NSX 61 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="FATAL"] [bba2bc6c-7b00-402e-84af-############] cfg: failed to signal config change to engine (Connection refused).

<25>1 2020-08-03T09:01:35.370406+00:00 e01e2f3486a5 NSX 61 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="FATAL"] [bba2bc6c-7b00-402e-84af-############] cfg: failed to generate Lb configuration

3.  Check whether the nginx.conf file is empty (size 0):

ls -l /config/vmware/edge/lb/etc/bba2bc6c-7b00-402e-84af-############/nginx.conf

-rw-r----- 1 lb lb 0 Aug 3 09:26 /config/vmware/edge/lb/etc/bba2bc6c-7b00-402e-84af-############/nginx.conf
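
The 'ascii' codec error in step 1 names a non-ASCII character (U+2013, an en dash) in the data used to build the configuration. One way to locate such a character is to scan the load balancer configuration, exported as JSON, for strings that contain non-ASCII characters. The sketch below is a hedged illustration: the article does not state where the character originates, so it assumes the character sits in a user-supplied string value such as a name or description. The function name find_non_ascii and the sample field display_name are hypothetical.

```python
import json


def find_non_ascii(node, path="$"):
    """Yield (json_path, character, codepoint) for each non-ASCII character.

    Walks the dicts, lists and string values of an already-parsed JSON
    document. Dictionary keys themselves are not inspected.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_non_ascii(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_non_ascii(value, f"{path}[{index}]")
    elif isinstance(node, str):
        for char in node:
            if ord(char) > 127:
                yield path, char, f"U+{ord(char):04X}"


if __name__ == "__main__":
    # Hypothetical sample; display_name stands in for any user-supplied field.
    sample = json.loads('{"display_name": "web \\u2013 prod", "size": "SMALL"}')
    for location, char, codepoint in find_non_ascii(sample):
        print(location, repr(char), codepoint)
```

Replacing the reported character with its ASCII equivalent (for example a plain hyphen) in the affected object is one possible follow-up, but confirm the right fix with Broadcom support for your version.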

Resolution

CLI Commands

Check services used by LB

#get service dataplane

#get service dispatcher

#get service nsx-control-plane-agent

#get service nestdb

Check LB configuration

#get load-balancers

#get load-balancer <lb-uuid>

#get load-balancer <lb-uuid> virtual-servers

#get load-balancer <lb-uuid> virtual-server <vs-uuid>

#get load-balancer <lb-uuid> virtual-server <vs-uuid> lbrules

#get load-balancer <lb-uuid> pools

#get load-balancer <lb-uuid> pool <pool-uuid>

#get load-balancer <lb-uuid> monitors

#get load-balancer <lb-uuid> monitor <monitor-uuid>

Check LB status

#get load-balancers status

#get load-balancer <lb-uuid> status

#get load-balancer <lb-uuid> virtual-servers status

#get load-balancer <lb-uuid> virtual-server <vs-uuid> status

#get load-balancer <lb-uuid> pools status

#get load-balancer <lb-uuid> pool <pool-uuid> status

#get load-balancer <lb-uuid> monitor <monitor-uuid> status

Check LB statistics

#get load-balancer <lb-uuid> stats

#get load-balancer <lb-uuid> stats verbose

#get load-balancer <lb-uuid> virtual-servers stats

#get load-balancer <lb-uuid> virtual-server <vs-uuid> stats

#get load-balancer <lb-uuid> pools stats

#get load-balancer <lb-uuid> pool <pool-uuid> stats

#clear load-balancer <lb-uuid> stats

#clear load-balancer <lb-uuid> pools stats

#clear load-balancer <lb-uuid> pool <pool-uuid> stats

#clear load-balancer <lb-uuid> virtual-servers stats

#clear load-balancer <lb-uuid> virtual-server <vs-uuid> stats

Check LB HA

#get load-balancer <lb-uuid> high-availability-state

Check LB logs

#get load-balancer <lb-uuid> error-log

#get load-balancer <lb-uuid> virtual-server <vs-uuid> access-log

#set load-balancer <lb-uuid> rule-log

Check that the KNI interface is created (L7 only)

#ifconfig | grep kni-lrport

Check namespaces

#set debug

#get namespaces

 

API Commands

Check LB status (CPU, memory)

GET /api/v1/loadbalancer/services/<LB-Service_UUID>/status
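
The status call above can also be issued from a script. The following is a minimal sketch using only the Python standard library. The manager address, credentials and the helper names build_status_request and fetch_status are placeholders, and HTTP basic authentication is assumed because the curl commands in this article use it. The response body is returned as parsed JSON without interpreting individual fields, since the article does not list them.

```python
import base64
import json
import ssl
import urllib.request


def build_status_request(manager, lb_service_uuid, username, password):
    """Build a GET request for the load balancer service status endpoint."""
    url = f"https://{manager}/api/v1/loadbalancer/services/{lb_service_uuid}/status"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def fetch_status(request, verify_tls=True):
    """Send the request and return the decoded JSON body."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Same effect as curl -k; use only when the manager certificate
        # cannot be verified, for example a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values; replace with the real manager, UUID and credentials.
    req = build_status_request("NSXManagerIPAddress", "<LB-Service_UUID>",
                               "username", "password")
    print(json.dumps(fetch_status(req, verify_tls=False), indent=2))
```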

 

Resizing a Load Balancer

If a load balancer is not owned by NSX-T and cannot be resized through the GUI:

  • Get the current LB configuration: curl -X GET -H Content-Type:application/json -ku username:password https://NSXManagerIPAddress/api/v1/loadbalancer/services/<LoadBalancerUUID> > lb.json
  • Edit the lb.json file and modify "size" to the desired form factor.
  • Push the new configuration: curl -X PUT -H Content-Type:application/json -H X-Allow-Overwrite:True -ku username:password https://NSXManagerIPAddress/api/v1/loadbalancer/services/<LoadBalancerUUID> -d @lb.json
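
The edit step can be done with a short script instead of a text editor, which avoids accidental changes to other fields in lb.json. This is a minimal sketch under stated assumptions: the helper names set_lb_size and resize_file are hypothetical, and the value "MEDIUM" in the usage note is only an example, because the valid form factors depend on the NSX version in use.

```python
import json


def set_lb_size(config, new_size):
    """Return a copy of the LB service config with only "size" changed."""
    if "size" not in config:
        raise KeyError('expected a "size" field in the load balancer config')
    updated = dict(config)
    updated["size"] = new_size
    return updated


def resize_file(path, new_size):
    """Rewrite the saved configuration file in place with the new size."""
    with open(path, "r", encoding="utf-8") as handle:
        config = json.load(handle)
    updated = set_lb_size(config, new_size)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(updated, handle, indent=2)
```

For example, calling resize_file("lb.json", "MEDIUM") updates the file saved by the GET request, after which the PUT request can push it back unchanged in every other respect.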

 

Certificates

To review the details of the certificates applied to a Virtual Server, follow these steps:

  • SSH into the Edge as root.
  • Change to the certificate directory: cd /config/vmware/edge/lb/etc/<Load-Balancer ID>/certs/
  • This folder contains all the certificates used by NGINX.

 

Additional log capture:

  • get firewall [Logical interface UUID] ruleset rules
  • get firewall [Logical interface UUID] ruleset stats
  • get firewall [Logical interface UUID] interface stats
  • get load-balancer [Load balancer UUID] pool [pool UUID] status
  • get dataplane cpu stats
  • Capture the health check traffic at these points:
        bp-sr-port (ICMP reply)
        downlink port (ICMP request)
        KNI (pass FW to Linux)
  • Keep the LB debug log and access log enabled.
  • restart service dataplane (try this to restart datapathd)

 

Maximums

Check the size of the LB and make sure that it does not exceed the maximums for that size.

Refer to the Maximums section in this document.

 

Known issues with NSX Load Balancers

Handling Log Bundles for offline review with Broadcom support