This article contains basic information for troubleshooting the NSX Native Load Balancer and lists the data required when opening a support request with Broadcom.
VMware NSX
Common Issues:
Check LB logs for 502 Bad Gateway issue
1. Enable DEBUG logging on the Load Balancer page and Access Logs on the Virtual Server page in the NSX-T UI.
2. SSH into the Edge node where the LB is active and change to the directory /var/log/lb/<LB ID>/logs
grep -a "502" error.log.11.gz
error.log.11.gz:2019/09/25 11:54:16 [debug] 32409#0: *11258672 HTTP/1.1 502 Bad Gateway
error.log.11.gz:2019/09/25 11:54:24 [debug] 32406#0: *11292889 HTTP/1.1 502 Bad Gateway
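When it is not known which rotated log holds the errors, it can help to count the 502 entries in every error log at once. The following is a minimal sketch, not an official tool: the function name count_502 is hypothetical, and it assumes that gzip and grep are available in the Edge root shell and that rotated logs follow the error.log* naming seen above.

```shell
# Hypothetical helper: print "<file name> <number of 502 lines>" for each
# error log in a directory, whether plain text or gzip-compressed.
count_502() {
    log_dir="$1"    # for example: /var/log/lb/<LB ID>/logs
    for f in "$log_dir"/error.log*; do
        [ -e "$f" ] || continue    # skip if the glob matched nothing
        case "$f" in
            *.gz) n=$(gzip -cd "$f" | grep -a -c "502 Bad Gateway") ;;
            *)    n=$(grep -a -c "502 Bad Gateway" "$f") ;;
        esac
        printf '%s %s\n' "$(basename "$f")" "$n"
    done
}
```

Usage: count_502 "/var/log/lb/<LB ID>/logs", then inspect the files with a non-zero count.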
Note: A 502 Bad Gateway can be caused by header issues in either the request or the response.
In /var/log/lb/<LoadBalancer-UUID>/logs/error.log, look specifically for the 502 error; the entry should clearly indicate that the header is too big.
Note: A header that is too big is usually caused by a large cookie.
If this is the case, the header size can be changed in the GUI.
The Manager interface supports changing only the request header size, while the Policy interface supports both the request and the response header size.
If only a request header size change is required, a new Policy profile can be created and attached to the Manager load balancer.
Make sure to revert debug logging when done.
NOTE: Debug logs are deleted the moment DEBUG is turned off, so always gather logs from the Edge BEFORE disabling debug logging.
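One way to make sure the debug logs survive is to archive the log directory before DEBUG is turned off. This is a sketch under assumptions: the function name archive_lb_logs is hypothetical, tar and date are assumed to be available in the Edge root shell, and the destination directory is assumed to have enough free space.

```shell
# Hypothetical helper: pack the LB log directory into a timestamped archive
# and print the archive path.
archive_lb_logs() {
    log_dir="$1"    # for example: /var/log/lb/<LB ID>/logs
    out_dir="$2"    # destination directory for the archive
    archive="$out_dir/lb-logs-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps only the last path component inside the archive.
    tar -czf "$archive" -C "$(dirname "$log_dir")" "$(basename "$log_dir")" || return 1
    printf '%s\n' "$archive"
}
```

Usage: archive_lb_logs "/var/log/lb/<LB ID>/logs" /tmp, then copy the printed file off the Edge before reverting the log level.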
Check nginx.conf file generation issues
1. Check lbconf_gen.log for errors:
/var/log/lb/<loadbalancer-UUID>/lbconf_gen.log
2020-08-03 09:01:35,301 204 lb ERROR failed to build nginx config
2020-08-03 09:01:35,301 204 lb ERROR 'ascii' codec can't encode character '\u2013' in position 6950: ordinal not in range(128)
2. Check for errors in /var/log/syslog:
<5>1 2020-08-03T09:26:27.355660+00:00 w1-dmz-edge04 kernel - - - [ 3789.995524] grsec: [e01e2f34####] denied RWX mmap of <anonymous mapping> by /opt/vmware/nsx-edge/bin/lbconf_gen.py[lbconf_gen.py:12455] uid/euid:134/134 gid/egid:140/140, parent /opt/vmware/nsx-edge/bin/nginx[nginx:8167] uid/euid:134/134 gid/egid:140/140
(The entry above was captured at a different timestamp.)
<25>1 2020-08-03T09:01:35.370199+00:00 e01e2f34#### NSX 61 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="FATAL"] [bba2bc6c-7b00-402e-84af-############] cfg: failed to signal config change to engine (Connection refused).
<25>1 2020-08-03T09:01:35.370406+00:00 e01e2f3486a5 NSX 61 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="FATAL"] [bba2bc6c-7b00-402e-84af-############] cfg: failed to generate Lb configuration
3. Check whether the nginx.conf file is empty (size 0):
ls -l /config/vmware/edge/lb/etc/bba2bc6c-7b00-402e-84af-############/nginx.conf
-rw-r----- 1 lb lb 0 Aug 3 09:26 /config/vmware/edge/lb/etc/bba2bc6c-7b00-402e-84af-############/nginx.conf
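The file checks above can be combined into one quick test. The sketch below is illustrative only: the function name check_lb_config is hypothetical, and it assumes the file locations shown in this section and a log format in which failures carry the word ERROR, as in the sample lines.

```shell
# Hypothetical helper: report whether nginx.conf is missing, empty, or
# populated, then show the most recent ERROR lines from lbconf_gen.log.
check_lb_config() {
    conf="$1"       # path to nginx.conf
    gen_log="$2"    # path to lbconf_gen.log
    if [ ! -e "$conf" ]; then
        echo "nginx.conf: missing"
    elif [ ! -s "$conf" ]; then
        echo "nginx.conf: empty"    # zero bytes, as in the ls output above
    else
        echo "nginx.conf: ok"
    fi
    if [ -r "$gen_log" ]; then
        grep -a " ERROR " "$gen_log" | tail -n 5
    fi
}
```

Usage: check_lb_config "/config/vmware/edge/lb/etc/<loadbalancer-UUID>/nginx.conf" "/var/log/lb/<loadbalancer-UUID>/lbconf_gen.log".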
CLI Commands
Check services used by LB:
get service dataplane
get service dispatcher
get service nsx-control-plane-agent
get service nestdb
Check LB configuration:
get load-balancers
get load-balancer <lb-uuid>
get load-balancer <lb-uuid> virtual-servers
get load-balancer <lb-uuid> virtual-server <vs-uuid>
get load-balancer <lb-uuid> virtual-server <vs-uuid> lbrules
get load-balancer <lb-uuid> pools
get load-balancer <lb-uuid> pool <pool-uuid>
get load-balancer <lb-uuid> monitors
get load-balancer <lb-uuid> monitor <monitor-uuid>
Check LB status:
get load-balancers status
get load-balancer <lb-uuid> status
get load-balancer <lb-uuid> virtual-servers status
get load-balancer <lb-uuid> virtual-server <vs-uuid> status
get load-balancer <lb-uuid> pools status
get load-balancer <lb-uuid> pool <pool-uuid> status
get load-balancer <lb-uuid> monitor <monitor-uuid> status
Check LB statistics:
get load-balancer <lb-uuid> stats
get load-balancer <lb-uuid> stats verbose
get load-balancer <lb-uuid> virtual-servers stats
get load-balancer <lb-uuid> virtual-server <vs-uuid> stats
get load-balancer <lb-uuid> pools stats
get load-balancer <lb-uuid> pool <pool-uuid> stats
clear load-balancer <lb-uuid> stats
clear load-balancer <lb-uuid> pools stats
clear load-balancer <lb-uuid> pool <pool-uuid> stats
clear load-balancer <lb-uuid> virtual-servers stats
clear load-balancer <lb-uuid> virtual-server <vs-uuid> stats
Check LB HA:
get load-balancer <lb-uuid> high-availability-state
Check LB logs:
get load-balancer <lb-uuid> error-log
get load-balancer <lb-uuid> virtual-server <vs-uuid> access-log
set load-balancer <lb-uuid> rule-log
Check that the kni interface is created (only L7):
ifconfig | grep kni-lrport
Check namespaces:
set debug
get namespaces
API Commands
Check LB status (CPU, memory):
GET /api/v1/loadbalancer/services/<LB-Service_UUID>/status
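The status call can be issued with curl in the same way as the other API calls in this article. The sketch below only builds the URL; the function name lb_status_url is hypothetical, and the manager address and UUID are placeholders to be replaced with real values.

```shell
# Hypothetical helper: build the status URL for a load balancer service.
lb_status_url() {
    manager="$1"    # NSX Manager IP address or FQDN
    lb_uuid="$2"    # load balancer service UUID
    printf 'https://%s/api/v1/loadbalancer/services/%s/status\n' "$manager" "$lb_uuid"
}

# Example call; curl prompts for the password when only a user name is given:
# curl -k -u username "$(lb_status_url NSXManagerIPAddress "<LB-Service_UUID>")"
```

Prompting for the password keeps it out of the shell history, which is the only reason the example omits it from the command line.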
Resizing a Load Balancer
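If a load balancer is not owned by NSX-T and cannot be resized through the GUI, resize it through the API:
1. Retrieve the current load balancer configuration and save it to lb.json:
curl -X GET -H "Content-Type: application/json" -ku username:password https://NSXManagerIPAddress/api/v1/loadbalancer/services/<LoadBalancerUUID> > lb.json
2. Edit lb.json and change the size value to the required size.
3. Push the updated configuration back with the X-Allow-Overwrite header:
curl -X PUT -H "Content-Type: application/json" -H "X-Allow-Overwrite: True" -ku username:password https://NSXManagerIPAddress/api/v1/loadbalancer/services/<LoadBalancerUUID> -d @lb.json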
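Between the GET and the PUT, the saved lb.json needs its size value changed. The sketch below shows one possible way to do that with sed. It rests on assumptions: the function name set_lb_size is hypothetical, the size is assumed to be stored in a top-level "size" string field, and MEDIUM in the usage line is only an example value. Review the resulting file before sending the PUT.

```shell
# Hypothetical helper: replace the value of the "size" field in a saved
# load balancer JSON file. Assumes a single "size": "<VALUE>" entry.
set_lb_size() {
    file="$1"        # for example: lb.json
    new_size="$2"    # for example: MEDIUM
    tmp=$(mktemp) || return 1
    sed -E "s/(\"size\"[[:space:]]*:[[:space:]]*\")[A-Za-z_]+(\")/\1${new_size}\2/" "$file" > "$tmp" \
        && mv "$tmp" "$file"
}
```

Usage: set_lb_size lb.json MEDIUM. All other fields, including the revision number returned by the GET, are left untouched.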
Certificates
To review the details of the certificates applied to a Virtual Server, change to the certificate directory on the Edge:
cd /config/vmware/edge/lb/etc/<Load-Balancer ID>/certs/
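Once in that directory, the certificate details can be printed with openssl. This is a sketch under assumptions: the function name show_cert_details is hypothetical, openssl is assumed to be available in the Edge root shell, and the certificates are assumed to be stored as PEM files. Files that openssl cannot parse as certificates (for example private keys) are skipped.

```shell
# Hypothetical helper: print subject, issuer, and validity dates for every
# PEM certificate found in a directory.
show_cert_details() {
    cert_dir="$1"    # for example: /config/vmware/edge/lb/etc/<Load-Balancer ID>/certs
    for f in "$cert_dir"/*; do
        [ -f "$f" ] || continue
        # Skip files that are not parseable as X.509 certificates.
        details=$(openssl x509 -in "$f" -noout -subject -issuer -dates 2>/dev/null) || continue
        printf '== %s\n%s\n' "$f" "$details"
    done
}
```

Usage: show_cert_details "/config/vmware/edge/lb/etc/<Load-Balancer ID>/certs". The notAfter line shows when each certificate expires.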
Additional log capture:
get firewall [Logical interface UUID] ruleset rules
get firewall [Logical interface UUID] ruleset stats
get firewall [Logical interface UUID] interface stats
get load-balancer [Load balancer UUID] pool [pool UUID] status
get dataplane cpu stats
restart service dataplane
(Use this command to restart datapathd.)
Maximums
Check the size of the LB and make sure that it does not exceed the maximums for that size.
Refer to the Maximums section in this document.
Known issues with NSX Load Balancers
Handling Log Bundles for offline review with Broadcom support