Troubleshooting NSX Native Load Balancer

Article ID: 376344

Products

VMware NSX

Issue/Introduction

This article contains the basic information for troubleshooting the NSX Native Load Balancer and the data required when opening a support request with Broadcom.

Environment

VMware NSX

 

Cause

Common Issues

Check LB logs for 502 Bad Gateway issue

  1. Enable DEBUG logging on the Load Balancer page and Access Logs on the Virtual Server page in the NSX-T UI.

  2. SSH into the Edge where the LB is active as the root user, change to the log directory with cd /var/log/lb/<LB ID>/logs, and search error.log for 502 responses:
    grep -a "502" error.log
    error.log:2019/09/25 11:54:16 [debug] 32409#0: *11258672 HTTP/1.1 502 Bad Gateway
    error.log:2019/09/25 11:54:24 [debug] 32406#0: *11292889 HTTP/1.1 502 Bad Gateway

  3. Virtual server access logs are written to /var/log/syslog. Search for subcomp="lb" and s2comp="access", as in the entry below:

    LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="access" level="INFO"] [######-####-####-####-############][########-####-####-####-############] Operation.Category: 'LbAccessLog', Operation.Type: 'Http', Lb.UUID: '#######-####-####-####-############', Lb.Name: 'LB-NAME', Vs.UUID: '########-####-####-####-############', Vs.Name: '####', Vs.Ip: '##.###.#.#', Vs.Port: '443', Pool.UUID: '########-####-####-####-##########ab', Pool.Name: 'POOL-NAME', PoolMember.Ip: '##.###.#.#', PoolMember.Port: '443', Client.Ip: '##.##.##.#', Client.Port: '####', Snat.Ip: '##.###.#.#', Snat.Port: '63069', HttpRequest.Method: 'POST', HttpRequest.UserAgent:, HttpRequest.X-Fwd-For: '-', HttpRequest.Uri: '/auth/access_token', HttpRequest.Host: '#######', HttpResponse.Status: '502', HttpResponse.StatusCategory: '5xx', HttpResponse.Size: '0', HttpResponse.ServerTime: '31.624', HttpResponse.TotalTime: '31.628', Error.Reason: '-'.

    Note: A 502 Bad Gateway can be caused by header issues, in either the HTTP request headers or the HTTP response headers.
  4. In /var/log/lb/<LoadBalancer-UUID>/logs/error.log, look specifically for the 502 error. When headers are the cause, the log should clearly indicate that the header is too big.
    Note: A header that is too big is usually caused by a large cookie.

  5. If this is the case, the header size can be changed in the UI.

    1. The Manager interface supports changing only the request header size, while the Policy interface supports both the response and the request header size.

    2. If only the request header size needs to change, a new policy profile can be created and attached to the manager load balancer.

  6. Revert DEBUG logging when done.
    NOTE: Debug logs are deleted as soon as DEBUG is turned off, so always gather the logs from the Edge BEFORE disabling debug logging.
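
Step 3 can be scripted once the syslog has been copied off the Edge. The following is a minimal Python sketch, assuming the access log entries keep the Key: 'value' layout of the sample entry above; the function names (parse_access_line, find_status, scan_file) are illustrative and not part of any NSX tooling.

```python
import re

# Matches "Key.Name: 'value'" pairs as they appear in the sample access log entry.
# Fields without a quoted value (for example an empty HttpRequest.UserAgent) are skipped.
FIELD_RE = re.compile(r"([A-Za-z][\w.-]*): '([^']*)'")


def parse_access_line(line):
    """Return a dict of the quoted key/value fields found in one log line."""
    return dict(FIELD_RE.findall(line))


def is_lb_access_line(line):
    """True for lines tagged subcomp="lb" and s2comp="access"."""
    return 'subcomp="lb"' in line and 's2comp="access"' in line


def find_status(lines, status="502"):
    """Yield parsed fields for LB access log lines with the given HTTP status."""
    for line in lines:
        if not is_lb_access_line(line):
            continue
        fields = parse_access_line(line)
        if fields.get("HttpResponse.Status") == status:
            yield fields


def scan_file(path, status="502"):
    """Return all matching entries from a syslog file (path is caller-supplied)."""
    with open(path, errors="replace") as fh:
        return list(find_status(fh, status))
```

Calling scan_file on a copy of /var/log/syslog returns one dictionary per 502 entry, so fields such as PoolMember.Ip, HttpRequest.Uri and HttpResponse.TotalTime can be compared across failures.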

 

Check NGINX.conf file generation issues

  1. On the Edge, check /var/log/lb/<loadbalancer-UUID>/lbconf_gen.log:

    2020-08-03 09:01:35,301 204 lb ERROR failed to build nginx config

    2020-08-03 09:01:35,301 204 lb ERROR 'ascii' codec can't encode character '\u2013' in position 6950: ordinal not in range(128)

     

  2. Check for errors in /var/log/syslog on the Edge:

    <5>1 2020-08-03T09:26:27.355660+00:00 w1-dmz-edge04 kernel - - - [ 3789.995524] grsec: [e01e2f34####] denied RWX mmap of <anonymous mapping> by /opt/vmware/nsx-edge/bin/lbconf_gen.py[lbconf_gen.py:12455] uid/euid:134/134 gid/egid:140/140, parent /opt/vmware/nsx-edge/bin/nginx[nginx:8167] uid/euid:134/134 gid/egid:140/140

    (The entry above was captured at a different timestamp.)

    <25>1 2020-08-03T09:01:35.370199+00:00 e01e2f34#### NSX 61 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="FATAL"] [bba2bc6c-7b00-402e-84af-############] cfg: failed to signal config change to engine (Connection refused).

    <25>1 2020-08-03T09:01:35.370406+00:00 e01e2f3486a5 NSX 61 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="FATAL"] [bba2bc6c-7b00-402e-84af-############] cfg: failed to generate Lb configuration
  3. On the Edge node, the nginx.conf file is empty. As the root user, run the following command and note that the file size is 0:

    ls -l /config/vmware/edge/lb/etc/bba2bc6c-7b00-402e-84af-############/nginx.conf

    -rw-r----- 1 lb lb 0 Aug 3 09:26 /config/vmware/edge/lb/etc/bba2bc6c-7b00-402e-84af-############/nginx.conf
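
The two symptoms above (an 'ascii' codec error in lbconf_gen.log and a zero-byte nginx.conf) can be checked with a short script. This is a rough Python sketch under stated assumptions: find_non_ascii is a hypothetical helper that scans whatever text is passed to it (for example an export of the load balancer configuration), and the positions it reports are relative to that text, so they will not necessarily match the position in the log message. empty_nginx_confs assumes the directory layout shown in step 3.

```python
import glob
import os


def find_non_ascii(text):
    """Return (position, character, code point) for every non-ASCII character."""
    return [(i, ch, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ord(ch) > 127]


def empty_nginx_confs(base="/config/vmware/edge/lb/etc"):
    """Return nginx.conf paths under base/<lb-uuid>/ whose size is 0 bytes."""
    pattern = os.path.join(base, "*", "nginx.conf")
    return sorted(p for p in glob.glob(pattern) if os.path.getsize(p) == 0)
```

The character '\u2013' in the sample error is an en dash, which often arrives through text pasted from a word processor into a name or description field; find_non_ascii reports it as U+2013.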

Resolution

CLI Commands

Check services used by LB (nsxcli commands)

get service dataplane

get service dispatcher

get service nsx-control-plane-agent

get service nestdb

Check LB configuration (nsxcli commands)

get load-balancers

get load-balancer <lb-uuid>

get load-balancer <lb-uuid> virtual-servers

get load-balancer <lb-uuid> virtual-server <vs-uuid>

get load-balancer <lb-uuid> virtual-server <vs-uuid> lbrules

get load-balancer <lb-uuid> pools

get load-balancer <lb-uuid> pool <pool-uuid>

get load-balancer <lb-uuid> monitors

get load-balancer <lb-uuid> monitor <monitor-uuid>

Check LB status (nsxcli commands)

get load-balancers status

get load-balancer <lb-uuid> status

get load-balancer <lb-uuid> virtual-servers status

get load-balancer <lb-uuid> virtual-server <vs-uuid> status

get load-balancer <lb-uuid> pools status

get load-balancer <lb-uuid> pool <pool-uuid> status

get load-balancer <lb-uuid> monitor <monitor-uuid> status

Check LB statistics (nsxcli commands)

get load-balancer <lb-uuid> stats
get load-balancer <lb-uuid> session-tables

get load-balancer <lb-uuid> stats verbose

get load-balancer <lb-uuid> virtual-servers stats

get load-balancer <lb-uuid> virtual-server <vs-uuid> stats

get load-balancer <lb-uuid> pools stats

get load-balancer <lb-uuid> pool <pool-uuid> stats

clear load-balancer <lb-uuid> stats

clear load-balancer <lb-uuid> pools stats

clear load-balancer <lb-uuid> pool <pool-uuid> stats

clear load-balancer <lb-uuid> virtual-servers stats

clear load-balancer <lb-uuid> virtual-server <vs-uuid> stats

Check LB HA (nsxcli commands)

get load-balancer <lb-uuid> high-availability-state

Check LB logs (nsxcli commands)

get load-balancer <lb-uuid> error-log

get load-balancer <lb-uuid> virtual-server <vs-uuid> access-log

set load-balancer <lb-uuid> rule-log

Check kni interface is created (only L7) (root commands)

#ifconfig | grep kni-lrport

Check namespaces (root commands)

#set debug

#get namespaces

 

API Commands

Check LB status (CPU, memory)

GET /api/v1/loadbalancer/services/<LB-Service_UUID>/status
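
The status call can also be made from a script. The sketch below uses only the Python standard library and HTTP basic authentication, mirroring the curl usage elsewhere in this article; the function names and the verify_tls parameter are illustrative, and the response is returned as parsed JSON without assuming any particular field names.

```python
import base64
import json
import ssl
import urllib.request


def status_url(manager, lb_service_id):
    """Build the status URL for one load balancer service."""
    return f"https://{manager}/api/v1/loadbalancer/services/{lb_service_id}/status"


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def get_lb_status(manager, lb_service_id, username, password, verify_tls=True):
    """GET the LB service status and return the decoded JSON body."""
    req = urllib.request.Request(
        status_url(manager, lb_service_id),
        headers={
            "Accept": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
    )
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Equivalent to curl -k: skip certificate checks (lab use only).
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        return json.load(resp)
```

Printing the result with json.dumps(result, indent=2) gives a readable snapshot that can be attached to a support request.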

 

Resizing a Load Balancer

If a load balancer is not owned by NSX-T and cannot be resized through the UI, use the API calls below:

  • Get the current LB configuration:

    curl -X GET -H Content-Type:application/json -ku username:password https://NSXManagerIPAddress/api/v1/loadbalancer/services/<LoadBalancerUUID> > lb.json

  • Edit the lb.json file and set the "size" entry to the desired form factor. The value must match a supported form factor (see NSX Edge VM System Requirements).

  • Push new modified configuration:

    curl -X PUT -H Content-Type:application/json -H X-Allow-Overwrite:True -ku username:password https://NSXManagerIPAddress/api/v1/loadbalancer/services/<LoadBalancerUUID> -d @lb.json

Note: Take a backup before making any modifications.
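
The edit step between the GET and the PUT can be done by hand or scripted. The following is a minimal Python sketch, assuming lb.json is the file produced by the GET call above; set_lb_size and the .bak suffix are illustrative names, and the function does not validate the size value, which must still be a form factor supported by the Edge.

```python
import json
import shutil


def set_lb_size(path, new_size, backup_suffix=".bak"):
    """Back up the exported LB service JSON, then rewrite its "size" entry.

    Returns the previous size so the change can be logged or reverted.
    """
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    if "size" not in config:
        raise KeyError('no "size" entry found in ' + path)
    # Keep an untouched copy of the export before changing anything.
    shutil.copy2(path, path + backup_suffix)
    old_size = config["size"]
    config["size"] = new_size
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    return old_size
```

All other entries in the file are written back unchanged, so the modified lb.json can be passed directly to the PUT call.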

Certificates

To review the details of the certificates applied to a Virtual Server, follow these steps:

  • SSH into the Edge as root.
  • Change to the certificate directory: cd /config/vmware/edge/lb/etc/<Load-Balancer ID>/certs/
  • This folder contains all the certificates used by NGINX.

Additional log capture

  • get firewall [Logical interface UUID] ruleset rules
  • get firewall [Logical interface UUID] ruleset stats
  • get firewall [Logical interface UUID] interface stats
  • get load-balancer [Load balancer UUID] pool [pool UUID] status
  • get dataplane cpu stats
  • Capture the health check traffic at the following points:
        bp-sr-port (ICMP reply)
        downlink port (ICMP request)
        KNI (pass FW to linux)
  • Keep the LB debug log and access log enabled while capturing.
  • restart service dataplane (try this to restart datapathd)

Maximums

Check the size of the LB, and make sure that it is not exceeding the maximums for each size.

Refer to the Maximums section in this document.

Known unsupported configuration

  • The Native LB does not support the 'Content-Security-Policy' header.

Known issues with NSX Load Balancers

Handling Log Bundles for offline review with Broadcom support