NSX-T load balancer stops working until restart and creates core dump

Article ID: 318312

Products

VMware NSX

Issue/Introduction

  • Clients are not able to access backend servers via the Load Balancer.
  • In some circumstances, it could take up to several seconds to display the backend server pages or they may not display at all.
  • In the NSX-T Edge log /var/log/syslog, the following messages can be seen:
2022-08-25T08:42:55.036211-07:00 nsxedge NSX 27498 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="FATAL"] [########-####-####-####-############] connect() to ip-address:443 failed (99: Cannot assign requested address) while connecting to upstream, client: ip-address, server: , request: "GET / HTTP/1.1", upstream: "https://ip-address:443/", host: "host-name"

2022-08-25T08:43:48.497141-07:00 nsxedge NSX 27498 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="FATAL"] [########-####-####-####-############] connect() to ip-address:443 failed (99: Cannot assign requested address) while connecting to upstream, client: ip-address, server: , request: "GET /Images/image.png HTTP/1.1", upstream: "https://ip-address:443/Images/image.png", host: "hostname", referrer: "https://referrer/History.aspx?token=63c391f8-####-####-####-7cc80e15136b&AGU=0"

2022-08-25T08:43:59.970790-07:00 nsxedges NSX 18739 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="FATAL"] [########-####-####-####-############] worker process 27498 exited on signal 11 (core dumped)
  • An NGINX core dump is present on the NSX-T Edge node in the /var/log/core directory, for example:
/var/log/core/core.nginx.1606488022.24628.134.11.gz
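
The log symptoms above can be checked for mechanically. The following is a minimal sketch, not a VMware tool: the function name `scan_syslog` is hypothetical, and the two patterns are assumptions derived only from the sample log lines shown in this article.

```python
import re

# Signatures taken from the sample syslog lines in this article.
# Both patterns are assumptions based on those samples only.
CONNECT_FAIL = re.compile(
    r"connect\(\) to \S+ failed \(99: Cannot assign requested address\)"
)
WORKER_CRASH = re.compile(
    r"worker process (\d+) exited on signal 11 \(core dumped\)"
)


def scan_syslog(lines):
    """Count upstream connect failures and collect crashed worker PIDs."""
    connect_failures = 0
    crashed_pids = []
    for line in lines:
        if "LOAD-BALANCER" not in line:
            continue  # only load balancer messages are of interest
        if CONNECT_FAIL.search(line):
            connect_failures += 1
        match = WORKER_CRASH.search(line)
        if match:
            crashed_pids.append(int(match.group(1)))
    return connect_failures, crashed_pids
```

To use it, pass any iterable of log lines, for example an open file handle on a copy of /var/log/syslog taken from the Edge node. A non-empty list of crashed PIDs together with files under /var/log/core suggests the symptoms described here.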



Environment

VMware NSX-T Data Center

Cause

In some HTTPS cases, the client connection is closed, but the Load Balancer's upstream connection is not closed immediately.

When this happens, the Load Balancer accesses invalid memory, which leads to a segmentation fault.

When this issue occurs, the Load Balancer is unable to process the server keep-alive in the HTTP header.

Resolution

This issue is resolved in VMware NSX-T Data Center 3.2.2 and VMware NSX 4.0.1.
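
To check whether a given release already contains the fix, a version comparison along the following lines may help. This is a sketch under stated assumptions: the helper names are hypothetical, version strings are assumed to be dotted numbers, and it is assumed that releases with major version 3 are compared with 3.2.2 and releases with major version 4 or later with 4.0.1. The article itself only names the two fixed releases.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.2.1.2' into (3, 2, 1, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(version_text):
    """Return True if the release is at or above the fixed release of its line.

    Assumption: major version 3 (and lower) is compared with 3.2.2,
    major version 4 or later is compared with 4.0.1.
    """
    version = parse_version(version_text)
    if version[0] >= 4:
        return version >= (4, 0, 1)
    return version >= (3, 2, 2)
```

If `has_fix` returns False for the running release, the workaround below applies until an upgrade is possible.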

Workaround:

To work around this issue, disable the NTLM Auth and Server Keep-alive parameters on the HTTP application profile.

To disable server keep-alive from the UI:

  • Select Networking > Load Balancing > Profiles and ensure Select Profile Type is set to Application.
  • Edit the HTTP Application profile in use, toggle the switch next to Server Keep-alive to off, and save.
Note: The default HTTP Application profile cannot be edited. If you are using it, create a new HTTP Application profile and use it in place of the default one.


To disable NTLM Auth & Server Keep-alive via API:

  • Use the following GET API call and locate the application profile ID in the result:

GET https://<nsx-mgr>/api/v1/loadbalancer/application-profiles

  • Retrieve the configuration of the application profile using the ID collected in the previous step:

GET https://<nsx-mgr>/api/v1/loadbalancer/application-profiles/<application-profile-id>

  • Copy the entire body returned by the previous GET API call into the body of the following PUT API call, then change these two values in the body:

"ntlm": true, --------> set this to false
"server_keep_alive" : true, -------> set this to false

PUT https://<nsx-mgr>/api/v1/loadbalancer/application-profiles/<application-profile-id>
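
The GET, edit, and PUT steps above can be scripted. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the use of HTTP basic authentication is an assumption about how the NSX Manager is accessed in your environment; the URL path and the two field names come from this article.

```python
import base64
import json
import urllib.request


def build_put_body(profile):
    """Return a copy of the GET result with both settings turned off."""
    body = dict(profile)  # keep every other field exactly as returned
    body["ntlm"] = False
    body["server_keep_alive"] = False
    return body


def call_api(method, url, user, password, body=None, context=None):
    """Send one request with basic authentication and decode the JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",  # assumption: basic auth is enabled
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request, context=context) as response:
        return json.loads(response.read().decode())


def disable_ntlm_and_keep_alive(manager, profile_id, user, password, context=None):
    """GET the profile, change the two fields, and PUT the whole body back."""
    url = f"https://{manager}/api/v1/loadbalancer/application-profiles/{profile_id}"
    profile = call_api("GET", url, user, password, context=context)
    return call_api("PUT", url, user, password, build_put_body(profile), context)
```

`build_put_body` copies the whole body rather than sending only the two changed fields, which mirrors the instruction to paste the entire GET result into the PUT call. The optional `context` argument accepts an `ssl.SSLContext`, for example one that trusts the certificate of the NSX Manager.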


Note: You may receive the following API response:

"httpStatus": "BAD_REQUEST",
"error_code": 289,
"module_name": "common-services",
"error_message": "Principal 'admin' with role '[enterprise_admin]' attempts to delete or modify an object of type LoadBalancerHttpProfile it doesn't own. (createUser=nsx_policy, allowOverwrite=null)"

This occurs when you try to modify a policy object with a manager API call. The response indicates that the object is protected by policy. Use the following header to override the protection and allow the manager API to edit the policy object.

Add the following key/value pair to the request headers:
Key = X-Allow-Overwrite
Value = true
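
When the API call is scripted, the error can be detected and the header added before retrying. The following is a small sketch with hypothetical helper names. The field names and values are taken from the error response and header shown above, and the decision to check both `httpStatus` and `error_code` is an assumption.

```python
def is_ownership_error(response_body):
    """True if the decoded JSON reply matches the error shown above."""
    return (
        response_body.get("httpStatus") == "BAD_REQUEST"
        and response_body.get("error_code") == 289
    )


def with_allow_overwrite(headers):
    """Return a copy of the request headers with the override header added."""
    updated = dict(headers)
    updated["X-Allow-Overwrite"] = "true"
    return updated
```

A script would send the PUT once with its normal headers and, if `is_ownership_error` is true for the reply, send it again with the headers returned by `with_allow_overwrite`.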