NSX Load Balancer nginx master process crashes when Source IP Persistence is configured

Article ID: 396231


Updated On:

Products

VMware NSX

Issue/Introduction

  • NSX native Load Balancer is configured.
  • Either of the following two situations applies:
    • Source IP Persistence has been configured in the L4 virtual server and has been subsequently disabled.
    • Source IP Persistence has been configured in the L4 virtual server and the virtual server has been subsequently deleted from the Edge node.
  • The Load Balancer nginx master process has crashed, with the below entries seen in /var/log/syslog:
    NSX 982221 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="INFO"] Core dump generation received by process: 10823 [nginx]

    NSX 982221 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.nginx.###.gz
  • An nginx core dump will be present in /var/log/core (see the example commands after this list):
    -rw-r--r--  1 root root 109M ## # ### core.nginx.gz
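
For a quick check on the Edge node, the entries above can be searched for directly (the paths are taken from the log entries above; exact core file names will vary):

grep "Core dump generation" /var/log/syslog
ls -lh /var/log/core/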

Environment

VMware NSX 

VMware NSX-T Data Center 

Cause

This issue occurs when Source IP Persistence is disabled in an L4 virtual server: the expired timer is added by the nginx master process before the persistence table shared memory is freed, but the persistence aging tree is not initialized in the nginx master process.

Resolution

This issue is resolved in VMware NSX 4.2.1.2, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

 

Workaround

Either:
  • Disable Source IP Persistence, or
  • Keep Source IP Persistence enabled and never disable it or delete the virtual server from the Edge node.

 

Scenarios:
1. If the master process crashes no more than 3 times, Docker restarts the LB container automatically and no HA failover happens. The LB recovers after restarting (no failover is needed).

2. If the master process crashes more than 3 times, Docker cannot restart the container automatically at the 4th crash. HA failover then happens automatically, and the new active LB on another Edge handles the traffic (no manual failover is needed).

 

Only if the user needs to recover the LB on the problematic Edge, manually enter maintenance mode (mm) on this Edge to recover.
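
As a hedged illustration (the exact syntax should be verified for your NSX version), maintenance mode is typically toggled from the Edge node admin CLI with:

set maintenance-mode enabled
set maintenance-mode disabled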
You can confirm the Docker status using the below command to check the status of all containers:

docker ps
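
For example, the names and uptimes of all containers can be listed with a formatted docker ps (a standard Docker option, not specific to NSX); the Status column shows how long each container has been up, which indicates whether it was recently restarted:

docker ps --format "table {{.Names}}\t{{.Status}}"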

Alarm to check whether the Edge has failed over:

Tier 1 Gateway failed over alarm
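
As an illustrative sketch only (the endpoint is an assumption based on the public NSX Manager API and should be verified for your version; <nsx-manager> is a placeholder), open alarms can also be listed from the NSX Manager and searched for failover events:

curl -k -u admin 'https://<nsx-manager>/api/v1/alarms' | grep -i failover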

The coredump can be monitored from syslog. If a coredump is observed, please check the LB container running time and logs with:

docker ps 
docker logs  LB_CONTAINER_NAME

These show the container running time and the log output. From them, we can confirm whether the container has been restarted.
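
As a minimal sketch (LB_CONTAINER_NAME is a placeholder for the actual name shown by docker ps), the container start time and restart count can also be read directly:

docker inspect -f '{{.State.StartedAt}}' LB_CONTAINER_NAME
docker inspect -f '{{.RestartCount}}' LB_CONTAINER_NAME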