NSX Load Balancer nginx master process crashes when Source IP Persistence is configured

Article ID: 396231

Products

VMware NSX

Issue/Introduction

  • NSX native Load Balancer is configured.
  • One of the following two situations applies:
    • Source IP Persistence was configured on an L4 virtual server and later disabled.
    • Source IP Persistence was configured on an L4 virtual server, and the virtual server was later deleted from the Edge node.
  • The Load Balancer nginx master process crashes, and entries like the following appear in /var/log/syslog:
    NSX 982221 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="INFO"] Core dump generation received by process: 10823 [nginx]

    NSX 982221 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.nginx.###.gz
  • An nginx core dump is present in /var/log/core:
    -rw-r--r--  1 root root 109M ## # ### core.nginx.gz
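A minimal shell sketch for spotting these symptoms on an Edge node. It assumes the syslog message formats shown above and the default /var/log/syslog and /var/log/core paths. The function name nginx_core_events is hypothetical and is not an NSX command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nginx core-dump entries from an Edge syslog file.
# Assumes the message formats shown in the Issue section; adjust the patterns
# if your syslog lines differ.
nginx_core_events() {
  local syslog="${1:-/var/log/syslog}"
  # Match both the "Core dump generation received" and the "Core file generated"
  # lines for nginx. grep exits 1 when nothing matches.
  grep -E 'Core dump generation received by process: [0-9]+ \[nginx\]|Core file generated: [^ ]*core\.nginx' "$syslog"
}

# Example usage on the Edge:
#   nginx_core_events                   # scan /var/log/syslog
#   ls -lh /var/log/core/core.nginx*    # list any nginx core files
```

A non-zero exit status with no output means no matching entries were found in the file that was scanned.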

Environment

VMware NSX
VMware NSX-T Data Center 

Cause

This issue occurs when Source IP Persistence is disabled on an L4 virtual server (or the virtual server is deleted). Before the persistence table shared memory is freed, the nginx master process adds an expired timer, but the persistence aging tree has not been initialized in the nginx master process, which leads to the crash.

Resolution

This issue is resolved in VMware NSX 4.2.1.2, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Workaround

Use either of the following:

  • Disable Source IP Persistence.
  • Keep Source IP Persistence enabled, and do not disable it or delete the virtual server from the Edge node.

If the crash does occur, one of the following scenarios applies:

1. The master process crashes no more than 3 times: Docker restarts the container automatically and no HA failover occurs. The LB recovers after the restart; no manual failover is needed.

2. The master process crashes more than 3 times: Docker cannot restart the container automatically after the 4th crash. HA failover then occurs automatically, and the new active LB on another Edge handles the traffic; no manual failover is needed.

Only if you need the LB to recover on the affected Edge itself, manually place that Edge in Maintenance Mode to recover it.
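Which scenario applies depends on how many times the process has crashed. The following rough shell sketch counts the nginx core-dump entries in syslog and maps the count onto scenario 1 or 2. It rests on several assumptions. Every matching entry in the file is taken to belong to the current incident, because no time-window filtering is done. Each crash is assumed to log exactly one "Core dump generation received" line. The log line alone cannot distinguish a master-process crash from any other nginx crash. The name nginx_crash_scenario is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count nginx crashes recorded in syslog and report which
# of the two scenarios they would correspond to.
# Assumption: all matching lines in the file belong to the current incident.
nginx_crash_scenario() {
  local syslog="${1:-/var/log/syslog}"
  local count
  if [ ! -r "$syslog" ]; then
    echo "cannot read $syslog" >&2
    return 2
  fi
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the 0.
  count=$(grep -cE 'Core dump generation received by process: [0-9]+ \[nginx\]' "$syslog" || true)
  if [ "$count" -eq 0 ]; then
    echo "crashes=0 scenario=none"
  elif [ "$count" -le 3 ]; then
    # Scenario 1: container restarted automatically, no HA failover
    echo "crashes=$count scenario=1"
  else
    # Scenario 2: no automatic restart after the 4th crash, HA failover
    echo "crashes=$count scenario=2"
  fi
}
```

Treat the result as a hint only. Confirm the actual state with the container status and the failover alarm described below.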

To check the status of all containers, run:

docker ps

To check whether the Edge has failed over, look for the following alarm:

Tier 1 Gateway failed over alarm

You can also monitor syslog for core dump entries. If a core dump is observed, check the LB container's running time and logs with:

docker ps
docker logs LB_CONTAINER_NAME

The STATUS column of docker ps shows how long each container has been running, and docker logs shows the container's log. Together they confirm whether the container has restarted.