/var/log/syslog:
NSX 982221 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="INFO"] Core dump generation received by process: 10823 [nginx]
NSX 982221 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.nginx.###.gz
-rw-r--r-- 1 root root 109M ## # ### core.nginx.gz
VMware NSX
VMware NSX-T Data Center
This issue occurs when Source IP Persistence is disabled on an L4 virtual server. The nginx master process adds the expiration timer before the persistence table shared memory is freed, but the persistence aging tree is not initialized in the nginx master process.
This issue is resolved in VMware NSX 4.2.1.2, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
Workaround
Disable Source IP Persistence.
or
Keep Source IP Persistence enabled: do not disable it, and do not delete the virtual server from the Edge node.
Scenarios:
1. The master process crashes no more than 3 times: the Docker container restarts automatically and no HA failover occurs. The LB recovers after the restart. (no failover needed)
2. The master process crashes more than 3 times: the Docker container cannot restart automatically at the 4th crash. HA failover then happens automatically, and the new active LB on another Edge handles the traffic. (no manual failover needed)
Only if you need the LB to recover on the problematic Edge, manually place that Edge into maintenance mode to recover.
You can check the status of all containers with the following command:
docker ps
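The container check above could also be scripted. The following is a minimal sketch, assuming Python 3 is available wherever the `docker` command is run (not confirmed for an Edge node); the function names `parse_docker_ps` and `container_statuses` are hypothetical, and the `|` separator is an arbitrary choice for this example. It uses the standard `--all` and `--format` options of `docker ps`, so stopped containers are listed too.

```python
import subprocess

# Go-template format for `docker ps`: container name and status, separated by "|".
# The separator is an assumption chosen for this sketch.
FORMAT = "{{.Names}}|{{.Status}}"


def parse_docker_ps(text):
    """Parse `docker ps --format "{{.Names}}|{{.Status}}"` output into {name: status}."""
    statuses = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank or unexpected lines
        name, status = line.split("|", 1)
        statuses[name.strip()] = status.strip()
    return statuses


def container_statuses():
    """Run `docker ps --all` and return {name: status} for every container."""
    result = subprocess.run(
        ["docker", "ps", "--all", "--format", FORMAT],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_docker_ps(result.stdout)
```

A status beginning with `Up` and showing a short uptime suggests the container was restarted recently; a status beginning with `Exited` suggests it did not come back.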
Alarm to check if the edge has failed over:
Tier 1 Gateway failed over alarm
You can monitor syslog for core dumps. If a core dump is observed, check the LB container's running time and log with:
docker ps
docker logs LB_CONTAINER_NAME
The output shows the container's running time and its log, from which you can confirm whether the container was restarted.
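Watching syslog for the core dump messages can be automated. The sketch below is illustrative only: it assumes the two message texts shown in the syslog excerpt at the top of this article ("Core dump generation received by process" and "Core file generated") and that Python 3 is available where the log is read. The function name `find_core_dumps` and the regular expressions are hypothetical, not part of NSX.

```python
import re

# Patterns derived from the two syslog messages quoted in this article.
CORE_PROC_RE = re.compile(
    r"Core dump generation received by process:\s*(\d+)\s*\[(\w+)\]"
)
CORE_FILE_RE = re.compile(r"Core file generated:\s*(\S+)")


def find_core_dumps(lines):
    """Return (kind, detail) tuples for core-dump related syslog lines.

    kind is "process" (detail is "name:pid") or "file" (detail is the core file path).
    """
    events = []
    for line in lines:
        match = CORE_PROC_RE.search(line)
        if match:
            events.append(("process", f"{match.group(2)}:{match.group(1)}"))
            continue
        match = CORE_FILE_RE.search(line)
        if match:
            events.append(("file", match.group(1)))
    return events
```

To use it, pass an open log file (for example `/var/log/syslog`) to `find_core_dumps`. If any events are returned for nginx, follow up with the `docker ps` and `docker logs` checks described above.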