Symptoms:
- You are using VMware NSX-T.
- After the Load Balancer (LB) is reconfigured, an nginx core file is generated. From the root shell of the Edge Node, you can see nginx core files in /var/dump similar to the following:
root@edge_name:/var/dump# ls -lh
total 454M
-rw-rw-rw- 1 root root 321M Jun 26 12:39 core.nginx.1672058350.9414.134.11.gz
-rw-rw-rw- 1 root root 321M Jun 26 12:37 core.nginx.1672058216.8391.134.11.gz
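To check an Edge Node for these dumps without reading the whole directory listing, the shell function below can be sourced from the root shell. It is a minimal sketch, assuming the core files keep the core.nginx.* naming and the /var/dump location shown above; the function name list_nginx_cores is hypothetical and not part of NSX-T.

```shell
# Hypothetical helper (not an NSX-T command): list nginx core dumps.
# Assumption: dumps are named core.nginx.* and stored in /var/dump,
# unless another directory is passed as the first argument.
list_nginx_cores() {
    dump_dir="${1:-/var/dump}"
    found=0
    for f in "$dump_dir"/core.nginx.*; do
        # If nothing matches, the glob stays literal, so skip it.
        [ -e "$f" ] || continue
        ls -lh "$f"
        found=$((found + 1))
    done
    echo "nginx core files found: $found"
    # Return a non-zero status when no nginx core file is present.
    [ "$found" -gt 0 ]
}
```

For example, list_nginx_cores prints each matching file with ls -lh followed by the count, and returns a non-zero status when no nginx core file exists, so it can be used directly in an if statement.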
- Pool members may report "Connect to Peer Failure" or "TCP Handshake Timeout".
- In /var/log/syslog on the Edge Node, you see log entries reporting "all pool members are down":
2022-12-27T01:22:23.064227+00:00 <edge_name> NSX 6552 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="ERROR" errorCode="EDG1200000"] [########-####-####-####-##########34] Operation.Category: 'LbEvent', Operation.Type: 'StatusChange', Obj.Type: 'Pool', Obj.UUID: '####9c89-########-####-####-##########95', Obj.Name: 'cluster:<name>', Lb.UUID: '########-####-####-####-##########34', Lb.Name: '<LB_LBname>', Vs.UUID: '########-####-####-####-##########f8', Vs.Name: '<name>', Status.NewStatus: 'Down', Status.Msg: 'all pool members are down'.
2022-12-27T01:22:23.064913+00:00 <edge_name> NSX 6552 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="ERROR" errorCode="EDG9999999"] [########-####-####-####-##########34] Operation.Category: 'LbEvent', Operation.Type: 'StatusChange', Obj.Type: 'VirtualServer', Obj.UUID: '########-####-####-####-##########f8', Obj.Name: 'cluster:<name>', Lb.UUID: '########-####-####-####-##########34', Lb.Name: '<LB_LBname>', Status.NewStatus: 'Down', Status.Msg: 'all pool members are down'.
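The same entries can be pulled out of the log with grep. The sketch below assumes the entries are formatted like the examples above; pool_down_events is a hypothetical helper name, and the log path defaults to /var/log/syslog but can be passed as the first argument.

```shell
# Hypothetical helper: print LB syslog entries reporting that all pool members are down.
# Assumption: entries look like the examples above; path defaults to /var/log/syslog.
pool_down_events() {
    log_file="${1:-/var/log/syslog}"
    # -F matches fixed strings: keep LOAD-BALANCER entries first, then
    # only those whose status message says every pool member is down.
    grep -F 'LOAD-BALANCER' "$log_file" | grep -F 'all pool members are down'
}
```

Piping the result to grep -c '' gives the number of matching entries; both the Pool and the VirtualServer entries are counted.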
- The LB CONF process for the LB instance is not running. You can confirm this with the following steps:
1. Run the following command from the root CLI of the Edge Node. It requires the UUID of the LB.
# ps -ef | grep lb | grep nginx | grep <LB UUID>
Example:
root@edge_name:~# ps -ef | grep lb | grep nginx | grep ########-####-####-####-##########a8
lb 9568 9481 0 Jun23 ? 00:00:00 /opt/vmware/nsx-edge/bin/nginx -u ########-####-####-####-##########a8 -g daemon off;
Note: To retrieve the LB UUID, run get load-balancer from the admin CLI of the active Edge Node. In the example above, the LB UUID is ########-####-####-####-##########a8.
2. Use the nginx process ID (9568 in the example above, the second column of the output) in the following command to confirm that the nginx process has an LB CONF process running. If the command returns no output, no LB CONF process is running and the issue has been encountered.
# ps -ef | grep <nginx process ID> | grep CONF
Examples:
Impacted (no output):
root@edge02:~# ps -ef | grep 9568 | grep CONF
root@edge02:~#
Not impacted (LB CONF process listed):
root@edge02:~# ps -ef | grep 9568 | grep CONF
lb 9572 9568 0 Jun23 ? 00:00:06 nginx: LB CONF process
root@edge02:~#
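The two manual checks above can also be scripted. The sketch below is an illustration under stated assumptions, not a VMware tool: it assumes the standard ps -ef column layout (UID, PID, PPID, ...) and the command lines shown in the examples, and the function names lb_nginx_pid, lb_conf_running and check_lb_conf are hypothetical. Each function reads ps -ef output on standard input, so it can be run live or against saved output.

```shell
# Hypothetical helpers (not NSX-T commands); all read `ps -ef` output on stdin.

# Step 1: print the PID of the nginx process started with "-u <LB UUID>".
lb_nginx_pid() {
    uuid="$1"
    # Field 2 of `ps -ef` output is the PID. Matching "nginx -u <UUID>" skips
    # grep/awk processes whose command lines merely contain the UUID.
    awk -v uuid="$uuid" 'index($0, "nginx -u " uuid) { print $2; exit }'
}

# Step 2: succeed only if a child of the given PID is an "LB CONF process".
lb_conf_running() {
    nginx_pid="$1"
    # Field 3 is the parent PID; comparing it exactly avoids matching the
    # number elsewhere on a line, which a plain `grep 9568` could do.
    awk -v ppid="$nginx_pid" \
        '$3 == ppid && /nginx: LB CONF process/ { found = 1 } END { exit(found ? 0 : 1) }'
}

# Combined check for one LB UUID: prints a verdict, returns non-zero if impacted.
check_lb_conf() {
    ps_out=$(cat)
    pid=$(printf '%s\n' "$ps_out" | lb_nginx_pid "$1")
    if [ -z "$pid" ]; then
        echo "no nginx process found for LB $1"
        return 2
    fi
    if printf '%s\n' "$ps_out" | lb_conf_running "$pid"; then
        echo "nginx PID $pid has an LB CONF process (not impacted)"
    else
        echo "nginx PID $pid has no LB CONF process (impacted)"
        return 1
    fi
}
```

On the Edge Node this would be run as ps -ef | check_lb_conf <LB UUID>. Matching the parent PID column exactly is a deliberate choice: grep 9568 matches the number anywhere on a line, for example inside a PID such as 19568.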
NOTE: The preceding log excerpts and command outputs are only examples. Date, time and environment-specific values may vary depending on your environment.