In NSX 6.4.1 or later, when you have the following configurations, an edge HA pair (with load balancer) enters a split-brain state when the grouping objects setting used by the load balancer is modified.
Symptoms:
Highavailability Healthcheck Status:
This unit [0]: Up Active: 1
Peer unit [1]: Up Active: 0
Session via vNic_1: 10.1.1.1:10.1.1.2 Unreachable
vNic_1    Link encap:Ethernet  HWaddr ##:##:##:##:##:BC
          inet addr:10.1.1.2  Bcast:10.1.1.3  Mask:255.255.255.252
          inet6 addr: fe80::250:56ff:feab:528b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19223251 errors:0 dropped:261 overruns:0 frame:0
          TX packets:17289853 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2223722150 (2120.7 Mb)  TX bytes:3620139033 (3452.4 Mb)
2018-10-17T10:07:31+00:00 EdgeLBHA-0 config[]: [default]: [daemon.info] INFO :: VseCommandHandler :: command json file: ..
2018-10-17T10:07:34+00:00 EdgeLBHA-0 [user.notice]
2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.debug] DEBUG :: C_ServiceControl :: Checking status, op: unmonitor, service: syslog-ng, status: Not monitored
2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.info] INFO :: C_ServiceControl :: Action unmonitor for syslog-ng done
2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.info] INFO :: C_ServiceControl :: serverid: 0, state: 0
2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.debug] DEBUG :: C_ServiceControl :: Send signal to reload, server: syslog-ng, signal: HUP, pids: 833
2018-10-17T10:07:34+00:00 EdgeLBHA-1 syslog-ng[833]: [default]: [syslog.notice] Configuration reload request received, reloading configuration;
2018-10-17T10:07:34+00:00 EdgeLBHA-1 syslog-ng[833]: [default]: [syslog.notice] Configuration reload finished;
In the log excerpt above, the hostname of the edge VM changes from EdgeLBHA-0 to EdgeLBHA-1, that is, the HA index suffix moves from -0 to -1.
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
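To look for the same hostname change in a larger log file, a short script can help. The following is a minimal sketch, not a VMware-provided tool. It assumes that each log line begins with a timestamp followed by the hostname, as in the excerpt above. The function name find_hostname_changes is hypothetical. A hostname change on its own does not confirm split-brain. Treat it only as a pointer to the part of the log worth reviewing.

```python
import re

# Assumption: each line starts with "<timestamp> <hostname>", as in the excerpt.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)(?:\s|$)")


def find_hostname_changes(lines):
    """Return (timestamp, old_hostname, new_hostname) for each hostname switch."""
    changes = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unparseable lines
        timestamp, hostname = match.groups()
        if previous is not None and hostname != previous:
            changes.append((timestamp, previous, hostname))
        previous = hostname
    return changes


# Two lines taken from the excerpt above.
sample = [
    "2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.debug] "
    "DEBUG :: C_ServiceControl :: Send signal to reload, server: syslog-ng, "
    "signal: HUP, pids: 833",
    "2018-10-17T10:07:34+00:00 EdgeLBHA-1 syslog-ng[833]: [default]: "
    "[syslog.notice] Configuration reload request received, reloading configuration;",
]

for timestamp, old, new in find_hostname_changes(sample):
    print(f"{timestamp}: hostname changed from {old} to {new}")
```

To scan a saved log, pass an open file object instead of the sample list, for example find_hostname_changes(open(path)), where path is the location of your exported edge log.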
This issue is resolved in VMware NSX Data Center for vSphere 6.4.4.
Workaround:
To work around this issue, use either one of the following options: