In NSX 6.4.1 or later, an Edge HA pair with load balancing enabled enters a split-brain situation when the grouping objects used by the load balancer are modified. The issue occurs when all of the following are configured:
- The NSX Edge is configured in High Availability (HA) mode.
- The NSX Edge is also configured as a load balancer (LB).
- The NSX Edge load balancer (LB) pool members are configured with grouping objects (security groups, IP sets, and so on).
Symptoms:
- The High Availability (HA) channel between the Edge VMs goes down, as shown in the output of this command: show service highavailability
Highavailability Healthcheck Status:
This unit [0]: Up Active: 1
Peer unit [1]: Up Active: 0
Session via vNic_1: 1.1.1.1:1.1.1.2 Unreachable.
- The HA interface IP address of Edge VM index-0 is incorrect (it has switched to the address of Edge VM index-1): show interface vNic_1 (where vNic_1 is the management HA interface)
vNic_1 Link encap:Ethernet HWaddr 00:50:56:AB:AB:BC
inet addr:1.1.1.2 Bcast:1.1.1.3 Mask:255.255.255.252
inet6 addr: fe80::250:56ff:feab:528b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:19223251 errors:0 dropped:261 overruns:0 frame:0
TX packets:17289853 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000
RX bytes:2223722150 (2120.7 Mb) TX bytes:3620139033 (3452.4 Mb)
- The Edge log shows NSX Manager sending a configuration (JSON file) with the instruction to reload the configuration: show log follow
2018-10-17T10:07:31+00:00 EdgeLBHA-0 config[]: [default]: [daemon.info] INFO :: VseCommandHandler :: command json file:
..
2018-10-17T10:07:34+00:00 EdgeLBHA-0 [user.notice]
2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.debug] DEBUG :: C_ServiceControl :: Checking status, op: unmonitor, service: syslog-ng, status: Not monitored
2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.info] INFO :: C_ServiceControl :: Action unmonitor for syslog-ng done
2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.info] INFO :: C_ServiceControl :: serverid: 0, state: 0
2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.debug] DEBUG :: C_ServiceControl :: Send signal to reload, server: syslog-ng, signal: HUP, pids: 833
2018-10-17T10:07:34+00:00 EdgeLBHA-1 syslog-ng[833]: [default]: [syslog.notice] Configuration reload request received, reloading configuration;
2018-10-17T10:07:34+00:00 EdgeLBHA-1 syslog-ng[833]: [default]: [syslog.notice] Configuration reload finished;
In the above log extract, the hostname of the VM changes from index-0 (EdgeLBHA-0) to index-1 (EdgeLBHA-1).
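If the Edge log has been exported to a file or a syslog server, the hostname change described above can be spotted with a small script. The following is a minimal Python sketch, not a VMware-provided tool. It assumes each log line starts with an ISO timestamp followed by the hostname, as in the excerpt; the function name hostname_changes is hypothetical.

```python
import re

# Assumed line layout (taken from the excerpt above):
#   <ISO timestamp> <hostname> <rest of message>
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\S+)\s+(\S+)\s")


def hostname_changes(lines):
    """Return (timestamp, old_hostname, new_hostname) for each hostname change."""
    changes = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip elided or malformed lines such as ".."
        timestamp, host = match.groups()
        if previous is not None and host != previous:
            changes.append((timestamp, previous, host))
        previous = host
    return changes


# Two consecutive lines copied from the log excerpt above.
sample = [
    "2018-10-17T10:07:34+00:00 EdgeLBHA-0 config[]: [default]: [daemon.debug] "
    "DEBUG :: C_ServiceControl :: Send signal to reload, server: syslog-ng, "
    "signal: HUP, pids: 833",
    "2018-10-17T10:07:34+00:00 EdgeLBHA-1 syslog-ng[833]: [default]: "
    "[syslog.notice] Configuration reload request received, reloading configuration;",
]

for timestamp, old, new in hostname_changes(sample):
    print(f"{timestamp}: hostname changed from {old} to {new}")
```

A hostname change inside the log of a single Edge VM matches the symptom described here, but the script only flags the change; it does not confirm the root cause.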
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
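The first symptom (an unreachable HA session) can also be checked from captured CLI output. The following Python sketch parses text in the format shown for show service highavailability above. It is an illustration only: the function names are hypothetical, the regular expressions are derived solely from the sample output in this article, and other NSX versions may format the output differently.

```python
import re

# Patterns derived only from the sample output in this article.
UNIT_RE = re.compile(r"^(This|Peer) unit \[(\d+)\]:\s*(\w+)\s+Active:\s*(\d+)")
SESSION_RE = re.compile(r"^Session via (\S+):\s*(\S+?):(\S+)\s+(\w+)\.?\s*$")


def parse_ha_status(text):
    """Parse unit and session lines from captured HA status output."""
    units, sessions = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        match = UNIT_RE.match(line)
        if match:
            role, index, state, active = match.groups()
            units[role.lower()] = {
                "index": int(index),
                "state": state,
                "active": int(active),
            }
            continue
        match = SESSION_RE.match(line)
        if match:
            vnic, first_ip, second_ip, status = match.groups()
            sessions.append(
                {"vnic": vnic, "endpoints": (first_ip, second_ip), "status": status}
            )
    return units, sessions


def unreachable_sessions(text):
    """Return the HA sessions whose status is reported as Unreachable."""
    _, sessions = parse_ha_status(text)
    return [s for s in sessions if s["status"].lower() == "unreachable"]


# Output copied from the symptom description above.
SAMPLE = """
Highavailability Healthcheck Status:
This unit [0]: Up Active: 1
Peer unit [1]: Up Active: 0
Session via vNic_1: 1.1.1.1:1.1.1.2 Unreachable.
"""

for session in unreachable_sessions(SAMPLE):
    print(f"HA session via {session['vnic']} is unreachable: {session['endpoints']}")
```

The sketch matches only the Unreachable status string, because that is the only session status shown in this article.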