Rebooting a standby Edge node, or placing it into and then taking it out of Maintenance Mode (MM), may disrupt connections to the Load Balancer (LB) because of a race condition.
The issue occurs when the LB rules are loaded onto the datapath after a bulk sync starts on the standby Edge node but before a failover.
<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewall-sync" tname="dp-ipc56" level="INFO"] Removed cached HA config of lrouter <LR_UUID> vrfid # (current count #)
<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewall-sync" tname="dp-ipc56" level="INFO"] Applied cached FW peer config for <LR_UUID> vrfid #
<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" tname="dp-ipc56" level="INFO"] lbs <LB_UUID>: rules committed (ruleset:##) [active (#) - vs:##, pool:##, pm:###] [inactive (#) - vs:#, pool:#, pm:#]
<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewall" tname="dp-ipc56" level="INFO"] Config on SVC_LINK port <SVC_UUID> vrfid # skip # has # up # FW enabled # ruleset 0x### total ## msec
<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewall" tname="dp-ipc56" level="INFO"] Apply conf LR <LR_UUID> (vrfid #) # ports ## msec
If the default any-any rule is set to REJECT, /var/log/firewallpkt.log records REJECT and ICMP port unreachable events such as the following:
<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewallpkt" level="INFO"] <# <UUID>:<UUID>> INET reason-match REJECT <RULE_ID> OUT ## TCP <SNAT_IP>/<PORT>-><DST_IP>/<PORT> PA
<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewallpkt" level="INFO"] <# <UUID>:<UUID>> INET reason-match REJECT <RULE_ID> OUT ## PROTO 1 <LB_VIP>-><DST_IP>
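To check whether a given log contains the REJECT entries shown above, a small script along the following lines may help. It is only a sketch: the regular expression is inferred from the two masked sample entries, it assumes the field masked as ## after OUT is numeric, and the function name summarize_rejects is hypothetical and not part of any NSX tooling.

```python
import re

# Pattern inferred from the masked firewallpkt samples above (an assumption,
# not an official log format specification):
#   ... s2comp="firewallpkt" ... REJECT <RULE_ID> OUT <number> <PROTO> <SRC>-><DST> ...
REJECT_RE = re.compile(
    r's2comp="firewallpkt".*?\bREJECT\s+(?P<rule>\S+)\s+(?P<dir>IN|OUT)\s+\d+\s+'
    r'(?P<proto>PROTO \d+|\S+)\s+(?P<src>\S+?)->(?P<dst>\S+)'
)


def summarize_rejects(lines):
    """Count REJECT entries per (rule ID, protocol) in an iterable of log lines."""
    counts = {}
    for line in lines:
        match = REJECT_RE.search(line)
        if match is None:
            continue  # not a firewallpkt REJECT entry
        key = (match["rule"], match["proto"])
        counts[key] = counts.get(key, 0) + 1
    return counts
```

The function accepts any iterable of strings, so an open file handle for a copy of /var/log/firewallpkt.log can be passed directly. A burst of TCP and PROTO 1 (ICMP) rejects against the same rule ID around the time of a standby reboot or MM transition would be consistent with the symptom described here, though it does not prove the race condition on its own.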
VMware NSX
1. As a workaround, wait for the active sessions to time out and terminate on their own before initiating the failover.
2. This issue will be addressed in a future release.