LB traffic failure after failover due to a race condition

Article ID: 438553

Updated On:

Products

VMware NSX

Issue/Introduction

Rebooting a standby Edge node, or placing it into and then taking it out of Maintenance Mode (MM), may disrupt connections to the Load Balancer (LB) because of a race condition.

This issue occurs when the LB rules are loaded onto the datapath after a bulk sync has started on the standby Edge node but before a failover takes place. Log entries similar to the following are recorded on the Edge node:

<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewall-sync" tname="dp-ipc56" level="INFO"] Removed cached HA config of lrouter <LR_UUID> vrfid # (current count #)

<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewall-sync" tname="dp-ipc56" level="INFO"] Applied cached FW peer config for <LR_UUID> vrfid #

 

<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" tname="dp-ipc56" level="INFO"] lbs <LB_UUID>: rules committed (ruleset:##) [active (#) - vs:##, pool:##, pm:###] [inactive (#) - vs:#, pool:#, pm:#]

<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewall" tname="dp-ipc56" level="INFO"] Config on SVC_LINK port <SVC_UUID> vrfid # skip # has # up # FW enabled # ruleset 0x### total ## msec

<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewall" tname="dp-ipc56" level="INFO"] Apply conf LR <LR_UUID> (vrfid #) # ports ## msec
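The ordering described above can be checked roughly with a short script. The following Python sketch scans log lines in file order and reports any LB whose "rules committed" entry appears after a firewall-sync "Applied cached FW peer config" entry. It is a heuristic built only on the message patterns shown in this article. The function name, the sample lines, and the identifiers lr-example and lb-example are hypothetical, and a match is a hint that should be confirmed against the actual failover timing, not proof of the race.

```python
import re

# Patterns modeled on the log entries shown in this article (assumed stable).
SYNC_APPLIED = re.compile(r's2comp="firewall-sync".*Applied cached FW peer config for (\S+)')
LB_COMMITTED = re.compile(r'lbs (\S+): rules committed')


def find_late_lb_commits(lines):
    """Return LB IDs whose rules were committed after a firewall-sync apply entry.

    Lines are assumed to be in chronological order, as in a syslog file.
    """
    sync_seen = False
    late = []
    for line in lines:
        if SYNC_APPLIED.search(line):
            sync_seen = True
            continue
        match = LB_COMMITTED.search(line)
        if match and sync_seen:
            late.append(match.group(1))
    return late


# Synthetic sample lines (hypothetical values, not real NSX output).
sample = [
    'NSX FIREWALL [comp="nsx-edge" subcomp="datapathd" s2comp="firewall-sync" level="INFO"] '
    'Applied cached FW peer config for lr-example vrfid 1',
    'NSX FIREWALL [comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" level="INFO"] '
    'lbs lb-example: rules committed (ruleset:1)',
]

print(find_late_lb_commits(sample))
```

To use it on a real node, the lines would be read from the Edge log file that contains these entries. This article does not name that file, so the path is left to the reader.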

 

If the default any-any rule is set to REJECT, /var/log/firewallpkt.log records REJECT entries for the affected TCP traffic and for the corresponding ICMP (PROTO 1) port unreachable messages:

<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewallpkt" level="INFO"] <# <UUID>:<UUID>> INET reason-match REJECT <RULE_ID> OUT ## TCP <SNAT_IP>/<PORT>-><DST_IP>/<PORT> PA

<DATE_TIME> <HOSTNAME> NSX #### FIREWALL [nsx@#### comp="nsx-edge" subcomp="datapathd" s2comp="firewallpkt" level="INFO"] <# <UUID>:<UUID>> INET reason-match REJECT <RULE_ID> OUT ## PROTO 1 <LB_VIP>-><DST_IP>
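To gauge how many connections are affected, the REJECT entries can be counted per rule and protocol. The Python sketch below assumes the entry layout shown above: a rule ID after REJECT, a direction, a numeric field, then either TCP or PROTO followed by a number, then source and destination separated by "->". The function name, rule ID, and addresses in the sample are hypothetical (documentation address ranges), and the regular expression may need adjusting for other NSX versions.

```python
import re
from collections import Counter

# Layout assumed from the firewallpkt.log entries shown in this article.
REJECT_ENTRY = re.compile(
    r'REJECT (\S+) (?:IN|OUT) \d+ (TCP|PROTO \d+) (\S+)->(\S+)'
)


def summarize_rejects(lines):
    """Count REJECT entries keyed by (rule_id, protocol)."""
    counts = Counter()
    for line in lines:
        match = REJECT_ENTRY.search(line)
        if match:
            rule_id, proto, _src, _dst = match.groups()
            counts[(rule_id, proto)] += 1
    return counts


# Synthetic sample lines (hypothetical values, not real NSX output).
sample = [
    's2comp="firewallpkt" level="INFO"] <1 a:b> INET reason-match REJECT 1001 '
    'OUT 52 TCP 192.0.2.10/40000->198.51.100.20/443 PA',
    's2comp="firewallpkt" level="INFO"] <1 a:b> INET reason-match REJECT 1001 '
    'OUT 80 PROTO 1 203.0.113.5->192.0.2.10',
]

for (rule_id, proto), count in sorted(summarize_rejects(sample).items()):
    print(rule_id, proto, count)
```

A burst of TCP and PROTO 1 rejects against the default rule shortly after a failover would be consistent with the symptom described in this article.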

 

Environment

VMware NSX

Resolution

1. As a workaround, wait for the active sessions to time out and terminate naturally before initiating the failover.

2. This issue will be addressed in a future release.