LB pool members are not reachable after edge failover to a particular edge node

Article ID: 424886

Updated On:

Products

VMware NSX

Issue/Introduction

After an Edge failover, load balancer health check TCP sessions to the pool members flap, so the pool members are intermittently unreachable.

On the edge node, /var/log/syslog shows the health check status of the pool members changing to down and back to up:

<DATE_TIME> <HOSTNAME> NSX 141555 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="WARN"] [<UUID>] HLCK: monitor <UUID> server: <POOL_MEMBER_IP>:<POOL_MEMBER_PORT> change to down, code: 8)

<DATE_TIME> <HOSTNAME> NSX 141555 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="lb" level="WARN"] [<UUID>] HLCK: monitor <UUID> server: <POOL_MEMBER_IP>:<POOL_MEMBER_PORT> change to up
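To tell a flapping pool member apart from one that is simply down, the HLCK state-change messages can be grouped per server. The following is a minimal Python sketch, not a VMware tool; it assumes the message text matches the two sample lines above, and the function names and the threshold parameter are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern, derived only from the two sample HLCK messages above.
HLCK_RE = re.compile(r"HLCK: monitor (\S+) server: (\S+) change to (up|down)")


def health_check_transitions(lines):
    """Return {server: [state, ...]} in log order for every HLCK state change."""
    transitions = defaultdict(list)
    for line in lines:
        match = HLCK_RE.search(line)
        if match:
            _monitor, server, state = match.groups()
            transitions[server].append(state)
    return dict(transitions)


def flapping_servers(lines, min_changes=2):
    """Servers (IP:port) whose health check changed state at least min_changes times."""
    return sorted(
        server
        for server, states in health_check_transitions(lines).items()
        if len(states) >= min_changes
    )
```

For example, passing an open handle on /var/log/syslog to flapping_servers() would list every server with at least one down/up pair recorded.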

The same log on the edge node also shows the Geneve tunnel state changing to down and back to up:

<DATE_TIME> <HOSTNAME> NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="tunnel" level="INFO"] Tunnel <TEP_IP>:<TEP_IP>(geneve) state updated from up to down

<DATE_TIME> <HOSTNAME> NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="tunnel" level="INFO"] Tunnel <TEP_IP>:<TEP_IP>(geneve) state updated from down to up
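A similar sketch counts up-to-down transitions per tunnel, which helps confirm whether the tunnel flaps coincide with the health check flaps. It assumes the tunnel message format shown above; the function name is hypothetical and this is not an NSX utility.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the two sample tunnel messages above.
# Group 1: "<local TEP>:<remote TEP>", group 2: encapsulation (e.g. geneve).
TUNNEL_RE = re.compile(
    r"Tunnel ([^\s(]+)\((\w+)\) state updated from (up|down) to (up|down)"
)


def tunnel_down_events(lines):
    """Count up->down transitions per (tunnel endpoints, encapsulation)."""
    downs = Counter()
    for line in lines:
        match = TUNNEL_RE.search(line)
        if match and match.group(3) == "up" and match.group(4) == "down":
            downs[(match.group(1), match.group(2))] += 1
    return downs
```

A tunnel that keeps going down after the failover shows a count that grows over time, while a healthy tunnel stays at zero.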

Environment

VMware NSX

Cause

Checking the ESX hosts where the pool members reside shows that the TEP IPs of different edge nodes resolve to the same MAC address. These duplicate MAC addresses trigger the tunnel flapping.

Output of the following command on the ESX host: localcli network ip neighbor list -N vxlan

Neighbor     Mac Address        Vmknic  Expiry    State  Type
-----------  -----------------  ------  --------  -----  ----
<TEP_IP_#1>  <MAC_ADDRESS>      vmk10   1161 sec         Dynamic
<TEP_IP_#2>  <MAC_ADDRESS>      vmk10   1147 sec         Dynamic
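The duplicate can be spotted by grouping the neighbor rows by MAC address. This minimal Python sketch assumes whitespace-separated columns with the neighbor IP first and the MAC address second, as in the output above; the function name is hypothetical, and the sketch only reads saved command output rather than querying the host.

```python
import re
from collections import defaultdict

MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def duplicate_macs(neighbor_output):
    """Map each MAC address to the neighbor (TEP) IPs that share it,
    keeping only MACs that appear for more than one IP."""
    neighbors_by_mac = defaultdict(set)
    for line in neighbor_output.splitlines():
        fields = line.split()
        # Data rows start with the neighbor IP followed by the MAC address;
        # the header and dashed separator lines fail the MAC check.
        if len(fields) >= 2 and MAC_RE.match(fields[1]):
            neighbors_by_mac[fields[1].lower()].add(fields[0])
    return {
        mac: sorted(ips)
        for mac, ips in neighbors_by_mac.items()
        if len(ips) > 1
    }
```

An empty result means every TEP IP in the table resolves to its own MAC address; any entry returned points at edge nodes that share a MAC.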

Resolution

The resolution can be found in the following KB article:

https://knowledge.broadcom.com/external/article/345804