When running high-availability (HA) appliances that use VRRP (Virtual Router Redundancy Protocol) on Ubuntu, users may experience a "split-brain" condition, in which multiple nodes assume the Master role at the same time. The condition occurs specifically when two or more nodes are colocated on the same ESXi host.
Normal State: Node-A is Master; Node-B and Node-C are Backup (residing on different hosts).
Failure State: Node-A and Node-B both become Master when residing on the same host and the same port group (Standard vSwitch or NSX-backed).
Packet Capture Analysis: Network traces show VRRP advertisements or IGMPv3 packets being dropped by the vSwitch.
Drop Reason: VlanTag Mismatch at the VSwitch_FwdPolicyCheck function.
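The split-brain signature described above can be checked in a capture by looking for more than one source advertising the same virtual router ID (VRID). The following is a minimal sketch, not a supported tool. It assumes the VRRP payloads (IP protocol 112, destination 224.0.0.18) have already been extracted from the trace as (source IP, payload bytes) pairs; the function names and sample addresses are hypothetical.

```python
import struct
from collections import defaultdict

VRRP_PROTO = 112           # IP protocol number assigned to VRRP
VRRP_GROUP = "224.0.0.18"  # IPv4 multicast group used for advertisements
VRRP_ADVERTISEMENT = 1     # the only packet type VRRP defines


def parse_vrrp_header(payload: bytes) -> dict:
    """Decode the first four bytes of a VRRP packet (same layout in v2 and v3)."""
    if len(payload) < 4:
        raise ValueError("VRRP payload too short")
    ver_type, vrid, priority, addr_count = struct.unpack("!BBBB", payload[:4])
    return {
        "version": ver_type >> 4,    # high nibble
        "type": ver_type & 0x0F,     # low nibble
        "vrid": vrid,
        "priority": priority,
        "addr_count": addr_count,
    }


def find_split_brain(adverts):
    """adverts: iterable of (source_ip, vrrp_payload) pairs (assumed input shape).

    Returns {vrid: sorted source IPs} for every VRID that is advertised by
    more than one source in the capture.
    """
    senders = defaultdict(set)
    for src_ip, payload in adverts:
        header = parse_vrrp_header(payload)
        if header["type"] == VRRP_ADVERTISEMENT:
            senders[header["vrid"]].add(src_ip)
    return {vrid: sorted(ips) for vrid, ips in senders.items() if len(ips) > 1}


# Hypothetical sample: two nodes both advertising VRID 51 (VRRPv2, type 1).
sample = [
    ("10.0.0.11", bytes([0x21, 51, 150, 1])),
    ("10.0.0.12", bytes([0x21, 51, 100, 1])),
]
duplicates = find_split_brain(sample)
```

A brief overlap of two advertisers is normal during a failover, so only a capture window in which both sources keep advertising indicates the failure state.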
NSX
The issue is rooted in a multicast-related flow invalidation error within the vSphere Distributed Switch (or standard switch) architecture.
Specifically, when the Multi-Port Flow Cache is enabled, the destination port list in the flow cache entry may become incomplete or incorrectly mapped for multicast traffic. Even though the VMs share the same VLAN and port group, the vSwitch forwarding policy check incorrectly flags the traffic with a VlanTag Mismatch and drops it, so the Backup node never receives the Master's advertisements (its heartbeat). A Backup node that stops receiving advertisements assumes the Master has failed and promotes itself, which leaves two Masters on the same host.
Note: This behavior is often seen with IGMPv3 membership reports (Type 0x22) failing to populate the IGMP database correctly on the host.
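When reviewing a capture for the membership reports mentioned in the note, it can help to decode which groups each report carries. The sketch below parses the fixed IGMPv3 report layout (type 0x22, followed by group records). It assumes the IGMP payload has already been extracted from the IP packet and does not verify the checksum; the function name is hypothetical.

```python
import socket
import struct

IGMPV3_REPORT = 0x22  # IGMPv3 Membership Report message type


def parse_igmpv3_report(payload: bytes):
    """Return a list of (record_type, group_address) from an IGMPv3 report."""
    if len(payload) < 8:
        raise ValueError("IGMP payload too short")
    # Header: type, reserved, checksum, reserved, number of group records.
    msg_type, _rsvd1, _checksum, _rsvd2, n_records = struct.unpack(
        "!BBHHH", payload[:8]
    )
    if msg_type != IGMPV3_REPORT:
        raise ValueError(f"not an IGMPv3 report: type 0x{msg_type:02x}")

    records = []
    offset = 8
    for _ in range(n_records):
        if len(payload) < offset + 8:
            raise ValueError("truncated group record")
        # Group record: record type, aux data length, source count, group.
        rec_type, aux_len, n_sources, group = struct.unpack(
            "!BBH4s", payload[offset:offset + 8]
        )
        records.append((rec_type, socket.inet_ntoa(group)))
        # Skip the source list and auxiliary data (aux_len is in 32-bit words).
        offset += 8 + 4 * n_sources + 4 * aux_len
    return records


# Hypothetical sample: one CHANGE_TO_EXCLUDE_MODE record (type 4) with no
# sources, which is how a host joins a group from any source.
sample_report = struct.pack("!BBHHH", IGMPV3_REPORT, 0, 0, 0, 1) + struct.pack(
    "!BBH4s", 4, 0, 0, socket.inet_aton("224.0.0.18")
)
groups = parse_igmpv3_report(sample_report)
```

Comparing the groups decoded from the capture with the host's IGMP database is one way to confirm that a report was sent by the VM but not learned by the host.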
To resolve this condition, you must disable the multi-port flow cache for multicast traffic on the affected host. This forces the switch to evaluate the forwarding logic more granularly, bypassing the corrupted cache entry.
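Procedure: Run the following command on the affected ESXi host, replacing [Switch_Name] with the name of the affected switch: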
net-dvs -u com.vmware.net.portset.fc.mcast.enabled=false -p hostPropList [Switch_Name]
If an immediate configuration change is not possible, you can prevent the split-brain state by using a DRS VM-VM anti-affinity rule ("Separate Virtual Machines") that includes all VRRP nodes. This ensures that no two VRRP nodes ever reside on the same ESXi host.
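Whether the current placement already keeps the VRRP nodes apart can be verified with a simple check. The sketch below assumes a VM-to-host mapping has been exported from the inventory by some other means; the mapping format, VM names, and host names are hypothetical.

```python
from collections import defaultdict


def colocated_vrrp_nodes(placement: dict) -> dict:
    """placement maps VRRP VM name -> ESXi host name (assumed export format).

    Returns {host: sorted VM names} for each host running more than one
    VRRP node, that is, each host where the split-brain can occur.
    """
    by_host = defaultdict(list)
    for vm, host in placement.items():
        by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical placement matching the failure state: Node-A and Node-B
# share a host, Node-C runs elsewhere.
placement = {"Node-A": "esxi-01", "Node-B": "esxi-01", "Node-C": "esxi-02"}
violations = colocated_vrrp_nodes(placement)
```

An empty result means every VRRP node runs on its own host.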