VRRP (Virtual Router Redundancy Protocol) Split-Brain Condition When Multiple Nodes Reside on the Same ESXi Host

Article ID: 427472

Products

VMware NSX

Issue/Introduction

When high-availability (HA) appliances run VRRP (Virtual Router Redundancy Protocol) on Ubuntu, a "Split-Brain" condition can occur in which multiple nodes assume the Master role. The condition appears specifically when two or more nodes are colocated on the same ESXi host.

Normal State: Node-A is Master; Node-B and Node-C are Backup (residing on different hosts).
Failure State: Node-A and Node-B both become Master when residing on the same host and the same port group (Standard vSwitch or NSX-backed).
Packet Capture Analysis: Network traces show VRRP advertisements or IGMPv3 packets being dropped by the vSwitch.
Drop Reason: VlanTag Mismatch at the VSwitch_FwdPolicyCheck function.
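
The Normal and Failure states above differ only in how many nodes claim the Master role, so the condition can be flagged with a small check. The following is a minimal sketch, assuming the role of each node has already been collected into lines of the form "node state"; that input format and the function name check_vrrp_masters are hypothetical, and how the state is gathered depends on the VRRP implementation in use.

```shell
#!/bin/sh
# Reads "node state" pairs on stdin (hypothetical format) and reports
# how many nodes currently claim the MASTER role.
check_vrrp_masters() {
    awk '
        toupper($2) == "MASTER" {
            # Build a comma-separated list of nodes that claim MASTER.
            masters = masters (masters ? "," : "") $1
            n++
        }
        END {
            if (n > 1)       { print "SPLIT-BRAIN: " masters; exit 2 }
            else if (n == 1) { print "OK: " masters }
            else             { print "NO-MASTER"; exit 1 }
        }'
}
```

For example, piping the three lines "Node-A MASTER", "Node-B MASTER", and "Node-C BACKUP" into check_vrrp_masters reports Node-A and Node-B as a split-brain pair and returns a non-zero exit status.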

Environment

NSX

Cause

The issue is rooted in a multicast-related flow invalidation error within the vSphere Distributed Switch (or standard switch) architecture.

Specifically, when the Multi-Port Flow Cache is enabled, the destination port list in the flow cache entry may become incomplete or incorrectly mapped for multicast traffic. Even though the VMs share the same VLAN and port group, the vSwitch forwarding policy check incorrectly flags the traffic with a VlanTag Mismatch and drops it. The Backup node therefore stops receiving the Master's advertisements, concludes that the Master has failed, and promotes itself to Master.

Note: This behavior is often seen with IGMPv3 membership reports (Type 0x22) failing to populate the IGMP database correctly on the host.
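
From inside a guest, the missing heartbeat can be observed by capturing VRRP and IGMP traffic on the Backup node. The sketch below only prints a tcpdump command for review rather than running it. It relies on VRRP using IP protocol number 112 (advertisements are sent to 224.0.0.18); the function name vrrp_capture_cmd and the interface name in the usage note are hypothetical, and tcpdump normally needs root privileges.

```shell
#!/bin/sh
# Prints a tcpdump command that captures VRRP advertisements
# (IP protocol 112) and IGMP packets on the given interface.
# $1 = interface name, $2 = number of packets to capture (default 20).
vrrp_capture_cmd() {
    iface="$1"
    count="${2:-20}"
    printf "tcpdump -n -i %s -c %s 'ip proto 112 or igmp'\n" "$iface" "$count"
}
```

Running vrrp_capture_cmd ens192 prints the command for an interface named ens192. If the capture on the Backup node shows no advertisements from the Master while both VMs are on the same host, that is consistent with the drop described above.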

Resolution

To resolve this condition, you must disable the multi-port flow cache for multicast traffic on the affected host. This forces the switch to evaluate the forwarding logic more granularly, bypassing the corrupted cache entry.

Procedure:

Log in to the ESXi host via SSH as a user with administrative privileges.
Run the following command to disable the multicast flow cache, replacing [Switch_Name] with the name of the affected virtual switch:

net-dvs -u com.vmware.net.portset.fc.mcast.enabled -p hostPropList [Switch_Name]

Note: This change does not persist across reboots and must be reapplied after the host restarts.
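
Because the setting has to be reapplied, and possibly on more than one switch, it can help to wrap the command. The following is a minimal sketch that issues the command from this article once per switch name. The function name disable_mcast_fc, the DRY_RUN variable, and the switch names in the usage note are hypothetical; by default the sketch only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Applies the command from this article to each switch name given as an
# argument. With DRY_RUN=1 (the default) the commands are printed, not run.
disable_mcast_fc() {
    for sw in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "net-dvs -u com.vmware.net.portset.fc.mcast.enabled -p hostPropList $sw"
        else
            net-dvs -u com.vmware.net.portset.fc.mcast.enabled -p hostPropList "$sw"
        fi
    done
}
```

For example, disable_mcast_fc DSwitch-01 DSwitch-02 prints one command per switch. Setting DRY_RUN=0 before the call runs the commands on the host instead of printing them.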

Additional Information

If an immediate configuration change is not possible, you can prevent the split-brain state by using anti-affinity rules (for example, a DRS VM-VM anti-affinity rule that keeps the VRRP nodes apart). Ensure that no two VRRP nodes ever reside on the same ESXi host.
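
To verify that the placement requirement holds, current VM placement can be checked for colocated nodes. The following is a minimal sketch, assuming placement has been exported as lines of the form "vm host"; that format, the function name find_colocated_vrrp, and the names in the usage note are hypothetical.

```shell
#!/bin/sh
# Reads "vm host" pairs on stdin (hypothetical format) and prints every
# host that carries more than one VRRP node, with the nodes it carries.
find_colocated_vrrp() {
    awk '
        {
            count[$2]++
            vms[$2] = vms[$2] (vms[$2] ? "," : "") $1
        }
        END {
            for (h in count)
                if (count[h] > 1) print h ": " vms[h]
        }' | sort
}
```

For example, the input lines "Node-A esx01", "Node-B esx01", and "Node-C esx02" produce a single line naming esx01 with Node-A and Node-B. No output means no two nodes share a host.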
