vSAN Cluster might experience network instability issues with Cisco ACI and its Control Plane Learning feature enabled
Article ID: 385544
Updated On:
Products
VMware vSAN
Issue/Introduction
vSAN is used with nodes in different, routed subnets (as is typical for vSAN Stretched Clusters)
vSAN Performance Service is enabled on the affected vSAN Cluster
The vSAN cluster experiences network issues between the routed subnets
Cisco ACI is used with its "Control Plane Learning" and "Rogue Endpoint Control" features enabled on the vSAN and Management networks
Cisco ACI reports IP flapping for the vmkernel adapters to which vSAN is assigned (typically, the alert "F3013: fltEpmRogueIpEpEpIPRogue" is raised)
This behavior should only be observed in ESXi 8.0 Update 2 and later, because vSAN performance metrics are now sent via HTTPS (TCP/443), whereas they were previously sent via HTTP (TCP/80).
Environment
VMware vSphere ESXi 8.0
Network infrastructure using Cisco ACI
Cause
When vSAN Performance Service is enabled, all vSAN nodes in the same cluster periodically send their latest metrics to the current vSAN master host. Under some circumstances, however, TCP RESET packets are sent to other vSAN nodes through the wrong physical network uplink while still carrying the source IP of the vmkernel adapter used by vSAN. (For example: vSAN's vmk3 is assigned only to vmnic3, yet these packets from vmk3 unexpectedly leave through vmnic0, which is used solely for Management.)
These rogue TCP packets cause Cisco ACI to "learn" the vSAN IPs on the wrong physical uplink and to make unexpected routing decisions, possibly sending vSAN data network traffic along the wrong path. This can affect overall vSAN stability because of network connectivity issues between vSAN nodes across routed subnets.
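The condition described above can be illustrated with a minimal Python sketch. It assumes packet records have already been extracted from a capture taken on each physical uplink; the record layout, the `EXPECTED_UPLINK` mapping, and all IP addresses are hypothetical and are not part of any VMware or Cisco tooling.

```python
from dataclasses import dataclass

# Hypothetical mapping: vSAN vmkernel IP -> the only uplink it is assigned to.
# In the article's example, vmk3 (vSAN) is assigned only to vmnic3.
EXPECTED_UPLINK = {"192.168.30.11": "vmnic3"}


@dataclass(frozen=True)
class CapturedPacket:
    uplink: str    # physical uplink the packet was seen leaving on
    src_ip: str    # source IP address of the packet
    tcp_rst: bool  # True if the TCP RST flag is set


def find_misrouted(packets, expected=EXPECTED_UPLINK):
    """Return packets whose vSAN source IP left through an unexpected uplink."""
    return [
        p for p in packets
        if p.src_ip in expected and p.uplink != expected[p.src_ip]
    ]


if __name__ == "__main__":
    sample = [
        CapturedPacket("vmnic3", "192.168.30.11", False),  # expected path
        CapturedPacket("vmnic0", "192.168.30.11", True),   # RST on Management uplink
        CapturedPacket("vmnic0", "10.0.0.11", False),      # Management traffic, ignored
    ]
    for pkt in find_misrouted(sample):
        print(f"{pkt.src_ip} seen on {pkt.uplink} (RST={pkt.tcp_rst})")
```

Any record flagged this way corresponds to the kind of packet that would lead Cisco ACI to learn the vSAN IP on the wrong uplink.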
Resolution
The issue has been addressed in ESXi 8.0 U3g.
Workaround
Disable "Control Plane Learning" for the affected networks on Cisco ACI, or
Temporarily configure static routes between the vSAN subnets on ESXi. (Note: It is also important to apply this before adding a vSAN node to an existing cluster.)
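As a rough illustration of the static-route workaround, the following Python sketch generates the `esxcli network ip route ipv4 add` commands that would be run on a host to reach the remote vSAN subnets. The subnets and gateway shown are hypothetical placeholders, and the helper name `static_route_commands` is invented for this example; substitute the values from your own environment and review the output before running it on a host.

```python
import ipaddress


def static_route_commands(remote_subnets, gateway):
    """Build esxcli commands adding one static route per remote vSAN subnet.

    remote_subnets: iterable of CIDR strings, e.g. "172.16.20.0/24"
    gateway: next-hop IP address on the local vSAN subnet
    """
    gw = ipaddress.ip_address(gateway)
    commands = []
    for subnet in remote_subnets:
        # strict=True rejects CIDRs with host bits set (e.g. 172.16.20.5/24)
        net = ipaddress.ip_network(subnet, strict=True)
        if gw in net:
            raise ValueError(f"gateway {gw} must not be inside remote subnet {net}")
        commands.append(
            f"esxcli network ip route ipv4 add --gateway {gw} --network {net}"
        )
    return commands


if __name__ == "__main__":
    # Hypothetical values: local vSAN gateway and the remote site's vSAN subnet.
    for cmd in static_route_commands(["172.16.20.0/24"], "172.16.10.1"):
        print(cmd)
```

The sketch only prints the commands; it does not apply them. Because the workaround is described as temporary, keeping the generated list makes it easier to remove the same routes later.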