SRM replications across vSAN clusters running slow and showing RPO violations

Article ID: 393628


Products

VMware vSAN

Issue/Introduction

When newly added vSAN hosts used for SRM replication are configured with an MTU of 9000 while the rest of the cluster uses an MTU of 1500, the following symptoms may be observed on either the Prod or DR cluster:


  • Prod or DR hosts show a "High pNic error rate detected" alert with a high rate of "Receive length errors". The thresholds for NIC errors are described in the following KB:

https://knowledge.broadcom.com/external/article?articleNumber=312096


  • SRM shows a large number of RPO violations

 

  • Prod or DR cluster vSAN objects are stuck in reduced availability under Cluster - Monitor - vSAN Skyline Health - Data - vSAN object health

  • Prod or DR cluster vSAN resyncs appear stuck, showing "Stale" in the Intent column under Cluster - Monitor - Resyncing Objects

Environment

8.0U3

Cause

The MTU is inconsistent across SRM sites and/or vSAN cluster hosts. SRM specifically requires that the SRM-tagged vmkernel adapters on both the Prod and DR hosts be set to the same MTU.
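
As a rough illustration of the mismatch described above, the following Python sketch groups vmkernel adapters by their MTU and reports whether more than one value is in use. The host names, adapter names, and the inventory list are hypothetical sample data, not output from any VMware tool; in practice the values would have to be collected from each host.

```python
from collections import defaultdict


def group_by_mtu(adapters):
    """Group (host, vmk, mtu) tuples by their MTU value."""
    groups = defaultdict(list)
    for host, vmk, mtu in adapters:
        groups[mtu].append((host, vmk))
    return dict(groups)


def mtu_is_consistent(adapters):
    """Return True when every adapter in the list shares one MTU."""
    return len(group_by_mtu(adapters)) <= 1


# Hypothetical inventory: one newly added host was set to 9000.
inventory = [
    ("prod-esx01", "vmk2", 1500),
    ("prod-esx02", "vmk2", 1500),
    ("prod-esx03", "vmk2", 9000),
]

if not mtu_is_consistent(inventory):
    for mtu, members in sorted(group_by_mtu(inventory).items()):
        print(f"MTU {mtu}: {members}")
```

Any host that ends up in its own MTU group is a candidate for the misconfiguration described in this article.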

Resolution

Align the MTU across SRM sites and/or vSAN cluster hosts. If a large MTU (jumbo frames) is required, it must be set end-to-end, as noted in the following vSphere networking documentation:

https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/vsphere/8-0/vsphere-networking-8-0/managing-network-resources/enabling-jumbo-frames.html
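
One common way to confirm that an MTU is honored end-to-end is a don't-fragment ping sized to fill the frame. The Python sketch below only computes the largest ICMP payload that fits a given MTU and builds the matching vmkping command string to run from the ESXi shell. It assumes IPv4 with a 20-byte header and no IP options; the adapter name vmk2 and the address 192.0.2.10 are placeholders, not values from this article.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
ICMP_HEADER_BYTES = 8


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo")
    return mtu - overhead


def vmkping_command(vmk, target_ip, mtu):
    """Build a don't-fragment vmkping command for the given adapter and MTU."""
    # -I selects the vmkernel adapter, -d sets don't-fragment, -s sets payload size
    return f"vmkping -I {vmk} -d -s {max_icmp_payload(mtu)} {target_ip}"


# Placeholder adapter and address for illustration only.
print(vmkping_command("vmk2", "192.0.2.10", 9000))
print(vmkping_command("vmk2", "192.0.2.10", 1500))
```

If the 8972-byte test fails while the 1472-byte test succeeds, some device in the path is likely not passing jumbo frames.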

Additional Information

The vSAN-tagged and SRM-tagged interfaces may be different, so symptoms may vary depending on the specific network configuration. Note that if the SRM vmkernel adapter isn't tagged with "vSphere Replication traffic", replication will occur over the Management-tagged vmkernel adapter. vSAN traffic will only run across vSAN-tagged vmkernel adapters.
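
The fallback behavior described above can be modeled as a small lookup, which may help when deciding which adapter's MTU to check. This is a minimal sketch and not how ESXi itself selects interfaces: the tag strings and the adapter dictionary are illustrative labels chosen for this example.

```python
REPLICATION_TAG = "vSphere Replication"  # illustrative label
MANAGEMENT_TAG = "Management"            # illustrative label


def replication_adapters(adapters):
    """Return the adapters expected to carry replication traffic.

    `adapters` maps an adapter name to the set of service tags on it.
    Replication-tagged adapters win; otherwise fall back to Management.
    """
    tagged = [name for name, tags in adapters.items() if REPLICATION_TAG in tags]
    if tagged:
        return tagged
    return [name for name, tags in adapters.items() if MANAGEMENT_TAG in tags]


# Hypothetical host with no replication tag: traffic falls back to vmk0.
host_a = {"vmk0": {"Management"}, "vmk1": {"vSAN"}}
# Hypothetical host with a dedicated replication adapter.
host_b = {"vmk0": {"Management"}, "vmk1": {"vSAN"}, "vmk2": {"vSphere Replication"}}

print(replication_adapters(host_a))
print(replication_adapters(host_b))
```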

Reference KB on troubleshooting RPO violations:  https://knowledge.broadcom.com/external/article?articleNumber=312689