VM outage and vSAN Object Liveness loss due to host RxCRC errors


Article ID: 434064


Products

VMware vSphere ESXi

Issue/Introduction

VMware Cloud Foundation (VCF) or vSphere environments experience a production outage where workloads on a specific ESXi host lose connectivity.

Symptoms include:

vSAN objects report "lost liveness" in vmkernel.log.
No physical NIC flap events are reported.

vSphere HA failover events restart the affected VMs on other hosts.

vmkernel.all log

2026-03-10T06:47:29.599Z In(182) vmkernel: cpu67:2099048)DOM: DOMOwner_SetLivenessState:11608: Object ########-####-####-####-####-############ lost liveness [#x############]

High counts of RxCRCErrors on physical NICs (uplinks) are logged in hostd.log. The same counter can be confirmed with the vsish command that follows the log excerpt.

2026-03-10T06:48:25.015Z Wa(164) Hostd[2104327]: [Originator@6876 sub=Statssvc.StatsCollector] Error stats for pnic: vmnicX

2026-03-10T06:48:25.016Z Wa(164) Hostd[2103765]: --> errorsRx: 612

2026-03-10T06:48:25.016Z Wa(164) Hostd[2103765]: --> RxCRCErrors: 612

[root@esx:~] vsish -e get /net/pNics/vmnic0/stats
device {
   -- General Statistics:
   Rx Packets:125205569
   Tx Packets:45875950
   Rx Bytes:21581643405
   Tx Bytes:8471148351
   Rx CRC Errors:123456
 **********snipped***************
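A single counter reading does not show whether errors are still accumulating. The following is a minimal Python sketch, not a supported tool, that extracts the `Rx CRC Errors` value from two captures of the vsish output and reports the difference. The function names and the second sample value are hypothetical and used only for illustration. The sketch also assumes the counter was not reset between captures (for example by a reboot or driver reload).

```python
import re


def parse_rx_crc_errors(vsish_output: str) -> int:
    """Return the 'Rx CRC Errors' counter found in vsish pNic stats text."""
    match = re.search(r"Rx CRC Errors:\s*(\d+)", vsish_output)
    if match is None:
        raise ValueError("no 'Rx CRC Errors' line found in the given text")
    return int(match.group(1))


def crc_delta(earlier: str, later: str) -> int:
    """Difference between two captures; a positive value means the counter grew."""
    return parse_rx_crc_errors(later) - parse_rx_crc_errors(earlier)


# First capture: lines copied from the excerpt above.
sample_earlier = """device {
   -- General Statistics:
   Rx Packets:125205569
   Tx Packets:45875950
   Rx CRC Errors:123456
"""

# Second capture: hypothetical value, invented for this illustration only.
sample_later = sample_earlier.replace("Rx CRC Errors:123456", "Rx CRC Errors:124068")

if __name__ == "__main__":
    print("Rx CRC Errors increased by", crc_delta(sample_earlier, sample_later))
```

In practice the two inputs would be the saved output of the same vsish command taken a few minutes apart for the same vmnic. A steadily growing delta points to an ongoing physical layer problem rather than a historical one.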

Environment

VMware vSphere

Cause

RxCRC errors increment on the physical NIC uplinks because of physical layer instability or link-down events on an upstream switch. vSAN and vSphere HA traffic is blackholed without triggering a local NIC flap, and vSphere HA then fails the VMs over to a different host.

Resolution

Identify Impacted Uplinks: Review host statistics or hostd.log to determine which vmnic interfaces are incrementing RxCRCErrors.
Inspect Physical Layer: Perform a physical inspection of the following components associated with the impacted host and its upstream switch ports:
- SFP/Transceiver modules.
- Fiber optic or copper cabling.
- Physical NIC hardware.
Verify Upstream Switch Health: Coordinate with the Network Team to check for:
- Link-down events on upstream switches.
- Uplink failures on the directly connected switch that may lead to traffic blackholing.
- Port errors or CRC increments on the switch-side interfaces.
Replace Components: Replace any identified faulty cables or transceivers to stabilize the physical signal.
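The first step, identifying impacted uplinks from hostd.log, can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool: it assumes the log uses the three-line layout shown in the Symptoms excerpt (an "Error stats for pnic" line followed by `-->` counter lines), and the function name `crc_samples_by_pnic` is hypothetical. Whether the logged value is cumulative or per collection interval is not stated in this article, so the sketch only collects the samples and leaves interpretation to the reader.

```python
import re
from collections import defaultdict

# Matches the header line and captures the timestamp and the pnic name.
HEADER_RE = re.compile(r"(\S+) .*Error stats for pnic: (\S+)")
# Matches the counter line that follows a header.
CRC_RE = re.compile(r"RxCRCErrors: (\d+)")


def crc_samples_by_pnic(lines):
    """Map each pnic name to a list of (header timestamp, RxCRCErrors) samples."""
    samples = defaultdict(list)
    current = None  # (timestamp, pnic) of the most recent header line
    for line in lines:
        header = HEADER_RE.match(line.strip())
        if header:
            current = (header.group(1), header.group(2))
            continue
        crc = CRC_RE.search(line)
        if crc and current is not None:
            timestamp, pnic = current
            samples[pnic].append((timestamp, int(crc.group(1))))
    return dict(samples)


# Lines copied from the hostd.log excerpt in the Symptoms section.
SAMPLE_LOG = [
    "2026-03-10T06:48:25.015Z Wa(164) Hostd[2104327]: [Originator@6876 "
    "sub=Statssvc.StatsCollector] Error stats for pnic: vmnicX",
    "2026-03-10T06:48:25.016Z Wa(164) Hostd[2103765]: --> errorsRx: 612",
    "2026-03-10T06:48:25.016Z Wa(164) Hostd[2103765]: --> RxCRCErrors: 612",
]

if __name__ == "__main__":
    for pnic, entries in crc_samples_by_pnic(SAMPLE_LOG).items():
        for timestamp, count in entries:
            print(pnic, timestamp, count)
```

To run it against a real log, pass an open file handle for a copy of hostd.log instead of `SAMPLE_LOG`. Uplinks that appear repeatedly with non-zero values are the ones to trace to their cables, transceivers, and switch ports.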
