VM outage and vSAN Object Liveness loss due to host RxCRC errors


Article ID: 434064



Products

VMware vSphere ESXi

Issue/Introduction

VMware Cloud Foundation (VCF) or vSphere environments experience a production outage where workloads on a specific ESXi host lose connectivity.

Symptoms include:

  • vSAN objects report "lost liveness" in vmkernel.log. Example from the vmkernel.all log:

    2026-03-10T06:47:29.599Z In(182) vmkernel: cpu67:2099048)DOM: DOMOwner_SetLivenessState:11608: Object ########-####-####-####-####-############ lost liveness [#x############]

  • No physical NIC flap events are reported.

  • vSphere HA failover events occur, and VMs are evacuated and restarted on other hosts.
    
  • High RxCRCErrors counts on the physical NICs (uplinks), logged in hostd.log and confirmable with the vsish command shown after the log excerpt.

    2026-03-10T06:48:25.015Z Wa(164) Hostd[2104327]: [Originator@6876 sub=Statssvc.StatsCollector] Error stats for pnic: vmnicX
    
    2026-03-10T06:48:25.016Z Wa(164) Hostd[2103765]: --> errorsRx: 612
    
    2026-03-10T06:48:25.016Z Wa(164) Hostd[2103765]: --> RxCRCErrors: 612

     

    [root@esx:~] vsish -e get /net/pNics/vmnic0/stats
    device {
       -- General Statistics:
       Rx Packets:125205569
       Tx Packets:45875950
       Rx Bytes:21581643405
       Tx Bytes:8471148351
       Rx CRC Errors:123456
     **********snipped***************
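
A single reading of the counter does not show whether errors are still occurring, so it can help to sample it twice and compare. The following is a minimal sketch, assuming the `Rx CRC Errors:<number>` line format shown in the output above; the helper names `rx_crc_count` and `crc_delta` are hypothetical and not part of ESXi.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ESXi. They assume the vsish output
# contains a line of the form "Rx CRC Errors:<number>" as in the sample above.

# Read vsish pNic stats on stdin and print the first Rx CRC Errors value.
rx_crc_count() {
    awk -F: '/Rx CRC Errors/ && !seen { gsub(/[^0-9]/, "", $2); print $2; seen = 1 }'
}

# Print how much the counter grew between two readings: crc_delta BEFORE AFTER
crc_delta() {
    echo $(( $2 - $1 ))
}

# Example use on a host (vmnic0 and the 60-second interval are placeholders):
#   before=$(vsish -e get /net/pNics/vmnic0/stats | rx_crc_count)
#   sleep 60
#   after=$(vsish -e get /net/pNics/vmnic0/stats | rx_crc_count)
#   crc_delta "$before" "$after"
```

A non-zero result suggests the uplink is still receiving corrupted frames; a result of zero only means that no new errors were counted during that interval.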



Environment

VMware vSphere

Cause

RxCRC errors increment on the physical NIC uplinks because of physical-layer instability or link-down events on upstream switches. In this condition vSAN and vSphere HA traffic is blackholed without a local NIC flap, so the host never registers a link failure, and vSphere HA fails the VMs over to a different host.

Resolution

 

  • Identify Impacted Uplinks: Review host statistics or hostd.log to determine which vmnic interfaces are incrementing RxCRCErrors.

  • Inspect Physical Layer: Perform a physical inspection of the following components associated with the impacted host and its upstream switch ports:

    • SFP/Transceiver modules.

    • Fiber optic or copper cabling.

    • Physical NIC hardware.

  • Verify Upstream Switch Health: Coordinate with the Network Team to check for:

    • Link-down events on upstream switches.

    • Uplink failures on the directly connected switch that may lead to traffic blackholing.

    • Port errors or CRC increments on the switch-side interfaces.

  • Replace Components: Replace any identified faulty cables or transceivers to stabilize the physical signal.
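
The first step above, identifying the impacted uplinks from hostd.log, can be sketched as a small filter. It assumes the line layout shown in the hostd.log excerpt in the Symptoms section (an `Error stats for pnic: <name>` line followed by a `RxCRCErrors: <count>` line) and that these lines are not interleaved with entries for another NIC. The function name `crc_by_pnic` and the log path in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of ESXi. It reads hostd.log text on stdin
# and prints the highest RxCRCErrors value logged for each pnic.
crc_by_pnic() {
    awk '
        # Remember which uplink the following stat lines belong to.
        /Error stats for pnic:/ { pnic = $NF }
        # Keep the largest RxCRCErrors value seen for that uplink.
        /RxCRCErrors:/ && pnic != "" {
            val = $NF + 0
            if (!(pnic in max) || val > max[pnic]) max[pnic] = val
        }
        END { for (p in max) print p, max[p] }
    ' | sort
}

# Example use on a host (log path assumed; adjust to the environment):
#   crc_by_pnic < /var/log/hostd.log
```

Each output line is an uplink name followed by its highest logged count. Uplinks that appear here are the candidates for the physical inspection and switch-side checks listed above.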