Multiple ESXi hosts experienced management and VM network isolation due to a hung Cisco switch



Article ID: 435726



Products

VMware vSphere ESXi

Issue/Introduction

Symptoms: 

  • Multiple hosts disconnected from vCenter. 

  • Virtual machines experience a complete loss of network connectivity and are inaccessible over the network.

  • The affected virtual machines remain in a "Powered On" state within the vCenter Server inventory.
  • iSCSI path timeouts are recorded in vmkernel.log while the affected paths remain in the UP state:

vmkernel: cpu159:12864457)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x12 (0x45cc27363840, 0) to dev "naa.624###################e6" on path "vmhba68:C2:T0:L###" Failed:
vmkernel: cpu78:12864422)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba68:C2:T0:L###" (UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0
vmkwarning: cpu78:12864422)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:190: Could not get page 83 INQUIRY data for path "vmhba68:C2:T0:L###" - Timeout (195887137)
vmkernel: cpu182:2098869)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x8a (0x45dc31937580, 2169928) to dev "naa.624###################e6" on path "vmhba68:C2:T0:L###" Failed:
vmkwarning: cpu78:12864422)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:190: Could not get page 83 INQUIRY data for path "vmhba68:C5:T0:L###" - Transient storage condition, suggest retry (195887294) 

  • No SCSI ABORT events appear in the VMX logs (vmware.log). This indicates that storage remained accessible through the last two active paths (C3 and C7), which prevented guest OS kernel panics and kept file systems from being remounted read-only.

  • FDM logs confirm that vSphere HA could not protect the workloads because isolationResponse was set to none:

    2026-03-24T18:39:34.764Z In(166) Fdm[2101754]: --> _clusterVmDefaults = [... <isolationResponse>none</isolationResponse> ...]
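The symptom signatures above can be matched mechanically when triaging other hosts. The following Python sketch scans log text for three patterns quoted in this article: SATP ALUA commands that time out on a path ESXi still reports as UP, failed page 83 INQUIRY requests, and the vSphere HA isolationResponse value in fdm.log. The regular expressions are assumptions derived only from the sample lines shown here, and the `scan` function is a hypothetical helper rather than a VMware tool. Message wording may differ between ESXi builds.

```python
import re
from collections import defaultdict

# Assumption: these patterns are derived only from the sample log lines in this
# article; other ESXi builds may word these messages differently.
PATH_TIMEOUT_RE = re.compile(
    r'Path "(?P<path>[^"]+)" \((?P<state>\w+)\) command 0x[0-9a-fA-F]+ '
    r"failed with status (?P<status>\w+)"
)
INQUIRY_FAIL_RE = re.compile(
    r'Could not get page 83 INQUIRY data for path "(?P<path>[^"]+)" - '
    r"(?P<reason>.+?)\s*\(\d+\)"
)
ISOLATION_RE = re.compile(r"<isolationResponse>(?P<value>\w+)</isolationResponse>")


def scan(lines):
    """Return (paths timing out while UP, INQUIRY failure reasons per path, isolationResponse values)."""
    up_timeouts = set()
    inquiry_failures = defaultdict(list)
    isolation_responses = set()
    for line in lines:
        m = PATH_TIMEOUT_RE.search(line)
        # A timeout on a path that ESXi still reports as UP is the grey-failure signature.
        if m and m["state"] == "UP" and m["status"] == "Timeout":
            up_timeouts.add(m["path"])
        m = INQUIRY_FAIL_RE.search(line)
        if m:
            inquiry_failures[m["path"]].append(m["reason"])
        m = ISOLATION_RE.search(line)
        if m:
            isolation_responses.add(m["value"])
    return up_timeouts, dict(inquiry_failures), isolation_responses


if __name__ == "__main__":
    sample = [
        'VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba68:C2:T0:L###" (UP) '
        "command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0",
        "WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:190: Could not get page 83 INQUIRY "
        'data for path "vmhba68:C5:T0:L###" - Transient storage condition, suggest retry (195887294)',
        "--> _clusterVmDefaults = [... <isolationResponse>none</isolationResponse> ...]",
    ]
    timeouts, inquiries, isolation = scan(sample)
    print("Paths timing out while UP:", sorted(timeouts))
    print("INQUIRY failures:", inquiries)
    if "none" in isolation:
        print("vSphere HA isolationResponse is 'none': VMs on isolated hosts stay powered on.")
```

Running the same function over a host's full vmkernel.log and fdm.log shows quickly whether that host displays the same combination of UP-but-timing-out paths and a disabled isolation response.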

Environment

VMware vSphere ESXi 8.x 
VMware vSphere ESX 9.x

Cause

A forwarding-plane stall (ASIC hang) occurred on a Cisco switch that carried both iSCSI and VM/management traffic. In this "grey failure" state, the switch stopped forwarding frames but did not move its ports to an error-disabled or link-down state. The default ESXi network failure detection (Link status only) relies on physical link state, so the hosts saw their uplinks as healthy and did not fail over. Two configuration gaps extended the outage: vSphere HA was not configured to respond to host isolation (isolationResponse was none), and the physical fabric did not use Link State Tracking, so the hosts could not recognize that the path was unusable.
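The detection gap described above can be illustrated with a small model. The Python sketch below contrasts Link status only detection with a simplified beacon-probing check for a NIC team where one uplink keeps carrier but its switch port no longer forwards frames. The `Uplink` type, the function names, and the beacon delivery rule are hypothetical simplifications, not the ESXi implementation. The uplink names vmnic0 to vmnic2 are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    name: str
    link_up: bool     # carrier state the NIC reports to ESXi
    forwarding: bool  # whether the upstream switch port actually forwards frames


def link_status_failed(uplinks):
    """Link status only: an uplink counts as failed only when carrier drops."""
    return {u.name for u in uplinks if not u.link_up}


def beacon_failed(uplinks):
    """Simplified beacon probing (hypothetical model).

    A beacon sent on uplink A is assumed to reach uplink B only when both have
    carrier and both switch ports forward frames. An uplink is flagged when no
    peer in the team hears its beacons.
    """
    def delivers(a, b):
        return a.link_up and a.forwarding and b.link_up and b.forwarding

    return {
        a.name
        for a in uplinks
        if not any(delivers(a, b) for b in uplinks if b is not a)
    }


if __name__ == "__main__":
    # vmnic0 is in the grey-failure state: carrier up, no forwarding.
    team = [
        Uplink("vmnic0", link_up=True, forwarding=False),
        Uplink("vmnic1", link_up=True, forwarding=True),
        Uplink("vmnic2", link_up=True, forwarding=True),
    ]
    print("Link status only flags:", sorted(link_status_failed(team)))
    print("Beacon probing flags:  ", sorted(beacon_failed(team)))
```

In this model Link status only flags nothing, while beacon probing isolates vmnic0 when three uplinks are present. With only two uplinks, both are flagged because the model cannot tell which side failed. This is why beacon probing is generally recommended only with three or more uplinks. If every uplink in a team terminates on the same stalled switch, no detection method leaves a healthy uplink to fail over to.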

Resolution

Additional Information

Related Information

  • Teaming Policy Note: Under "Route Based on Originating Virtual Port," each VM's virtual port is pinned to a single active physical NIC. Without Link State Tracking on the switch or Beacon Probing on the host, the VM stays on a hung NIC indefinitely, because that NIC's link state never changes.


  • iSCSI Stability: The logs show that iSCSI storage remained consistent because two paths stayed active. The primary impact of this event was on the management and VM network planes.
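The pinning behavior described in the teaming note can be sketched as follows. This Python model assumes that each virtual port is assigned to an uplink by port ID modulo the number of uplinks, and that a port moves only when its uplink is marked failed. The actual ESXi uplink selection logic is not described in this article, and `initial_assignment`, `repin`, and the uplink names are hypothetical.

```python
def initial_assignment(port_ids, uplinks):
    """Hypothetical pinning: spread virtual ports across uplinks by port ID."""
    return {p: uplinks[p % len(uplinks)] for p in port_ids}


def repin(assignment, uplinks, failed):
    """Move only the ports whose uplink is marked failed; every other port stays pinned."""
    healthy = [u for u in uplinks if u not in failed]
    result = {}
    for port_id, uplink in assignment.items():
        if uplink in failed:
            # Re-pin to a healthy uplink; None means no healthy uplink remains.
            result[port_id] = healthy[port_id % len(healthy)] if healthy else None
        else:
            result[port_id] = uplink
    return result


if __name__ == "__main__":
    uplinks = ["vmnic0", "vmnic1"]  # assume vmnic0 is stalled but keeps carrier
    ports = initial_assignment([10, 11, 12, 13], uplinks)

    # Link status only never marks vmnic0 failed, so ports 10 and 12 stay on it.
    print(repin(ports, uplinks, failed=set()))
    # Once vmnic0 is detected as failed (for example, LST drops its link), those ports move.
    print(repin(ports, uplinks, failed={"vmnic0"}))
```

The model shows that pinning is sticky: nothing rebalances a port while its uplink still looks healthy. The failure detection method, not the teaming policy, therefore decides whether a VM moves off a stalled NIC.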