During vSphere HA testing, some vSAN objects report "inaccessible" marking the VM down.

Article ID: 392429

Products

VMware vSAN

VMware vSphere ESX 8.x

Issue/Introduction

Symptoms:

  • VM objects are reported as inaccessible.
  • The VMNIC or uplink used for vSAN traffic may be configured through a LAG.
  • Heartbeat timeouts.
  • NIC reports IO aborts.

1. You may see "IO was aborted" errors in /var/run/log/vmkernel.log

YYYY-MM-DD:T:HH:MM:SS Al(177) vmkalert: cpu##:#######)ALERT: nmlx5_SetL2TableEntryCmd:170 command failed: IO was aborted
YYYY-MM-DD:T:HH:MM:SS In(182) vmkernel: cpu##:#######) nmlx5_core: vmnic#: nmlx5_en_L2TableIndexAdd - (nmlx5_core_en_main.c:8810) Failed to set L2 table entry (#########)
YYYY-MM-DD:T:HH:MM:SS In(182) vmkernel: cpu##:#######) nmlx5_core: vmnic#: nmlx5_en_RxQueueFiltersDbApply -(nmlx5_core_en_multiq.c:1631) Failed to add filter into L2 (Failure)
YYYY-MM-DD:T:HH:MM:SS In(182) vmkernel: cpu##:#######) nmlx5_core: vmnic#: nmlx5_en_RxQueueFiltersDbApply - (nmlx5_core_en_multiq.c:1657) donestatus: Failure
YYYY-MM-DD:T:HH:MM:SS In(182) vmkernel: cpu##:#######) nmlx5_core: vmnic#: nmlx5_en_UplinkQApplyFilter - (nmlx5_core_en_multiq.c:1839) nmlx5_en_RxQueueFiltersDbApply failed - Failure
 
 
2. You may observe uplink issues with "lag1: not found" messages in /var/run/log/vmkernel.log
 
YYYY-MM-DD:T:HH:MM:SS In(182) vmkernel: cpu##:####### opID=#######)Uplink: 2703: lag1: not found
YYYY-MM-DD:T:HH:MM:SS In(182) vmkernel: cpu##:####### opID=#######)Uplink: 2703: lag1: not found
 
3. Heartbeat timeouts reported by vobd
 
YYYY-MM-DD:T:HH:MM:SS In(14) vobd[#######]:  [vmfsCorrelator] 720696188us: [vob.vmfs.heartbeat.timedout]########-####-####-############  ########-####-####-############ 
YYYY-MM-DD:T:HH:MM:SS In(14) vobd[#######]:  [vmfsCorrelator] 720696053us: [esx.problem.vmfs.heartbeat.timedout] ########-####-####-############ ########-####-####-############ 
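
To check quickly whether a host shows the log signatures listed above, the sketch below counts matching lines in one or more log files. It is a minimal illustration, not a supported tool: the regular expressions are derived only from the excerpts in this article, and the category names (io_aborted, l2_filter_failure, lag_not_found, heartbeat_timeout) are hypothetical labels chosen for this example.

```python
import re
import sys
from collections import Counter

# Patterns derived from the log excerpts in this article.
# The dictionary keys are illustrative labels, not VMware terminology.
SIGNATURES = {
    "io_aborted": re.compile(r"nmlx5_\w+:\d+ command failed: IO was aborted"),
    "l2_filter_failure": re.compile(
        r"nmlx5_core: vmnic\d+: .*Failed to (?:set L2 table entry|add filter into L2)"
    ),
    "lag_not_found": re.compile(r"Uplink: \d+: lag\d+: not found"),
    "heartbeat_timeout": re.compile(r"vmfs\.heartbeat\.timedout"),
}


def count_signatures(lines):
    """Return a Counter of how many lines match each signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Pass log files copied from the host (or extracted from a support bundle).
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, dict(count_signatures(handle)))
```

A non-zero count for several categories within the same time window is consistent with the symptoms described here, but it does not by itself confirm the cause.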
 

Environment

VMware vSAN 7.x

VMware vSAN 8.x

VMware ESXi 8.0

Cause

This is a known bug in the health mechanism logic of the nmlx5 driver, where the driver incorrectly detects that the NIC is in a faulty state.

Resolution

This issue has been resolved in ESXi 8.0 Update 3e, build 24674464. Refer to the article "PSOD: nmlx5_QueryNicVportContext:188 command failed: IO was aborted".
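
To compare a host against the fixed build named in this section, the sketch below parses a version string and checks the build number. It assumes the string has the form "VMware ESXi 8.0.3 build-NNNNNNNN", as typically printed by `vmware -v` on the host; the function names are hypothetical. Build numbers are compared only within the 8.0 release line, because the article states the fix for ESXi 8.0 only.

```python
import re

# ESXi 8.0 Update 3e build number, as given in the Resolution section.
FIXED_BUILD = 24674464


def parse_version(version_line):
    """Return ((major, minor), build) parsed from a version string.

    Assumes the format 'VMware ESXi <major>.<minor>.<update> build-<number>'.
    """
    match = re.search(r"ESXi (\d+)\.(\d+)\.\d+ build-(\d+)", version_line)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_line!r}")
    major, minor, build = (int(group) for group in match.groups())
    return (major, minor), build


def has_fix(version_line):
    """Return True/False for ESXi 8.0 hosts, or None for other release lines."""
    release, build = parse_version(version_line)
    if release != (8, 0):
        return None  # the article does not state a fixed build for this line
    return build >= FIXED_BUILD


if __name__ == "__main__":
    print(has_fix("VMware ESXi 8.0.3 build-24674464"))
```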

As a workaround, implement the steps below:

  • Engage the network team/vendor to investigate any underlying uplink issues. 
  • Make sure the driver and firmware versions on the VMNIC are compatible as per the Broadcom compatibility matrix. Contact the hardware vendor to check for any known driver or firmware issues.
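
For the driver and firmware check above, the versions in use can be collected from the "Driver Info" block that `esxcli network nic get -n vmnicX` prints on the host. The sketch below parses that block from captured text. It assumes the indented "Key: value" layout shown in the sample; the sample values are made up for illustration and do not come from this article, and the function name is hypothetical.

```python
def parse_driver_info(text):
    """Return the key/value pairs listed under 'Driver Info:' in the given text."""
    info = {}
    header_indent = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        if header_indent is None:
            # Skip everything until the 'Driver Info:' header is found.
            if raw.strip() == "Driver Info:":
                header_indent = indent
            continue
        if indent <= header_indent:
            break  # a less-indented line ends the Driver Info block
        key, sep, value = raw.strip().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Illustrative sample only; real output contains more fields and other values.
SAMPLE = """\
   Auto Negotiation: true
   Driver Info:
         Bus Info: 0000:3b:00:0
         Driver: nmlx5_core
         Firmware Version: 16.35.1012
         Version: 4.23.0.36
   Link Detected: true
"""

if __name__ == "__main__":
    details = parse_driver_info(SAMPLE)
    print(details["Driver"], details["Version"], details["Firmware Version"])
```

The resulting driver and firmware versions can then be looked up manually in the compatibility matrix for the specific NIC model.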