VMs go into an inaccessible state after vMotion on a specific ESXi host

Article ID: 422660

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • VMs may appear hung or inaccessible in vCenter.
  • The VMNIC or uplink used for VM traffic may be configured as part of a link aggregation group (LAG).
  • VMs and datastores may report heartbeat timeouts.
  • The NIC reports IO aborts.
  • The ESXi host becomes unresponsive intermittently.

Verification: 

  • Ensure the driver and firmware versions on the VMNIC are listed as compatible in the Broadcom compatibility matrix. Contact the hardware vendor to check for known driver or firmware issues.
  • On the ESXi host, /var/run/log/hostd.log reports that the VM has entered an invalid power state:

YYYY-MM-DDThh:mm:ss.msZ Wa(164) Hostd[2104923]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-#######-####-############/VMname/VMname.vmx opID=m6snrov8-######-auto-mmij-h5:70211061-19-01-13-aa78 sid=52341a88 user=vpxuser:VSPHERE.LOCAL\Administrator] Query VMX about hlstate failed Fault cause: vim.fault.InvalidPowerState
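The hostd.log check above can be scripted when several VMs are affected. The following is a minimal sketch, assuming the entries follow the format of the sample line; the function name and the regular expression are illustrative and not part of any VMware tooling.

```python
import re

# Fault string taken from the hostd.log sample in this article.
FAULT = "vim.fault.InvalidPowerState"
# Captures the .vmx path that follows "sub=Vmsvc.vm:" in a hostd entry
# (assumption: the path always ends in ".vmx", as in the sample).
VMX_RE = re.compile(r"sub=Vmsvc\.vm:(.+?\.vmx)")


def vms_with_invalid_power_state(lines):
    """Return the sorted, de-duplicated .vmx paths that logged the fault."""
    found = set()
    for line in lines:
        if FAULT not in line:
            continue
        match = VMX_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


# Example use on a host (log path as referenced in this article):
#   with open("/var/run/log/hostd.log", errors="replace") as log:
#       for vmx in vms_with_invalid_power_state(log):
#           print(vmx)
```

A match only shows that hostd could not query the VM's power state. It does not by itself confirm the cause described below.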

Environment

  • VMware vSphere ESXi 7.x
  • VMware vSphere ESXi 8.x

Cause

  • This is a known issue in the nmlx5 health mechanism logic, where the VMNIC driver incorrectly detects that the NIC is in a faulty state.

Cause Validation:

  • On the ESXi host, /var/run/log/vmkernel.log reports that the IO was aborted and that the driver failed to set the L2 table entry:

YYYY-MM-DDThh:mm:ss.msZ Al(177) vmkalert: cpu64:2097709)ALERT: <NMLX_ERR> nmlx5_SetL2TableEntryCmd:170 command failed: IO was aborted
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu64:2097709)<NMLX_ERR> nmlx5_core: vmnic0: nmlx5_en_L2TableIndexAdd - (nmlx5_core_en_main.c:8810) Failed to set L2 table entry (195887328)
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu64:2097709)<NMLX_ERR> nmlx5_core: vmnic0: nmlx5_en_RxQueueFiltersDbApply - (nmlx5_core_en_multiq.c:1631) Failed to add filter into L2 (Failure)
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu64:2097709)<NMLX_ERR> nmlx5_core: vmnic0: nmlx5_en_RxQueueFiltersDbApply - (nmlx5_core_en_multiq.c:1657) done  status: Failure
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu64:2097709)<NMLX_ERR> nmlx5_core: vmnic0: nmlx5_en_UplinkQApplyFilter - (nmlx5_core_en_multiq.c:1839) nmlx5_en_RxQueueFiltersDbApply failed - Failure
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu64:2097709)<NMLX_ERR> nmlx5_core: vmnic0: nmlx5_en_UplinkQApplyFilter - (nmlx5_core_en_multiq.c:1879) done  status: Failure
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu64:2097709)<NMLX_INF> nmlx5_core: vmnic0: nmlx5_en_L2TableIndexAdd - (nmlx5_core_en_main.c:8779) Add ##:##:##:##:##:## to L2 table
YYYY-MM-DDThh:mm:ss.msZ Al(177) vmkalert: cpu64:2097709)ALERT: <NMLX_ERR> nmlx5_SetL2TableEntryCmd:170 command failed: IO was aborted
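To check quickly whether a host's vmkernel.log contains this signature, the entries can be counted per uplink. The following is a minimal sketch, assuming the log lines match the samples above; the function name and patterns are hypothetical helpers derived from those samples, not a VMware-provided tool.

```python
import re
from collections import Counter

# Substrings taken from the vmkalert sample lines in this article.
ABORT_CMD = "nmlx5_SetL2TableEntryCmd"
ABORT_TEXT = "IO was aborted"
# Captures the uplink name from "nmlx5_core: vmnicN: ... Failed to set L2 table entry".
L2_FAIL_RE = re.compile(r"nmlx5_core: (vmnic\d+): .*Failed to set L2 table entry")


def summarize_nmlx5_l2_failures(lines):
    """Return (abort_alert_count, {vmnic: l2_failure_count})."""
    aborts = 0
    per_nic = Counter()
    for line in lines:
        if ABORT_CMD in line and ABORT_TEXT in line:
            aborts += 1
        match = L2_FAIL_RE.search(line)
        if match:
            per_nic[match.group(1)] += 1
    return aborts, dict(per_nic)


# Example use on a host (log path as referenced in this article):
#   with open("/var/run/log/vmkernel.log", errors="replace") as log:
#       print(summarize_nmlx5_l2_failures(log))
```

Non-zero counts for an uplink suggest that the host matches the signature described here. Treat them as a pointer for further investigation, not as a definitive diagnosis.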

 

Resolution

Workaround:

  • Currently, the only workaround is to reboot the ESXi host, which recovers the uplink.

Additional Information

Reference article:

PSOD: nmlx5_QueryNicVportContext:188 command failed: IO was aborted

During vSphere HA testing, some vSAN objects report "inaccessible" marking the VM down.