Esxi hosts goes into not responding state
search cancel

Esxi hosts goes into not responding state

book

Article ID: 391479

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • Random Esxi hosts goes unresponsive or freezing unexpectedly
  • No impact on the virtual machines running on the affected hosts 
  • Restarting the hostd service won't resolve the issue
  • Rebooting the hosts will fix the issue temporarily 

Environment

VMware vSphere Esxi (various versions)

Cause

By default, the HBA driver (qlnativefc) has flow control enabled both at the host and target levels. If the workload reaches a certain threshold, the HBA driver automatically sets congestion on the fabric port, which results in I/Os being aborted on multiple services.

/var/log/vmkernel/log (Esxi)

YYYY-MM-DDTHH:MM:SS cpu98:2098415)qlnativefc: vmhba0(27:0.0): SCMR: Set Congestion for Host WWN 51:40:2e:c0:1x:1x:0x:8x

YYYY-MM-DDTHH:MM:SS cpu23:2098915)NMP: nmp_ThrottleLogForDevice:3861: Cmd 0x28 (0x45bb4f28e480, 0) to dev "naa.60060e80087ade00xxxxxxxxxx" on path
"vmhba0:C0:T1:L127" Failed:
YYYY-MM-DDTHH:MM:SS cpu23:2098915)NMP: nmp_ThrottleLogForDevice:3869: H:0xc D:0x0 P:0x0 . Act:NONE. cmdId.initiator=0x430a99689a40 CmdSN 0x12cd466
YYYY-MM-DDTHH:MM:SS cpu23:2098915)ScsiDeviceIO: 4277: Cmd(0x45bb4f28e480) 0x28, CmdSN 0x12cd466 from world 0 to dev "naa.60060e80087adexxxxxxxxxxx" failed H:0xc D:0x0 P:0x0

Resolution

Workaround

  • Disable the flow control on the HBA(qlnativefc) by running the below command 
     esxcfg-module -s "ql2x_scmr_flow_ctl_host=0, ql2x_scmr_flow_ctl_tgt=0" qlnativefc
  • Reboot the Esxi for the changes to take affect