
ESXi host experienced high latency when communicating with storage over iSCSI


Article ID: 396689


Products

VMware vSphere ESXi

Issue/Introduction

VMs and the host show high storage latency in the VM performance charts, and/or the datastore intermittently loses connectivity
Software iSCSI is in use
The Nimble storage array does not show multi-pathing for the host's iSCSI connections
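As a quick check from the ESXi host, the number of active paths and the device latency can be reviewed with commands such as the following (a sketch only; eui.<device-identifier> is a placeholder for the affected Nimble volume):

esxcli storage core path list -d eui.<device-identifier>    # a multi-pathed device should list more than one runtime path (vmhba##:C#:T#:L#)
esxtop                                                       # press 'u' for the disk device view and watch DAVG/cmd and GAVG/cmd for sustained high latency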


 

Environment

Software iSCSI

Nimble storage 

ESXi (all) 

Cause

Based on the Broadcom Compatibility Guide, the NIC drivers are out of date, and the software iSCSI connection to the Nimble storage array does not have multi-pathing enabled.
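The multi-pathing state can be confirmed with a command such as the following (a sketch; substitute the device identifier of the affected Nimble volume):

esxcli storage nmp device list -d eui.<device-identifier>
# Check the "Path Selection Policy" and "Working Paths" fields; a single working path
# or an unexpected policy indicates multi-pathing is not in effect for the device.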

In /var/log/vmkernel.log, you may see the following types of errors:

ScsiDeviceIO: 4163: Cmd(0x45e9537e3388) 0x89, cmdId.initiator=0x430875e06180 CmdSN 0x7112a from world 2097257 to dev "eui.################################" failed H:0x5 D:0x0 P:0x0 Cancelled from driver layer



WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "eui.################################" state in doubt; requested fast path state update...
WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:476: vmhba64:CH:# T:# CN:#: Failed to receive data: Connection closed by peer
WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:484: Sess [ISID: ########### TARGET: iqn.####-##.com.nimblestorage:
WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic:485: Conn [CID: 0 L: ###.##.#.##:##### R: ###.##.#.###:####]
iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1236: vmhba64:CH:# T:# CN:#: Connection rx notifying failure: Failed to Receive. State=Bound
iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1237: Sess [ISID: ########### TARGET: iqn.####-##.com.nimblestorage:
iscsi_vmk: iscsivmk_ConnRxNotifyFailure:1238: Conn [CID: 0 L: ###.##.#.##:##### R: ###.##.#.###:####]
WARNING: iscsi_vmk: iscsivmk_StopConnection:738: vmhba64:CH:# T:# CN:#: iSCSI connection is being marked "OFFLINE" (Event:4)
WARNING: iscsi_vmk: iscsivmk_StopConnection:739: Sess [ISID: ########### TARGET: iqn.####-##.com.nimblestorage:
WARNING: iscsi_vmk: iscsivmk_StopConnection:740: Conn [CID: 0 L: ###.##.#.##:##### R: ###.##.#.###:####]
iscsi_vmk: iscsivmk_ConnNetRegister:1898: socket 0x############ network resource pool netsched.pools.persist.iscsi associated
iscsi_vmk: iscsivmk_ConnNetRegister:1926: socket 0x############ network tracker id ######### tracker.iSCSI.###.##.#.### associated
WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:160: Could not get page 83 INQUIRY data for path "vmhba64:C#:T#:L#" - Transient storage condition, suggest retry (195887294)
StorageDevice: 7060: End path evaluation for device eui.################################
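To search a live host for these signatures, a grep along the following lines can be used (the patterns are illustrative, not exhaustive):

grep -E "state in doubt|Connection closed by peer|iSCSI connection is being marked" /var/log/vmkernel.log
# Rotated copies of vmkernel.log are kept alongside the active log and may also be worth checking.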




bnxtnet: dev_dump_fw_trace:125: [vmnic3 : 0x############] Dumping FW Trace:

bnxtnet: dev_print_fw_data:95: [vmnic3 : 0x############] == START OF TRACE ==


6160.7:D:handle_comm_task_switch_to_vaux() pcie_rst_up true
6160.7:D:Vaux:1(0)
6160.7:D:KONG HALTED
6177.9:D:IPC_VMAIN_POR_DEASSERT
6177.9:D:handle_comm_task_power_pcie_event: in_vmain=1, VMAIN_INT
6177.9:D:handle_comm_task_switch_to_vmain: switch_to_vmain, pcie_rst_up:1
6179.8:D:IPC_PERST_DEASSERT
6179.8:D:PCIE rst (1)
6179.8:D:nvm_dir_find_entry:WARNING!!! NVM access without lock
6180.4:D:==========
6180.4:D:
6180.4:D:*** Release FW 231.0.153.0 built by ccxswbuild on Aug 23 2024 at 14:22:33 ***
6180.4:D:nvm_dir_find_entry:WARNING!!! NVM access without lock
6180.4:D:Debug command polling period 100ms
6180.4:D:Task monitor started with 500ms period
6180.4:D:Warm reset
6180.4:D:Selected qos profile MP12_WhPlus_2p_00 at index 0, offset 1c with port count 2
6180.4:D:Init Phase 2 done!!
6180.4:D:hw_set_pci_regs: pf 0 ven_id 0x14e4, dev_id 0x16d6 configured true vee false
6180.4:D:PCIe lane width x8, core freq 375MHz
6180.4:D:hw_set_pci_regs: pf 1 ven_id 0x14e4, dev_id 0x16d6 configured true vee false
6180.4:D:PCIe lane width x8, core freq 375MHz
6180.4:D:Sriov config done for PF 0.
6180.4:D:num_vf 0, bar0 size 0x3, bar2 0x900, bar4 0x3
6180.4:D:vf dev_id 0x16dc, msix tbl size 0x4a, vf msix vectors 0x0
6180.4:D:Sriov config done for PF 1.
6180.4:D:num_vf 0, bar0 size 0x3, bar2 0x900, bar4 0x3
6180.4:D:vf dev_id 0x16dc, msix tbl size 0x4a, vf msix vectors 0x0
6180.4:D:health_check is enabled in NVM
6180.5:D:hw_ocp_thermal_init: enable: 1, thresholds: 105/100/80 gpio 14/13/15
6180.5:D:kong start 270
6180.5:D:Enable PCIe VDM on PF0
6180.5:D:Chimp Livepatch not installed to NVM.
6181.0:D:hw_ocp_eval_temp: temp: 55 set mon_type Critical to high
6181.0:D:hw_ocp_eval_temp: temp: 55 set mon_type Warning to high
6181.0:D:hw_ocp_eval_temp: temp: 55 set mon_type Fan to low
6181.0:D:APE ready 1208
6181.0:D:c2a ipc chnl initialized
6181.2:D:Port0: pm probe.. Done
6215.2:D:IPC_PERST_ASSERT
6215.2:D:handle_comm_task_power_pcie_event: in_vmain=1, PERST_ASSERT
6215.2:D:Disable PCIe VDM on PF0
6216.9:D:IPC_PERST_DEASSERT
6216.9:D:PCIE rst (1)
6216.9:D:handle_comm_task_power_pcie_event: in_vmain=1, PERST_DEASSERT
6217.0:D:hw_set_pci_regs: pf 0 ven_id 0x14e4, dev_id 0x16d6 configured true vee false
6217.0:D:PCIe lane width x8, core freq 375MHz
6217.0:D:hw_set_pci_regs: pf 1 ven_id 0x14e4, dev_id 0x16d6 configured true vee false
6217.0:D:PCIe lane width x8, core freq 375MHz
6217.0:D:Sriov config done for PF 0.
6217.0:D:num_vf 0, bar0 size 0x3, bar2 0x900, bar4 0x3
6217.0:D:vf dev_id 0x16dc, msix tbl size 0x4a, vf msix vectors 0x0
6217.0:D:Sriov config done for PF 1.
6217.0:D:num_vf 0, bar0 size 0x3, bar2 0x900, bar4 0x3
6217.0:D:vf dev_id 0x16dc, msix tbl size 0x4a, vf msix vectors 0x0
6217.0:D:Selected qos profile MP12_WhPlus_2p_00 at index 0, offset 1c with port count 2
6217.0:D:Init Phase 2 done!!
6217.0:D:OLD & NEW pool sizes(0) are same skip update for fid 1
6217.0:D:OLD & NEW pool sizes(0) are same skip update for fid 2
6217.0:D:kong start 9136
6217.0:D:Enable PCIe VDM on PF0
6217.0:D:handle_comm_task_power_pcie_event: in_vmain=1, PERST_DEASSERT
6320.4:D:OLD & NEW pool sizes(0) are same skip update for fid 1
6320.5:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
6337.1:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
6610.4:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
6627.1:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
8720.1:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
8736.8:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
9018.2:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
9034.8:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
392.6:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
409.2:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
2943.6:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2
2944.2:W:DBG DRV:Warn: lm_hwrm_nvm_get_variable(950): failed, error_code = 2

bnxtnet: dev_print_fw_data:107: [vmnic# : 0x############] == END OF TRACE ==




Resolution

Review your NIC drivers and firmware by following Determining Network/Storage firmware and driver version in ESXi, then compare the results against the Broadcom Compatibility Guide to confirm you are running the latest versions supported for your hardware and ESXi version.
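For example, the currently installed driver and firmware versions can be gathered with commands such as these (vmnic3 and bnxt are used as examples; substitute your own uplink names and driver):

esxcli network nic list                    # lists each uplink and the driver it uses
esxcli network nic get -n vmnic3           # the "Driver Info" section reports the driver version and firmware version
esxcli software vib list | grep -i bnxt    # reports the installed driver VIB version

Compare these values against the entry for your NIC model and ESXi release in the Broadcom Compatibility Guide.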

Review your storage array vendor's best practices for multi-pathing. In some cases the storage vendor provides a plugin or driver to configure multi-pathing.
If needed, open a support case with your storage vendor.
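As an illustration only (your storage vendor's documented policy and tools take precedence), the current multi-pathing configuration and any vendor-supplied plugin can be reviewed with commands such as:

esxcli storage nmp device list                              # shows the Path Selection Policy and Working Paths for each device
esxcli storage nmp satp rule list | grep -i nimble          # shows vendor-specific claim rules, if a vendor plugin is installed
esxcli software vib list | grep -i nimble                   # confirms whether a vendor multi-pathing plugin is installed
esxcli storage nmp device set --device=eui.<device-identifier> --psp=VMW_PSP_RR    # example only; apply the path selection policy your vendor recommends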