Slow boot after upgrading to ESXi 8.0 U3 - Timeout during iSCSI device discovery phase

Article ID: 385971

Updated On:

Products

VMware vSphere ESXi 8.0

Issue/Introduction

  • The ESXi host takes a long time to boot and appears stuck at the step "activating software-iscsi".
  • When connecting to the ESXi Host Client, you may see the error message "No healthy upstream".
  • In the boot.log file, "msleep" messages repeat for several minutes between the last uplink coming up and the first iSCSI target-associated message, similar to:

2024-07-01T13:41:44.046Z cpu4:2100170)Tcpip: 3472: msleep returned 4
2024-07-01T13:41:59.049Z cpu4:2100170)Tcpip: 3472: msleep returned 4
2024-07-01T13:51:35.377Z cpu59:2100170)iscsi_vmk: iscsivmk_ConnNetRegister:1887: socket 0x432ab43334e0 network resource pool netsched.pools.persist.iscsi associated
2024-07-01T13:51:35.377Z cpu59:2100170)iscsi_vmk: iscsivmk_ConnNetRegister:1914: socket 0x432ab43334e0 network tracker id 523668334 tracker.iSCSI.<ip> associated
2024-07-01T13:51:35.377Z cpu59:2100170)iscsi_vmk: iscsivmk_ConnNetRegister:1887: socket 0x432ab3ca3300 network resource pool netsched.pools.persist.iscsi associated
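
To quantify the stall, the gap between the last "msleep returned" line and the first iscsi_vmk line can be measured from a saved copy of the log. The following is a minimal Python sketch, not a supported tool: the function names are hypothetical, and it assumes every line starts with a UTC timestamp in the format shown in the excerpt above.

```python
from datetime import datetime


def parse_ts(line):
    """Return the leading UTC timestamp of a log line as a datetime."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def discovery_stall_seconds(lines):
    """Seconds between the last 'msleep returned' line and the first
    iscsi_vmk line that follows it, or None if no such pair exists."""
    last_msleep = None
    for line in lines:
        if "msleep returned" in line:
            last_msleep = parse_ts(line)
        elif "iscsi_vmk:" in line and last_msleep is not None:
            return (parse_ts(line) - last_msleep).total_seconds()
    return None


# Two lines taken from the excerpt above.
sample = [
    "2024-07-01T13:41:59.049Z cpu4:2100170)Tcpip: 3472: msleep returned 4",
    "2024-07-01T13:51:35.377Z cpu59:2100170)iscsi_vmk: iscsivmk_ConnNetRegister:1887: socket 0x432ab43334e0 network resource pool netsched.pools.persist.iscsi associated",
]
print(discovery_stall_seconds(sample))
```

For the two sample lines the gap is a little over nine and a half minutes, which matches the slow boot described in this article.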

  • In the syslog.log file, you see lines similar to:

2024-07-01T13:41:44.046Z Wa(28) iscsid[2100170]: connection failed for discovery (err = Interrupted system call)!
2024-07-01T13:41:44.047Z Er(27) iscsid[2100170]: connection to discovery address <iscsi_target_ip> failed
2024-07-01T13:41:44.047Z Er(27) iscsid[2100170]: connection login retries (reopen_max) 5 exceeded
2024-07-01T13:41:44.047Z Db(31) iscsid[2100170]: discovery_sendtargets::Completed discovery on IFACE default(iscsi_vmk) target addr=<iscsi_target_ip> :3260 transport=iscsi_vmk UniqueTgt=0 DuplicateTgt=0
2024-07-01T13:41:44.047Z Db(31) iscsid[2100170]: discovery_sendtargets::Running discovery on IFACE iscsi_vmk@vmk2(iscsi_vmk) target addr=<iscsi_target_ip>:3260 (drec.transport=iscsi_vmk)
2024-07-01T13:41:59.050Z Wa(28) iscsid[2100170]: connection failed for discovery (err = Interrupted system call)!
2024-07-01T13:41:59.050Z Er(27) iscsid[2100170]: connection to discovery address <iscsi_target_ip> failed
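
To see which discovery addresses are failing and how often, the "connection to discovery address ... failed" lines can be tallied from a saved copy of syslog.log. This is a rough Python sketch under the assumption that the message text matches the excerpt above exactly; the function name and the sample address 192.0.2.10 are illustrative only.

```python
import re
from collections import Counter

# Matches the iscsid failure line shown in the syslog excerpt.
FAILED = re.compile(r"iscsid\[\d+\]: connection to discovery address (\S+) failed")


def failed_discovery_counts(lines):
    """Count failed discovery connections per discovery address."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines modelled on the excerpt, with a placeholder address.
sample = [
    "2024-07-01T13:41:44.047Z Er(27) iscsid[2100170]: connection to discovery address 192.0.2.10 failed",
    "2024-07-01T13:41:44.047Z Er(27) iscsid[2100170]: connection login retries (reopen_max) 5 exceeded",
    "2024-07-01T13:41:59.050Z Er(27) iscsid[2100170]: connection to discovery address 192.0.2.10 failed",
]
print(failed_discovery_counts(sample))
```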

Environment

VMware vSphere ESXi 8.0 U3

Cause

  • A known issue in the iSCSI firewall ruleset, identified by engineering, may cause iSCSI connection failures.
  • During the iSCSI device discovery phase, the connection to the target fails.
  • This results in an iSCSI discovery timeout (3-second timeout, 5 attempts) for each discovery attempt.
  • The host reports an error in vmk_connect(), so iSCSI device discovery is held up during host boot.
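
The timeout figures above explain the scale of the delay: 3 seconds times 5 attempts is 15 seconds per failed discovery attempt, which is consistent with the 15-second spacing between failures in the syslog excerpt (13:41:44 and 13:41:59). The Python sketch below only illustrates that arithmetic. It assumes failed discovery attempts run one after another, and the function name and the attempt count of 38 are hypothetical.

```python
DISCOVERY_TIMEOUT_S = 3  # timeout per connection try, from this article
MAX_TRIES = 5            # tries per discovery attempt (reopen_max in syslog)


def discovery_delay_seconds(failed_attempts,
                            timeout_s=DISCOVERY_TIMEOUT_S,
                            tries=MAX_TRIES):
    """Estimated boot delay if each failed discovery attempt times out fully.

    Assumption: attempts are processed sequentially, so delays add up.
    """
    return failed_attempts * timeout_s * tries


print(discovery_delay_seconds(1))   # one failed discovery attempt
print(discovery_delay_seconds(38))  # hypothetical count of failed attempts
```

Under this assumption, a few dozen failed discovery attempts are enough to account for a boot delay of roughly ten minutes.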

Note: These errors are not necessarily related to stale targets. They can occur even when all targets are available, because the connection issue is in the code.

Resolution

Permanent fix:

The fix will be included in ESXi 9.0 and ESXi 8.0 P05.

Workaround:

  1. Add desired iSCSI targets statically.
  2. Remove all iSCSI dynamic discovery addresses.
  3. Keep the iSCSI login timeout at its default (5 seconds).

Example:

  1. esxcli iscsi adapter target portal list
  2. esxcli iscsi adapter discovery statictarget add -A vmhbax -a <IP> -n <TARGET>
  3. esxcli iscsi adapter discovery statictarget list
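
When many targets are involved, the command lines for the workaround can be generated ahead of time and reviewed before being run on the host. The Python sketch below only builds command strings and executes nothing. The adapter name, addresses and IQN are placeholders. The `sendtarget remove` subcommand does not appear in this article and is included as an assumption for workaround step 2; confirm its syntax with `esxcli iscsi adapter discovery sendtarget remove --help` before use.

```python
def workaround_commands(adapter, static_targets, dynamic_addresses):
    """Build esxcli command strings for the workaround (nothing is executed).

    static_targets: list of (address, iqn) pairs to add statically.
    dynamic_addresses: dynamic discovery addresses to remove.
    """
    cmds = []
    for address, iqn in static_targets:
        cmds.append(
            f"esxcli iscsi adapter discovery statictarget add "
            f"-A {adapter} -a {address} -n {iqn}"
        )
    for address in dynamic_addresses:
        # Assumption: subcommand not shown in the article, verify on the host.
        cmds.append(
            f"esxcli iscsi adapter discovery sendtarget remove "
            f"-A {adapter} -a {address}"
        )
    # Final check that the static targets are in place.
    cmds.append("esxcli iscsi adapter discovery statictarget list")
    return cmds


# Placeholder values for illustration only.
for cmd in workaround_commands(
    "vmhba64",
    [("192.0.2.10:3260", "iqn.2001-05.com.example:target0")],
    ["192.0.2.10:3260"],
):
    print(cmd)
```

Adding the static targets before removing the dynamic discovery addresses follows the order of the workaround steps above.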

Additional Information