vMotion of VMs freezes at 'Stuck at executing callback'.
Host CPU usage shows 100%.
ESXi 7.0.x
ESXi 8.0.x
This is a known issue observed when a LUN backing a VMFS datastore is unpresented from the storage array without first unmounting the datastore and detaching the LUN on the ESXi hosts.
The following is logged in the vmkernel log:
####-##-##T##:##Z cpu7:7843992)WARNING: NMP: nmpDeviceAttemptFailover:647: Retry world failover device "naa.################################" - issuing command 0x4578ead888c8
####-##-##T##:##Z cpu1:1049339)WARNING: NMP: nmpCompleteRetryForPath:364: Retry cmd 0x84 (0x4578ead888c8) to dev "naa.################################" failed on path "vmhba1:C0:T11:L21" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
####-##-##T##:##Z cpu1:1049339)WARNING: NMP: nmp_PathDetermineFailure:3527: Cmd (0x84) PDL error (0x5/0x25/0x0) - path vmhba1:C0:T11:L## device naa.################################ - triggering path failover
####-##-##T##:##Z cpu1:1049339)WARNING: NMP: nmpCompleteRetryForPath:394: Logical device "naa.################################": awaiting fast path state update before retrying failed command again...
####-##-##T##:##Z cpu26:1049338)NMP: nmp_ThrottleLogForDevice:3798: last error status from device eui.23f3bf87b61e8f8f6c9ce9002f5563f6 repeated 10240 times
####-##-##T##:##Z cpu1:6592974)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: Could not find relative target port ID for path "vmhba1:C0:T15:L21" - Not found (195887107)
####-##-##T##:##Z cpu1:6592974)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: Could not find relative target port ID for path "vmhba1:C0:T12:L21" - Not found (195887107)
####-##-##T##:##Z cpu1:6592974)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: Could not find relative target port ID for path "vmhba2:C0:T12:L21" - Not found (195887107)
####-##-##T##:##Z cpu1:6592974)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: Could not find relative target port ID for path "vmhba2:C0:T9:L21" - Not found (195887107)
####-##-##T##:##Z cpu1:6592974)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: Could not find relative target port ID for path "vmhba1:C0:T11:L21" - Not found (195887107)
####-##-##T##:##Z cpu1:6592974)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: Could not find relative target port ID for path "vmhba1:C0:T14:L21" - Not found (195887107)
####-##-##T##:##Z cpu1:6592974)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: Could not find relative target port ID for path "vmhba2:C0:T15:L21" - Not found (195887107)
####-##-##T##:##Z cpu1:6592974)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: Could not find relative target port ID for path "vmhba2:C0:T14:L21" - Not found (195887107)
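The sense data 0x5/0x25/0x0 in the entries above is the SCSI response ILLEGAL REQUEST, LOGICAL UNIT NOT SUPPORTED, which ESXi treats as Permanent Device Loss (PDL). To see which devices and paths are affected, a copy of the vmkernel log can be scanned for the nmp_PathDetermineFailure entries. The following is a minimal sketch, not a supported tool. It assumes the log line format shown above, and the function name, the sample lines, and the device ID naa.example are hypothetical.

```python
import re
from collections import Counter

# Matches the "PDL error (0x5/0x25/0x0)" entries in the format quoted above.
PDL_RE = re.compile(
    r"PDL error \(0x5/0x25/0x0\) - path (?P<path>\S+) device (?P<device>\S+)"
)


def count_pdl_paths(lines):
    """Count PDL errors per (device, path) pair found in vmkernel log lines."""
    hits = Counter()
    for line in lines:
        match = PDL_RE.search(line)
        if match:
            hits[(match.group("device"), match.group("path"))] += 1
    return hits


# Hypothetical sample lines; "naa.example" is a placeholder device ID.
sample = [
    "cpu1:1049339)WARNING: NMP: nmp_PathDetermineFailure:3527: Cmd (0x84) "
    "PDL error (0x5/0x25/0x0) - path vmhba1:C0:T11:L21 device naa.example "
    "- triggering path failover",
    "cpu1:1049339)WARNING: NMP: nmp_PathDetermineFailure:3527: Cmd (0x84) "
    "PDL error (0x5/0x25/0x0) - path vmhba1:C0:T11:L21 device naa.example "
    "- triggering path failover",
    "cpu1:6592974)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: "
    'Could not find relative target port ID for path "vmhba1:C0:T15:L21"',
]

for (device, path), count in count_pdl_paths(sample).items():
    print(f"{device} {path} {count}")
```

To use it against a real host, pass the lines of a copied vmkernel log file (for example, `count_pdl_paths(open("vmkernel.log"))`, where the file name is an assumption) instead of the sample list.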
Performing a storage rescan on every host to which the LUN was previously presented will resolve the issue.
Note: Rescan can take a while to complete depending on the number of LUNs that are presented to the ESXi hosts.
This issue will be fixed in an upcoming release of ESXi 7.0.x.
It is resolved in ESXi 8.0 Update 2d (build 24585300) and later.
To avoid this issue, unmount the datastore and detach the LUN from all ESXi hosts before unpresenting it from the array. See "How to detach a LUN device from ESXi hosts" for instructions on how to unmount a datastore and detach the LUN.
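The order of operations matters: unmount and detach on every host first, unpresent the LUN at the array second, rescan last. The sketch below only builds the per-host command lists so the sequence can be reviewed. It does not run anything. It assumes the esxcli syntax commonly documented for ESXi 7.x and 8.x (confirm it against the referenced article for your build), and the function names, the datastore label ds01, and the device ID naa.example are hypothetical.

```python
def detach_commands(datastore_label, device_id):
    """Commands to run on each host BEFORE the LUN is unpresented at the array."""
    return [
        # Unmount the VMFS datastore by its label.
        ["esxcli", "storage", "filesystem", "unmount", "-l", datastore_label],
        # Detach the backing device by setting its state to off.
        ["esxcli", "storage", "core", "device", "set", "--state=off", "-d", device_id],
    ]


def rescan_command():
    """Command to run on each host AFTER the LUN is unpresented at the array."""
    return ["esxcli", "storage", "core", "adapter", "rescan", "--all"]


# Hypothetical datastore label and device ID, for illustration only.
for cmd in detach_commands("ds01", "naa.example") + [rescan_command()]:
    print(" ".join(cmd))
```

Keeping the commands as argument lists rather than single strings makes it straightforward to hand them to an SSH or automation layer later without shell quoting issues.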
If you encounter any issues, contact Broadcom Support for assistance. Instructions on how to create a support case are available at "Creating and managing Broadcom support cases".