ESXi reports misleading information about high CPU usage that triggered "Critical Alarm"



Article ID: 412722


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

The ESXi host is very slow to respond to operations issued from vCenter and the host UI.

In the ESXCLI window, commands run to check VM processes, the status of the management agents, and logs are either slow or unresponsive.

Commands that check performance or storage information, such as esxtop and dmesg, run normally.

Environment

VMware vSphere ESXi 7.x

Cause

The LUN/datastore removed from the array might not be fully detached, leaving the host searching for a LUN that no longer exists:
vmkernel.log:
YYYY-MM-DDTHH:MM:SS cpu#:#######)WARNING: NMP: nmp_PathDetermineFailure:####: Cmd (#x#a) PDL error (#x#/#x##/#x#) - path vmhba#:C#:T#:L# device naa.##ccf#####b###dc##############d# - triggering path failover

YYYY-MM-DDTHH:MM:SS cpu#:#######)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:173: Could not find relative target port ID for path "vmhba#:C#:T#:L#" - Not found (#########)
YYYY-MM-DDTHH:MM:SS cpu#:#######)WARNING: NMP: nmpCompleteRetryForPath:###: Retry cmd #x#a (#x##ba#f#a####) to dev "naa.##ccf#####b###dc##############d#" failed on path "vmhba#:C#:T#:L#" H:#x# D:#x# P:#x# Valid sense data: #x# #x## #x#.
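To see which devices are producing these warnings, a copy of vmkernel.log can be scanned for the PDL message and the hits tallied per device. The following is a minimal sketch, not a VMware tool: the function name pdl_devices is hypothetical, and the regular expression assumes the line shape of the masked sample above (a vmhba#:C#:T#:L# path followed by a naa. identifier), so it may need adjusting for other device naming schemes.

```python
import re
from collections import Counter

# Pattern modelled on the masked PDL warning quoted above. The exact path and
# device formats are assumptions taken from that sample, not a log grammar.
PDL_RE = re.compile(
    r"nmp_PathDetermineFailure.*?PDL error.*?"
    r"path (?P<path>vmhba\d+:C\d+:T\d+:L\d+) "
    r"device (?P<device>naa\.[0-9a-fA-F]+)"
)


def pdl_devices(lines):
    """Count PDL warnings per device ID across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PDL_RE.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts
```

Passing an open file handle for the log copy to pdl_devices returns a Counter keyed by device ID; the devices with the most hits are the first candidates to raise with the storage vendor.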

Resolution

  1. Work with your storage vendor to check for any decommissioned LUN presented to the affected host, using the KB: Permanent Device Loss (PDL) and All-Paths-Down (APD) on host. This identifies the lost LUN, which may have been decommissioned.
    Note: Confirm with your storage vendor that the LUN is definitely not in use and is marked for decommissioning.
    1. Ensure the problematic LUN is detached, using this KB: Detach a LUN device from ESXi hosts.
    2. If it fails to detach, RDP to each VM on the affected host and power it down.
    3. After all the VMs are powered down, perform a power cycle ("cold boot") of the affected ESXi host from its LOM (iLO/iDRAC/IMM/etc.).
    4. After the host has rebooted, detach the LUN if it is still present, using this KB: Detach a LUN device from ESXi hosts.
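
For the detach steps above, it can help to write out the exact command lines for review before running anything on the host. The sketch below only builds strings and executes nothing. The helper name detach_commands is hypothetical, and the esxcli invocations shown are the commonly documented forms for inspecting and detaching a device; verify them against the Detach a LUN device from ESXi hosts KB for your ESXi build.

```python
import re

# Accept only identifiers shaped like the naa. device IDs in the log excerpt.
NAA_RE = re.compile(r"^naa\.[0-9a-fA-F]+$")


def detach_commands(naa_id):
    """Return esxcli command lines for detaching one device (review only)."""
    if not NAA_RE.match(naa_id):
        raise ValueError(f"not a NAA identifier: {naa_id!r}")
    return [
        # Inspect the device before touching it.
        f"esxcli storage core device list -d {naa_id}",
        # Detach: set the device state to off.
        f"esxcli storage core device set --state=off -d {naa_id}",
        # Confirm the device now appears in the detached list.
        "esxcli storage core device detached list",
    ]
```

Printing the returned list for each confirmed device ID gives a short runbook that can be checked with the storage vendor first.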

Additional Information

Before the reboot, ignore the progress shown on the ESXi (inventory) page of vCenter or the host UI, as the reported operations will be inaccurate at that time.