The CPU utilization of the ESXi hosts in the cluster is intermittently reaching 99%.

Article ID: 420058

Products

VMware vSphere ESXi

Issue/Introduction

Several ESXi hosts in the cluster intermittently reach 99% CPU utilization. The spikes occur at irregular intervals.

While host CPU utilization is at 99%, hostd.log shows repeated "Lost access to volume" events:

YYYY-MM-DDTXX:XX:XX.XXX info hostd[2289849] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 589750 : Lost access to volume xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxx (datastore_XXXXXXXX) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTXX:XX:XX.XXX info hostd[2105810] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 589950 : Lost access to volume xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxx (datastore_XXXXXXXX) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DDTXX:XX:XX.XXX info hostd[2106633] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 590070 : Lost access to volume xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxx (datastore_XXXXXXXX) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
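
To gauge how often each datastore is affected, the events above can be counted from a copy of hostd.log. The following is a minimal sketch, not a VMware tool: the function name `count_lost_access` is hypothetical, and the pattern assumes the event text has exactly the form shown in the excerpt above.

```python
import re
from collections import Counter

# Assumed format, taken from the excerpt above:
#   "... Lost access to volume <uuid> (<datastore name>) due to ..."
LOST_ACCESS = re.compile(r"Lost access to volume (\S+) \(([^)]+)\)")


def count_lost_access(lines):
    """Count 'Lost access to volume' events per datastore name."""
    counts = Counter()
    for line in lines:
        match = LOST_ACCESS.search(line)
        if match:
            # group(1) is the volume UUID, group(2) the datastore name
            counts[match.group(2)] += 1
    return counts
```

Passing the lines of a hostd.log file (for example, one extracted from a support bundle) returns a `Counter` whose `most_common()` method lists the most frequently affected datastores first.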

vobd.log shows corresponding storage heartbeat timeouts:

./vobd.all:YYYY-MM-DDTXX:XX:XX.XXX: [vmfsCorrelator] 15453888918384us: [vob.vmfs.heartbeat.timedout] xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxx datastore_XXXXXXXX
./vobd.all:YYYY-MM-DDTXX:XX:XX.XXX: [vmfsCorrelator] 15453699982479us: [esx.problem.vmfs.heartbeat.timedout] xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxx datastore_XXXXXXXX
./vobd.all:YYYY-MM-DDTXX:XX:XX.XXX: [vmfsCorrelator] 15459069918601us: [vob.vmfs.heartbeat.timedout] xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxx datastore_XXXXXXXX
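
The heartbeat timeout entries can be extracted and ordered in a similar way. This is a sketch under stated assumptions: the names `HeartbeatTimeout` and `parse_heartbeat_timeouts` are hypothetical, the pattern matches only the two event IDs shown above, and the `...us` value is treated as an opaque microsecond counter used purely for ordering.

```python
import re
from typing import NamedTuple


class HeartbeatTimeout(NamedTuple):
    micros: int      # numeric value before "us" in the log line
    event_id: str    # e.g. "vob.vmfs.heartbeat.timedout"
    volume: str      # volume UUID
    datastore: str   # datastore name


# Assumed format, taken from the excerpt above.
VOB_TIMEOUT = re.compile(
    r"\[vmfsCorrelator\] (\d+)us: "
    r"\[((?:vob|esx\.problem)\.vmfs\.heartbeat\.timedout)\] (\S+) (\S+)"
)


def parse_heartbeat_timeouts(lines):
    """Return heartbeat timeout events sorted by their microsecond counter."""
    events = []
    for line in lines:
        m = VOB_TIMEOUT.search(line)
        if m:
            events.append(
                HeartbeatTimeout(int(m.group(1)), m.group(2), m.group(3), m.group(4))
            )
    return sorted(events, key=lambda e: e.micros)
```

Sorting by the counter is useful because, as in the excerpt above, entries in a merged log are not necessarily listed in counter order.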

 

Environment

vSphere ESXi 8.x

Cause

The ESXi hosts in the cluster repeatedly lose access to the volume. Each time this happens, host I/O is blocked until the heartbeat I/O completes, which causes host CPU utilization to reach 99%.

Resolution

Check the storage connection status.

Check whether any I/O-intensive tasks are running in the environment, for example VM backups or desktop deployment/deletion operations. Such tasks can degrade I/O performance and cause heartbeat I/O timeouts.
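
One way to test this is to compare the times of the "Lost access to volume" events with the times when such tasks ran. The sketch below assumes the event timestamps have already been parsed into `datetime` objects and that the task windows (label, start, end) are supplied by the administrator. The function name `events_in_windows` and the window labels are hypothetical.

```python
from datetime import datetime


def events_in_windows(event_times, windows):
    """Group event times by the task window (inclusive) they fall into.

    event_times: iterable of datetime objects
    windows: list of (label, start, end) tuples with unique labels
    """
    hits = {label: [] for label, _, _ in windows}
    for t in event_times:
        for label, start, end in windows:
            if start <= t <= end:
                hits[label].append(t)
    return hits
```

If most events fall inside a backup or deployment window, that task is a likely contributor to the heartbeat timeouts. Events outside every window point back to the storage connection itself.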

 

If the affected storage is no longer in use in the environment, remove it from all ESXi hosts in the cluster. After the storage is removed, CPU utilization no longer reaches 99%.