ESXi 7.0 host shows as Not Responding in vCenter and hostd logs report "SetVigorNotificationTime: Not connected"

Article ID: 318013

Updated On:

Products

VMware vSphere ESX 7.x

Issue/Introduction

Symptoms:
  • ESXi 7.0 hosts show as Not Responding in vCenter and recover automatically a few minutes later.
  • The ESXi host shows very high CPU consumption.
  • "esxtop" shows the hostd process with a high %RDY value.
  • Restarting the hostd service returns the host to a normal state, but CPU usage becomes high again after some time.
  • In /var/log/hostd.log, you see the hostd process crashing with a panic:

2023-06-08T12:33:54.583Z error hostd[2101222] [Originator@6876 sub=UW Memory checker] Current value 2106136 exceeds hard limit 2105548. Shutting down process.
2023-06-08T12:33:54.583Z panic hostd[2101222] [Originator@6876 sub=Default]
-->
--> Panic: Memory exceeds hard limit. Panic
--> Backtrace:
--> [backtrace begin] product: VMware ESX, version: 7.0.3, build: build-20842708, tag: hostd, cpu: x86_64, os: esx, buildType: release
--> backtrace[00] libvmacore.so[0x001D6812]: Vmacore::System::Stacktrace::CaptureFullWork(unsigned int)
--> backtrace[01] libvmacore.so[0x001BD35D]: Vmacore::System::SystemFactory::CreateBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
--> backtrace[02] libvmacore.so[0x003E64AB]
--> backtrace[03] libvmacore.so[0x003E6584]: Vmacore::PanicExit(char const*)
--> backtrace[04] libvmacore.so[0x001C5831]: Vmacore::System::ResourceChecker::DoCheck()
--> backtrace[05] libvmacore.so[0x0037EC70]
--> backtrace[06] libvmacore.so[0x002DC7CC]
--> backtrace[07] libvmacore.so[0x002E0354]
--> backtrace[08] libvmacore.so[0x003F1102]
--> backtrace[09] libpthread.so.0[0x00007D3B]
--> backtrace[10] libc.so.6[0x000ED16D]
--> backtrace[11] (no module)
--> [backtrace end]

NOTE: The "Memory exceeds hard limit" messages might not appear in every scenario.

  • In /var/log/hostd.log, you see a large number of lines similar to:

2023-06-08T12:20:08.112Z warning hostd[2342313] [Originator@6876 sub=VigorStatsProvider(000000fd581cd360).GuestStats(43328) opID=lro-33028250-21bfa8b-01-01-93-2992] SetVigorNotificationTime: Not connected
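
To check a log for the two signatures described above, a short script along the following lines may help. This is a minimal sketch, not a VMware-provided tool: the function name scan_hostd_log is hypothetical, and the match strings are taken directly from the log excerpts in this article. It counts "SetVigorNotificationTime: Not connected" warnings per minute and collects any "exceeds hard limit" values reported by the memory checker.

```python
import re
from collections import Counter

# Signatures copied from the log excerpts in this article.
VIGOR_SIGNATURE = "SetVigorNotificationTime: Not connected"
MEMORY_LIMIT_RE = re.compile(r"Current value (\d+) exceeds hard limit (\d+)")


def scan_hostd_log(lines):
    """Hypothetical helper: scan hostd.log lines for the two signatures.

    Returns (warnings_per_minute, memory_hits), where warnings_per_minute
    maps a minute prefix such as '2023-06-08T12:20' to a warning count and
    memory_hits is a list of (current_value, hard_limit) tuples.
    """
    warnings_per_minute = Counter()
    memory_hits = []
    for line in lines:
        if VIGOR_SIGNATURE in line:
            # Timestamps look like 2023-06-08T12:20:08.112Z, so the
            # first 16 characters identify the minute.
            warnings_per_minute[line[:16]] += 1
        match = MEMORY_LIMIT_RE.search(line)
        if match:
            memory_hits.append((int(match.group(1)), int(match.group(2))))
    return warnings_per_minute, memory_hits
```

The function accepts any iterable of lines, so it can be given an open file handle for a copy of /var/log/hostd.log. A sustained high per-minute count of the warning is consistent with the symptom described here, but the threshold for "a large number" is left to the reader's judgment.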

 

Cause

This issue is caused by a race condition between the power-off operation on the VM and the moment when the Vigor callbacks are invoked. The race can result in an infinite loop, which can cause hostd to consume memory until it reaches its hard limit.

Resolution

This issue is fixed in VMware ESXi 7.0 Update 3o. To download it, go to the Broadcom Support page.

See also the ESXi 7.0 Update 3o release notes:

"PR 3161690: CPU usage of ESXi hosts might intermittently increase in environments

Due to a rare race condition, when a VM power off command conflicts with a callback function, you might see increased CPU usage on ESXi hosts, for example, after an upgrade.
This issue is resolved in this release."
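
To judge whether a host predates the fix, the build number in the hostd backtrace header ("build: build-20842708" in the excerpt above) can be compared with the build number of ESXi 7.0 Update 3o. The sketch below is illustrative only: both function names are hypothetical, the Update 3o build number is not stated in this article and must be taken from the release notes, and the comparison assumes that both builds belong to the ESXi 7.0 line and that build numbers increase with each release.

```python
import re

# Matches the build field of the backtrace header shown in this article,
# for example "build: build-20842708".
BUILD_RE = re.compile(r"build: build-(\d+)")


def hostd_build_from_backtrace(line):
    """Hypothetical helper: return the build number from a backtrace
    header line, or None if the line has no build field."""
    match = BUILD_RE.search(line)
    return int(match.group(1)) if match else None


def predates_fix(host_build, fixed_build):
    """Return True if host_build is lower than fixed_build.

    fixed_build is supplied by the caller from the ESXi 7.0 Update 3o
    release notes; it is deliberately not hard-coded here.
    """
    return host_build < fixed_build
```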