Identify what is making the large API queries causing vsanmgmtd to run out of memory.
Symptoms:
Incorrect health check results and generic vSAN warnings due to vsanmgmtd running out of memory.
In vmkernel.log we see the following messages:
cpu46:6972926)MemSchedAdmit: 477: uw.6972870 (34370589) extraMin/extraFromParent: 64/64, vsanperfsvc (2371) childEmin/eMinLimit: 38898/38912
cpu46:6972926)MemSchedAdmit: 470: Admission failure in path: vsanperfsvc/python.6972870/uw.6972870
In vsanmgmt.log we see:
2024-06-10T05:51:00.097Z info vsand[2102921] [opID=MainThread statsdaemon::_logDaemonMemoryStats] Daemon memory stats: eMin=208.768MB, eMinPeak=245.588MB, rMinPeak=245.844MB MEMORY PRESSURE
2024-06-10T06:21:01.959Z info vsand[2102921] [opID=MainThread statsdaemon::_logDaemonMemoryStats] Daemon memory stats: eMin=203.652MB, eMinPeak=245.588MB, rMinPeak=245.844MB MEMORY PRESSURE
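To gauge how often the daemon reports memory pressure, the vsanmgmt.log entries shown above can be filtered and summarized. The following is a minimal sketch, not a VMware-provided tool: the helper names count_memory_pressure and list_memory_pressure are hypothetical, and the log file path passed to them is whatever location holds vsanmgmt.log on the host being checked.

```shell
# Hypothetical helpers for summarizing memory-pressure entries.
# $1 is the path to a vsanmgmt.log-style file.

# Print the number of lines flagged with MEMORY PRESSURE.
count_memory_pressure() {
    grep -c 'MEMORY PRESSURE' "$1"
}

# Print "<timestamp> <eMin value>" for each flagged line.
list_memory_pressure() {
    grep 'MEMORY PRESSURE' "$1" \
        | sed -n 's/^\([^ ]*\) .*eMin=\([0-9.]*MB\).*/\1 \2/p'
}
```

Calling list_memory_pressure on the log shows whether the pressure is a one-off or recurs over time, as in the two entries above that are 30 minutes apart.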
VMware vSAN 6.x
VMware vSAN 7.x
Search for "queryCmmds" and "queryVsanStatistics" API calls in vpxa.log on the host
grep -Ei 'queryCmmds|queryVsanStatistics' vpxa.log
Get the session ID for the API calls in vpxd-profiler.log in vCenter
grep -Ei 'queryCmmds|queryVsanStatistics' vpxd-profiler.log | grep -i threadstate
ThreadState/ThreadId/58192/State/Task::lro-62488727::ha-vsan-internal-system-2687::vim.host.VsanInternalSystem.queryVsanStatistics::52bd547c-a4c9-7d28-e6c1-fxxxxxxxxxx8(526cb8c5-bbf5-d8e8-0a82-1axxxxxxxxf)/State/RPC::vim.host.VsanInternalSystem:ha-vsan-internal-system::[host name]::vim.host.VsanInternalSystem.queryVsanStatistics
In this example, 52bd547c-a4c9-7d28-e6c1-fxxxxxxxxxx8 is the session ID
Get the IP address of the machine making the API calls by searching the vpxd-profiler log for the session ID
grep 52bd547c-a4c9-7d28-e6c1-fabb3fa83ad8 vpxd-profiler-389.log | grep -i 'Session/Id'
/SessionStats/SessionPool/Session/Id='52bd547c-a4c9-7d28-e6c1-fxxxxxxxxxx8'/Username='XXX'/ClientIP='10.xx.xx.163'
In the above example, the machine making the API calls has the IP address 10.xx.xx.163
Use the IP address to identify the machine making the API calls, then power it off or disable it
Restart vsanmgmtd on all hosts in the cluster
/etc/init.d/vsanmgmtd restart
Retest vSAN health in vCenter
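The session ID and client IP lookups in the steps above can be combined into two small helpers when several sessions need to be checked. This is a sketch under assumptions: the function names session_ids and client_ip are hypothetical, and the patterns rely on the vpxd-profiler log lines having the same layout as the examples in this article (session ID followed by an opening parenthesis, and a ClientIP='...' field on the Session/Id line).

```shell
# Hypothetical helpers; the names are illustrative and not part of vCenter.

# Print the unique session IDs on ThreadState lines for the two API calls.
# $1 is the path to a vpxd-profiler log.
session_ids() {
    grep -Ei 'queryCmmds|queryVsanStatistics' "$1" \
        | grep -i threadstate \
        | sed -n 's/.*::\([0-9a-zA-Z-]*\)(.*/\1/p' \
        | sort -u
}

# Print the client IP recorded for a session.
# $1 is the session ID, $2 is the path to a vpxd-profiler log.
client_ip() {
    grep -F "$1" "$2" \
        | grep -i 'Session/Id' \
        | sed -n "s/.*ClientIP='\([^']*\)'.*/\1/p" \
        | sort -u
}
```

For example, running session_ids vpxd-profiler.log and passing each returned ID to client_ip lists every source address issuing these calls, which helps when more than one machine is involved.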