VPXD Service crashing frequently due to high memory usage

Article ID: 368793


Products

VMware vCenter Server 7.0
VMware vCenter Server 8.0

Issue/Introduction

The vpxd service on the vCenter Server crashes frequently.


At the time of the crash, the vpxd logs (/var/log/vmware/vpxd/vpxd.log) show entries similar to the following:

yyyy-mm-ddThh:mm:ss.mss+05:30 info vpxd[38215] [Originator@6876 sub=vpxLro opID=sps-Main-495954-988-815542-e3] [VpxLRO] -- FINISH session[52efd651-ab09-711f-e018-57905c837ea7]52b3e4b7-b9d8-d0cf-abf5-dd60becb74ec
yyyy-mm-ddThh:mm:ss.mss+05:30 error vpxd[38046] [Originator@6876 sub=Memory checker] Current value 4044496 exceeds hard limit 4042752. Shutting down process.
yyyy-mm-ddThh:mm:ss.mss+05:30 panic vpxd[38046] [Originator@6876 sub=Default]
    -->
    --> Panic: Memory exceeds hard limit. Panic
    --> Backtrace:
    --> [backtrace begin] product: VMware VirtualCenter, version: 7.0.3, build: build-20845200, tag: vpxd, cpu: x86_64, os: linux, buildType: release
    --> backtrace[00] libvmacore.so[0x0037DA77]
    --> backtrace[01] libvmacore.so[0x002C78D3]: Vmacore::System::Stacktrace::CaptureFullWork(unsigned int)
    --> backtrace[02] libvmacore.so[0x002D6B69]: Vmacore::System::SystemFactory::CreateBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
    --> backtrace[03] libvmacore.so[0x00370BC3]
    --> backtrace[04] libvmacore.so[0x00370CDF]: Vmacore::PanicExit(char const*)
    --> backtrace[05] libvmacore.so[0x002C7735]: Vmacore::System::ResourceChecker::DoCheck()
    --> backtrace[06] libvmacore.so[0x0023B36A]
    --> backtrace[07] libvmacore.so[0x002349A7]
    --> backtrace[08] libvmacore.so[0x00239F4F]
    --> backtrace[09] libvmacore.so[0x003764AC]
    --> backtrace[10] libpthread.so.0[0x00007F87]
    --> backtrace[11] libc.so.6[0x000F362F]
    --> backtrace[12] (no module)
    --> [backtrace end]
yyyy-mm-ddThh:mm:ss.mss+05:30 - time the service was last started 2024-05-16T09:28:46.116+05:30, Section for VMware VirtualCenter, pid=8756, version=7.0.3, build=20845200, option=Release
yyyy-mm-ddThh:mm:ss.mss+05:30 info -[08756] [Originator@6876 sub=Default] Glibc malloc guards disabled.

The vmon logs (/var/log/vmware/vmware-vmon/) show entries similar to the following:
 
yyyy-mm-ddThh:mm:ss.mssZ Wa(03) host-2336 <vpxd> Service exited unexpectedly. Crash count 0. Taking configured recovery action.
yyyy-mm-ddThh:mm:ss.mssZ Wa(03) host-2336 <vpxd> Service exited unexpectedly. Crash count 1. Taking configured recovery action.
yyyy-mm-ddThh:mm:ss.mssZ Wa(03) host-2336 <vpxd> Service exited unexpectedly. Crash count 2. Taking configured recovery action.

Environment

vCenter Server

Cause

This issue is caused by a large number of container views being created but never destroyed. These requests generally come from external monitoring tools.

To identify the session responsible, inspect the log file /var/log/vmware/vpxd/vpxd.log:

1. Access vCenter Server through SSH.

2. Run the following commands to change to the log directory and count, per session, how many container views were created and destroyed (a Python alternative to these pipelines is sketched after the profiler output below).

  • # cd /var/log/vmware/vpxd
  • # grep vim.view.ViewManager.createContainerView vpxd.log | grep BEGIN | awk '{print$16}' | sort | uniq -c | sort -nr | head
  • # grep vim.view.View.destroy vpxd.log | grep BEGIN | awk '{print$16}' | sort | uniq -c | sort -nr | head
     
     
    Example output:

    $ grep vim.view.ViewManager.createContainerView vpxd-855.log | grep BEGIN | awk '{print$16}' | sort | uniq -c | sort -nr | head
       5402 528e3844-fedb-81e7-d16f-ed69d7afe36a(52ba26a6-8fb1-5dad-3ab4-ad46161c7a02)
        194 52e6b6a3-9264-3257-4e2a-02da91afa0dc(52cf38e2-c47a-103a-436a-7d32c51244a7)
         90 5233bef7-9062-1f7a-5ad5-1802d2628ff2(523a8aa5-b687-cf42-4214-d6f73aca773d)
          1 529fde8b-eba5-c1d8-7ef7-0f50422f4a6d(52580f52-7940-e0a5-0289-24032367def8)
          1 525962e8-c98a-e92c-b899-af4079c53c67(524d33c7-0d0c-4248-9072-3708f0c87f9d)
          1 52046db2-b84b-69cf-c869-23810176dace(52c4c168-aee2-cb53-5790-959947a97c9f)
    
    
    $ grep vim.view.View.destroy vpxd-855.log | grep BEGIN | awk '{print$16}' | sort | uniq -c | sort -nr | head
        194 52e6b6a3-9264-3257-4e2a-02da91afa0dc(52cf38e2-c47a-103a-436a-7d32c51244a7)
         90 5233bef7-9062-1f7a-5ad5-1802d2628ff2(523a8aa5-b687-cf42-4214-d6f73aca773d)
          1 529fde8b-eba5-c1d8-7ef7-0f50422f4a6d(52580f52-7940-e0a5-0289-24032367def8)
          1 525962e8-c98a-e92c-b899-af4079c53c67(524d33c7-0d0c-4248-9072-3708f0c87f9d)
          1 52046db2-b84b-69cf-c869-23810176dace(52c4c168-aee2-cb53-5790-959947a97c9f)

     

  • Here, session ID 528e3844-fedb-81e7-d16f-ed69d7afe36a created 5402 container views but does not appear in the destroy output at all, which is what eventually drives the vpxd service past its memory limit.

    On checking the vpxd-profiler logs for this session ID, you can identify the source of these requests (user name and client IP):

    find -iname "vpxd-profiler*" -type f -exec grep -H "528e3844-fedb-81e7-d16f-ed69d7afe36a" {} \; | grep "ClientIP" | head -n 5
    ./vpxd-profiler-421.log:--> /SessionStats/SessionPool/Session/Id='528e3844-fedb-81e7-d16f-ed69d7afe36a'/Username='VSPHERE.LOCAL\Administrator'/ClientIP='<IP-Client>'/SessionView/Container/total 233
    ./vpxd-profiler-421.log:--> /SessionStats/SessionPool/Session/Id='528e3844-fedb-81e7-d16f-ed69d7afe36a'/Username='VSPHERE.LOCAL\Administrator'/ClientIP='<IP-Client>'/PropertyCollector/ReadLocked/total 0
    ./vpxd-profiler-421.log:--> /SessionStats/SessionPool/Session/Id='528e3844-fedb-81e7-d16f-ed69d7afe36a'/Username='VSPHERE.LOCAL\Administrator'/ClientIP='<IP-Client>'/PropertyCollector/QueuedOpsCount/total 0
    ./vpxd-profiler-421.log:--> /SessionStats/SessionPool/Session/Id='528e3844-fedb-81e7-d16f-ed69d7afe36a'/Username='VSPHERE.LOCAL\Administrator'/ClientIP='<IP-Client>'/PropertyCollector/TriggeredFiltersCount/total 0
    ./vpxd-profiler-421.log:--> /SessionStats/SessionPool/Session/Id='528e3844-fedb-81e7-d16f-ed69d7afe36a'/Username='VSPHERE.LOCAL\Administrator'/ClientIP='<IP-Client>'/PropertyCollector/NullCollectorCount/total 0
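
As an alternative to the grep pipelines above, both counts can be produced in one pass. The script below is a minimal Python sketch that relies on the same layout assumption those pipelines make (the session ID is the 16th whitespace-separated field of the [VpxLRO] BEGIN line); adjust the field index if your vpxd.log layout differs. The file name tally_views.py is hypothetical; run it as, for example, python3 tally_views.py /var/log/vmware/vpxd/vpxd.log.

#!/usr/bin/env python3
# Sketch: count createContainerView vs. View.destroy calls per session in a
# vpxd log and report the sessions leaving the most views undestroyed.
# Assumption: the session ID is field 16 of the [VpxLRO] BEGIN line, as in
# the awk '{print$16}' pipelines above.
import sys
from collections import Counter

CREATE = "vim.view.ViewManager.createContainerView"
DESTROY = "vim.view.View.destroy"

def tally(path):
    created, destroyed = Counter(), Counter()
    with open(path, errors="replace") as log:
        for line in log:
            if "BEGIN" not in line:
                continue
            fields = line.split()
            if len(fields) < 16:
                continue
            session = fields[15]  # awk's $16
            if CREATE in line:
                created[session] += 1
            elif DESTROY in line:
                destroyed[session] += 1
    return created, destroyed

if __name__ == "__main__":
    created, destroyed = tally(sys.argv[1] if len(sys.argv) > 1 else "vpxd.log")
    # Sessions with the largest create-minus-destroy gap are the likely leakers.
    gaps = Counter({s: n - destroyed[s] for s, n in created.items()})
    for session, gap in gaps.most_common(10):
        print(f"{gap:6d} undestroyed view(s)  {session}")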

Resolution

To resolve this issue, reach out to the vendor of the external monitoring tool (for example, LogicMonitor) responsible for the offending session: the container views it creates need to be destroyed after use.

Refer to the document below:
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/vcenter-api-perf-best-practices.pdf
"When creating a View object to monitor Inventory, you should destroy the View after use to avoid memory accumulation in the vCenter."