Excessive growth of vCenter Server Database caused by repeated dcui login / logout events which make esxcli calls

Article ID: 375359

Products

VMware vSphere ESXi

Issue/Introduction

The vCenter Server Database may grow quickly in size or run out of space.

The space usage will be seen in /storage/seat, and could cause the vpxd service to be stopped due to insufficient space.
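To confirm where the space is being consumed, you can check the SEAT (Stats, Events, Alarms, and Tasks) partition from the vCenter Server Appliance shell, for example:

df -h /storage/seat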

There will be mentions of esxcli commands being run by the dcui user; this simply indicates that a script running on the ESXi host(s) is calling the esxcli command.

Cause

This is caused by excessive login / logout events filling the Events and Tasks tables. These events are expected in normal operation, but not in quantities large enough to cause excessive database growth.

Reviewing the hostd.log on an affected ESXi host will show login and logout events similar to the following:

2024-08-11T21:34:39.468Z info hostd[2100978] [Originator@6876 sub=Default opID=esxcli-f8-5682] Accepted password for user dcui from 127.0.0.1
2024-08-11T21:34:39.468Z warning hostd[2100978] [Originator@6876 sub=Vimsvc opID=esxcli-f8-5682] Refresh function is not configured.User data can't be added to scheduler.User name: dcui
2024-08-11T21:34:39.468Z info hostd[2100978] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=esxcli-f8-5682] Event 1255140 : User dcui@127.0.0.1 logged in as pyvmomi Python/3.8.18 (VMkernel; 7.0.3; x86_64)
2024-08-11T21:34:39.537Z info hostd[2103324] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-f8-568a user=dcui] Dispatch list
2024-08-11T21:34:39.539Z info hostd[2103324] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-f8-568a user=dcui] Dispatch list done
2024-08-11T21:34:39.542Z info hostd[2100257] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=esxcli-f8-568b user=dcui] Event 1255141 : User dcui@127.0.0.1 logged out (login time: Sunday, 11 August, 2024 09:34:39 PM, number of API invocations: 7, user agent: pyvmomi Python/3.8.18 (VMkernel; 7.0.3; x86_64))

Resolution

To determine whether this issue is caused by something running an esxcli script, run the following two commands and compare their output. If the number of occurrences is very close to the same in both outputs, something is triggering esxcli to run many times.

On an affected ESXi host, change to the /var/run/log directory and run the following commands to search the hostd.log for events and count the number of lines which match:

grep "Accepted password for user dcui" hostd* | wc -l
92082

grep "Accepted password for user dcui" hostd* | grep esxcli | wc -l
91897

If the issue is caused by esxcli usage, the two counts will be very similar, as in the example above.

If the second command returns far fewer matches, esxcli commands are not the cause, and you will need to review the hostd.log file for another source of the logins. The steps here focus on esxcli commands as the cause, but the same approach can be used to locate a script making some other type of call.
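As a starting point for that investigation, one rough approach (assuming the login events follow the same format shown above, which may vary between builds) is to summarize the user agents recorded in the login events:

grep "logged in as" hostd* | sed 's/.*logged in as //' | sort | uniq -c | sort -rn

A user agent with a disproportionately high count points to the client or script generating the logins.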

Typically, the script making these calls is installed by a third-party VIB.  Run the following command to list all VIBs which are not VMware Certified:

esxcli software vib list | grep -v VMwareCertified

This will show output similar to:

Name                           Version                               Vendor  Acceptance Level  Install Date
-----------------------------  ------------------------------------  ------  ----------------  ------------
ucs-tool-esxi                  1.2.2-1OEM                            CIS     PartnerSupported  2022-07-12
RP-Splitter                    RPESX-00.5.3.4.1.0.m.184.000          EMC     PartnerSupported  2024-07-17

You will need to investigate these VIBs to determine whether they are needed.  Remove any unnecessary VIBs one at a time to identify whether one of them is the cause.
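If a VIB is determined to be unnecessary, it can be removed with esxcli. For example, using the first VIB from the sample output above (note that the host may need to be in maintenance mode, and some removals require a reboot):

esxcli software vib remove -n ucs-tool-esxi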

If the VIBs are required, there are a few items to check.  First, check the cron jobs to see if a custom script has been added that could cause this issue.  The crontab file is located here:

/var/spool/cron/crontabs/root

The following is a default file for an 8.x host:

#min hour day mon dow command
1    1    *   *   *   /sbin/tmpwatch.py
*/5  *    *   *   *   python ++group=host/vim/vmvisor/systemStorage,securitydom=systemStorageMonitorDom /sbin/systemStorageMonitor.pyc
1    *    *   *   *   /sbin/auto-backup.sh ++group=host/vim/vmvisor/auto-backup.sh
0    *    *   *   *   /usr/lib/vmware/vmksummary/log-heartbeat.py
*/5  *    *   *   *   /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh,securitydom=hostdProbeDom
00   1    *   *   *   localcli ++securitydom=storageDevicePurgeDom storage core device purge
0    */6  *   *   *   /bin/pam_tally2 --reset
*/10 *    *   *   *   /bin/crx-cli ++securitydom=crxCliGcDom gc

Look for any custom python or bash scripts that are set to run frequently.
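One way to spot custom additions is to list the active (non-comment) entries and compare them against the defaults shown above:

grep -v '^#' /var/spool/cron/crontabs/root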

You can also use a simple loop to log the output of the ps command over time.  Run the following long enough to capture the event in question; the timestamps of the events in the hostd.log file indicate how frequently the command runs, and therefore how long the capture needs to be.

while true; do ps -CcJ >> /tmp/ps_CcJ.txt; sleep 1; done

Inspect the ps_CcJ.txt file to see if you can locate the esxcli command which is run repeatedly.  In this case, there was a command being run multiple times very quickly:

2914887  2914887  sh                    /usr/bin/sh /opt/emc/rp/kdriver/bin/rp_rpa_discovery.sh --scan-props
2914890  2914890  sh                    /usr/bin/sh /opt/emc/rp/kdriver/bin/rp_rpa_discovery.sh --scan-props
2914893  2914893  awk                   awk -F String Value:  {print $2}
2914892  2914892  grep                  grep    String Value:
2914891  2914891  python                python /bin/esxcli system settings advanced list -o /UserVars/RP_IP_Discovery_1

The offending script in this case was the "rp_rpa_discovery.sh" script.  If you suspect a script, confirm that its PID changes each time it appears; this verifies it is being launched repeatedly rather than being a single long-running instance or a script stuck in a loop.
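One rough way to verify this, assuming the first column of the ps -CcJ capture is the process ID as in the example above, is to count the distinct PIDs recorded for the suspect script:

grep rp_rpa_discovery.sh /tmp/ps_CcJ.txt | awk '{print $1}' | sort -u | wc -l

A count that keeps growing while the capture loop runs indicates the script is being launched repeatedly.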

Once you have identified the script and/or VIB that appears to be causing the issue, validate this by removing the script/VIB and verifying that the activity stops.  If the cause is a third-party script or VIB, you will need to reach out to that third-party vendor for assistance with resolving the issue.