Excessive growth of vCenter Server Database caused by repeated dcui login / logout events which make esxcli calls

Article ID: 375359

Products

VMware vSphere ESXi

Issue/Introduction

The vCenter Server Database may grow quickly in size or run out of space.

The space usage will be seen in /storage/seat, and could cause the vpxd service to be stopped due to insufficient space.
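To confirm where the space is being consumed, you can check the SEAT (Stats, Events, Alarms, and Tasks) partition from the vCenter Server Appliance shell, for example:

df -h /storage/seat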

There will be mentions of esxcli commands being run by the dcui user; this simply indicates that a script running on the ESXi host(s) is calling the esxcli command.

Cause

This is caused by excessive login / logout events filling the Events and Tasks tables. These events are expected in normal operation, but not in quantities large enough to cause excessive database growth.

Reviewing the hostd.log on an affected ESXi host will show login and logout events similar to the following:

2024-08-11T21:34:39.468Z info hostd[2100978] [Originator@6876 sub=Default opID=esxcli-f8-5682] Accepted password for user dcui from 127.0.0.1
2024-08-11T21:34:39.468Z warning hostd[2100978] [Originator@6876 sub=Vimsvc opID=esxcli-f8-5682] Refresh function is not configured.User data can't be added to scheduler.User name: dcui
2024-08-11T21:34:39.468Z info hostd[2100978] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=esxcli-f8-5682] Event 1255140 : User dcui@127.0.0.1 logged in as pyvmomi Python/3.8.18 (VMkernel; 7.0.3; x86_64)
2024-08-11T21:34:39.537Z info hostd[2103324] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-f8-568a user=dcui] Dispatch list
2024-08-11T21:34:39.539Z info hostd[2103324] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-f8-568a user=dcui] Dispatch list done
2024-08-11T21:34:39.542Z info hostd[2100257] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=esxcli-f8-568b user=dcui] Event 1255141 : User dcui@127.0.0.1 logged out (login time: Sunday, 11 August, 2024 09:34:39 PM, number of API invocations: 7, user agent: pyvmomi Python/3.8.18 (VMkernel; 7.0.3; x86_64))

Resolution

To determine whether this issue is caused by something running an esxcli script, run the following two commands and compare their output. If the number of occurrences is very close to the same in both outputs, something is triggering esxcli to run many times.

On an affected ESXi host, change to the /var/run/log directory and run the following commands to search the hostd.log for events and count the number of lines which match:

grep "Accepted password for user dcui" hostd* | wc -l
92082

grep "Accepted password for user dcui" hostd* | grep esxcli | wc -l
91897

If the issue is caused by esxcli usage, the two counts will be very similar, as in the example above.

If the second command returns far fewer matches, esxcli commands are not the cause, and you will need to review the hostd.log file for another source of the logins. The steps here focus on esxcli commands as the cause, but the same approach can be used to locate a script making some other type of call.
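As a starting point for that investigation, one rough approach (assuming the login events follow the same format shown above, which may vary between builds) is to summarize the user agents recorded in the login events:

grep "logged in as" hostd* | sed 's/.*logged in as //' | sort | uniq -c | sort -rn

A user agent with a disproportionately high count points to the client or script generating the logins.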

Typically, the script making these calls is installed by a third-party VIB.  Run the following command to list all VIBs which are not VMware Certified:

esxcli software vib list | grep -v VMwareCertified

This will show output similar to:

Name                           Version                               Vendor  Acceptance Level  Install Date
-----------------------------  ------------------------------------  ------  ----------------  ------------
ucs-tool-esxi                  1.2.2-1OEM                            CIS     PartnerSupported  2022-07-12
RP-Splitter                    RPESX-00.5.3.4.1.0.m.184.000          EMC     PartnerSupported  2024-07-17

You will need to investigate these VIBs to determine whether they are needed.  Remove any unnecessary VIBs one at a time to identify whether one of them is the cause.
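If a VIB is determined to be unnecessary, it can be removed with esxcli. For example, using the first VIB from the sample output above (note that the host may need to be in maintenance mode, and some removals require a reboot):

esxcli software vib remove -n ucs-tool-esxi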

If the VIBs are required, there are a few items to check.  First, check the cron jobs to see if a custom script has been added that could cause this issue.  The crontab file is located here:

/var/spool/cron/crontabs/root

The following is a default file for an 8.x host:

#min hour day mon dow command
1    1    *   *   *   /sbin/tmpwatch.py
*/5  *    *   *   *   python ++group=host/vim/vmvisor/systemStorage,securitydom=systemStorageMonitorDom /sbin/systemStorageMonitor.pyc
1    *    *   *   *   /sbin/auto-backup.sh ++group=host/vim/vmvisor/auto-backup.sh
0    *    *   *   *   /usr/lib/vmware/vmksummary/log-heartbeat.py
*/5  *    *   *   *   /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh,securitydom=hostdProbeDom
00   1    *   *   *   localcli ++securitydom=storageDevicePurgeDom storage core device purge
0    */6  *   *   *   /bin/pam_tally2 --reset
*/10 *    *   *   *   /bin/crx-cli ++securitydom=crxCliGcDom gc

Look for any custom python or bash scripts that are set to run frequently.
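One way to spot custom additions is to list the active (non-comment) entries and compare them against the defaults shown above:

grep -v '^#' /var/spool/cron/crontabs/root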

You can also use a simple loop to log the output of the ps command over time.  Run the following long enough to capture the event in question; the timestamps of the events in the hostd.log file indicate how frequently the command runs, and therefore how long the capture needs to be.

while true; do ps -CcJ >> /tmp/ps_CcJ.txt; sleep 1; done

Inspect the ps_CcJ.txt file to see if you can locate the esxcli command which is run repeatedly.  In this case, there was a command being run multiple times very quickly:

2914887  2914887  sh                    /usr/bin/sh /opt/emc/rp/kdriver/bin/rp_rpa_discovery.sh --scan-props
2914890  2914890  sh                    /usr/bin/sh /opt/emc/rp/kdriver/bin/rp_rpa_discovery.sh --scan-props
2914893  2914893  awk                   awk -F String Value:  {print $2}
2914892  2914892  grep                  grep    String Value:
2914891  2914891  python                python /bin/esxcli system settings advanced list -o /UserVars/RP_IP_Discovery_1

The offending script in this case was the "rp_rpa_discovery.sh" script.  If you suspect a script, confirm that its PID changes each time it appears; this verifies it is being launched repeatedly rather than being a single long-running instance or a script stuck in a loop.
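One rough way to verify this, assuming the first column of the ps -CcJ capture is the process ID as in the example above, is to count the distinct PIDs recorded for the suspect script:

grep rp_rpa_discovery.sh /tmp/ps_CcJ.txt | awk '{print $1}' | sort -u | wc -l

A count that keeps growing while the capture loop runs indicates the script is being launched repeatedly.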

Once you have identified the script and/or VIB that appears to be causing the issue, validate this by removing the script/VIB and verifying that the activity stops.  If the cause is a third-party script or VIB, you will need to reach out to that third-party vendor for assistance with resolving the issue.