VMware-updatemgr service fails to start with an error "Service start operation timed out"

Products

VMware vCenter Server 7.0

Issue/Introduction

Symptoms:

VMware-updatemgr service will be in a stopped state
Starting the service manually using service-control --start vmware-updatemgr command fails with a timeout error after 12 minutes
In the var/log/vmware/vmon/vmon.log, you may find entries similar to:

YYYY-MM-DDTHH:MM:SS.522Z In(05) host-XXXX <updatemgr> Constructed command: /usr/lib/vmware-updatemgr/bin/vmware-updatemgr /usr/lib/vmware-updatemgr/bin/vci-integrity.xml

YYYY-MM-DDTHH:MM:SS.944Z In(05) host-XXXX <updatemgr> Service start operation timed out.
YYYY-MM-DDTHH:MM:SS.944Z Wa(03) host-XXXX <updatemgr> Found empty StopSignal parameter in config file. Defaulting to SIGTERM
YYYY-MM-DDTHH:MM:SS.945Z Wa(03) host-XXXX <updatemgr> Sysprocess clean stop timed out. Force kill. Pid xxxxx
YYYY-MM-DDTHH:MM:SS.954Z Wa(03) host-XXXX <updatemgr> Service exited. Exit code 1

YYYY-MM-DDTHH:MM:SS.954Z Er(02) host-XXXX Service batch op START failed. Failed services: 'updatemgr'
YYYY-MM-DDTHH:MM:SS.954Z Er(02) host-XXXX Services updatemgr failed to start on vMon startup. ErrCode 1

In the var/log/vmware/vmware-updatemgr/vum-server/vmware-vum-server-*.log, you may find entries similar to:

YYYY-MM-DDTHH:MM:SS.692Z info vmware-vum-server[14151] [Originator@6876 sub=VcIntegrity] [vcIntegrity 914] Creating inventory.
YYYY-MM-DDTHH:MM:SS.692Z info vmware-vum-server[14151] [Originator@6876 sub=InventoryMonitor] [InventoryMonitor 108] Creating instance of InventoryMonitor
YYYY-MM-DDTHH:MM:SS.692Z verbose vmware-vum-server[16158] [Originator@6876 sub=JobDispatcher] [JobDispatcher 420] The number of tasks: 0
YYYY-MM-DDTHH:MM:SS.830Z info vmware-vum-server[14151] [Originator@6876 sub=InventoryMonitor] [InventoryMonitor 2521] The number of roles: 40
YYYY-MM-DDTHH:MM:SS.832Z info vmware-vum-server[14151] [Originator@6876 sub=InventoryMonitor] [InventoryMonitor 365] CreateFilter, bRetVal := 1 getResourcePoolUpdates := 0

YYYY-MM-DDTHH:MM:SS.692Z verbose vmware-vum-server[16158] [Originator@6876 sub=JobDispatcher] [JobDispatcher 420] The number of tasks: 0
YYYY-MM-DDTHH:MM:SS.692Z verbose vmware-vum-server[16158] [Originator@6876 sub=JobDispatcher] [JobDispatcher 420] The number of tasks: 0

YYYY-MM-DDTHH:MM:SS.116Z verbose vmware-vum-server[16158] [Originator@6876 sub=JobDispatcher] [JobDispatcher 420] The number of tasks: 0
YYYY-MM-DDTHH:MM:SS116Z verbose vmware-vum-server[16158] [Originator@6876 sub=JobDispatcher] [JobDispatcher 420] The number of tasks: 0
YYYY-MM-DDTHH:MM:SS.944Z info vmware-vum-server[57805] [Originator@6876 sub=Default] [shutdownMgr,88] Ufa shutdown manager stopping... first make sure this thread is enlisted.
YYYY-MM-DDTHH:MM:SS.944Z info vmware-vum-server[57805] [Originator@6876 sub=Default] [shutdownMgr,92] Ufa shutdown manager stopping... Total number of callbacks: 1

In the var/log/vmware/vpxd/vpxd.log, you may find entries similar to:

YYYY-MM-DDTHH:MM:SS.767Z info vpxd[06831] [Originator@6876 sub=vpxLro opID=xxxxxxxx] [VpxLRO] -- BEGIN lro-601 -- EventManager -- vim.event.EventManager.createCollector -- xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx(xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx)

YYYY-MM-DDTHH:MM:SS.117Z warning vpxd[06831] [Originator@6876 sub=Default opID=xxxxxxxx] [VdbStatement] SQL execution took too long: SELECT EVENT_ID, CHAIN_ID, EVENT_TYPE, EXTENDED_CLASS, CREATE_TIME, USERNAME, CATEGORY, VM_ID, VM_NAME, HOST_ID, HOST_NAME, COMPUTERESOURCE_ID, COMPUTERESOURCE_TYPE, COMPUTERESOURCE_NAME, DATACENTER_ID, DATACENTER_NAME, DATASTORE_ID, DATASTORE_NAME, NETWORK_ID, NETWORK_NAME, NETWORK_TYPE, DVS_ID, DVS_NAME, STORAGEPOD_ID, STORAGEPOD_NAME, CHANGE_TAG_ID FROM VPXV_EVENT_ALL WHERE (EVENT_TYPE IN (?,?,?,?,?,?,?,?,?,?)) AND EVENT_ID<? ORDER BY EVENT_ID DESC LIMIT 10;

YYYY-MM-DDTHH:MM:SS.117Z warning vpxd[06831] [Originator@6876 sub=Default opID=xxxxxxxx] [VdbStatement] Execution elapsed time: 1655928 ms
YYYY-MM-DDTHH:MM:SS.380Z info vpxd[06831] [Originator@6876 sub=vpxLro opID=xxxxxxx] [VpxLRO] -- FINISH lro-601

Environment

VMware vCenter Server 7.x
VMware vCenter Server 8.x

Cause

During update manager service startup, the workflow will query the vCenter database. When the database query takes more than 12 minutes to complete, the update manager start operation will timeout. Database queries will take longer when the size of events table in the database is huge.

It is observed that some third party solutions constantly connect to the vCenter inventory and generate a lot of login and logout events. These events will contribute to the events table growth

Resolution

To resolve this issue, delete the old events from the vCenter database by referring to Delete old tasks, events and statistics data in vCenter Server 7.x and 8.x

vCenter service(vmware-vpxd) should be stopped and database backup should be taken before executing the script.
The script in the above kb takes TaskMaxAgeInDays, EventMaxAgeInDays and StatMaxAgeInDays as inputs which define the number of days the data will be retained
As this issue is caused due to the size of events table, we can only delete the old events without deleting the tasks and stats
Here is an example command which will delete only the events older than 10 days and retain all the tasks and stats

/opt/vmware/vpostgres/current/bin/psql -U postgres -v TaskMaxAgeInDays=-1 -v EventMaxAgeInDays=10 -v StatMaxAgeInDays=-1 -d VCDB -t -q -f <download-path>/2110031_Postgres_task_event_stat_reset_event_sequence.sql

Note: In the above command TaskMaxAgeInDays=-1 and StatMaxAgeInDays=-1 means it will not delete any tasks or stats. EventMaxAgeInDays=10 means it will delete all events older than 10 days.