Troubleshooting requests not executing, high CPU usage, and the "Failed to fetch system information" message in VMware Aria Suite Lifecycle

Article ID: 322697

Updated On:

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • Attempts to access System Details under Home > Settings > System Details fail with the message:
    "Failed to fetch system information"
  • Other tasks in the UI, such as log bundle creation and basic day 2 operations, do not proceed.
  • CPU usage is high, and the top command identifies the postgres service as a high consumer.
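
The third symptom can also be confirmed without the interactive top display. The following is a minimal sketch, assuming a Linux shell with the standard ps and awk utilities; the function name top_cpu_command is hypothetical and not part of Aria Suite Lifecycle.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name, not part of the product).
# Reads "pcpu command" pairs on stdin and prints the command name with the
# highest CPU percentage. On a tie, the first line seen wins.
top_cpu_command() {
    awk '$1 + 0 > max || NR == 1 { max = $1 + 0; name = $2 }
         END { if (NR > 0) print name }'
}

# Example call on the appliance (empty "=" headers suppress the header row):
#   ps -e -o pcpu= -o comm= | top_cpu_command
```

If the printed name is postgres, that is consistent with the symptom described above.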


Environment

VMware vRealize Suite Lifecycle Manager 8.x
VMware Aria Suite Lifecycle 8.x

Cause

  • This issue can occur when there is insufficient storage available for the postgres database, or when a large number of requests become queued and overload the request engine.
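
To see whether the storage cause applies, the used percentage of the database partition can be read from df output. This is a minimal sketch, assuming POSIX `df -P` output; the function name used_pct and the 80 percent threshold are illustrative assumptions, not values documented for the product.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name). Reads `df -P <mountpoint>` output
# on stdin and prints the integer "Capacity" percentage from the data row.
used_pct() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Example call on the appliance; 80 is an assumed threshold, not a
# documented limit:
#   pct=$(df -P /storage | used_pct)
#   [ "$pct" -ge 80 ] && echo "/storage is ${pct}% full"
```

The -P option keeps each filesystem on a single line, so the fifth field is always the capacity column.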

Resolution

Prerequisites:
  • You have valid backups or recent snapshots of the Aria Suite Lifecycle appliance.
  • You have SSH access to the appliance and the root username and password.

Procedure:

  1. Validate that the Aria Suite Lifecycle postgres DB has sufficient space under the db partition /storage. If disk space is low, increase the disk size in vCenter and reboot the Aria Suite Lifecycle appliance. To validate the space, log in to the appliance via SSH and run:
    df -h
  2. Stop the engine service:
    systemctl stop vrlcm-server
  3. After validating storage, check the database for a large number of stuck requests. To connect to the Aria Suite Lifecycle appliance database:
    /opt/vmware/vpostgres/11/bin/psql -U postgres -d vrlcm
  4. To query for in-progress requests:
    select count(*) from vm_rs_request where requestname='lcmgenricsetting';
    select count(*) from vm_engine_execution_request where enginestatus='INITIATED';
    select count(*) from vm_engine_statemachine_instance where status='CREATED';
    select count(*) from vm_engine_event where status='IN_PROGRESS';
  5. If any of the above queries returns a count of 50 or more, remove the stuck requests:
    delete from vm_engine_scheduledrequest where targetid = 'lemansusagescheduler' or requestdata like '%licenseusagechedules%' or requestdata like '%licenseusageperiodic%';

    delete from vm_engine_event where statemachineinstance in ( select vmid from vm_engine_statemachine_instance where sourceoftherequest in (select executionid from vm_rs_request where state='COMPLETED' and executionid is not null));
    delete from vm_engine_statemachine_instance where sourceoftherequest in (select executionid from vm_rs_request where state='COMPLETED' and executionid is not null);

    delete from vm_engine_execution_request where sourceoftherequest in (select executionid from vm_rs_request where state='COMPLETED' and executionid is not null);

    delete from vm_engine_user_request where vmid in (select executionid from vm_rs_request where state='COMPLETED');
    delete from vm_engine_event where statemachineinstance in (select vmid from vm_engine_statemachine_instance where sourceoftherequest in (select vmid from vm_engine_user_request where createdon is not null and createdon < extract(epoch from (now() - interval '1 days'))*1000));

    delete from vm_engine_statemachine_instance where sourceoftherequest in (select vmid from vm_engine_user_request where createdon is not null and createdon < extract(epoch from (now() - interval '1 days'))*1000);

    delete from vm_engine_execution_request where sourceoftherequest in(select vmid from vm_engine_user_request where createdon is not null and createdon < extract(epoch from (now() - interval '1 days'))*1000);

    delete from vm_engine_user_request where createdon is not null and createdon < extract(epoch from (now() - interval '1 days'))*1000;

    delete from vm_rs_request where requesttype = 'lcmgenricsetting' and createdon is not null and createdon < (extract(epoch from (now())) - 300)*1000 and executionid is not null;

    VACUUM FULL verbose analyze vm_engine_event, vm_engine_statemachine_instance, vm_engine_execution_request,  vm_engine_user_request, vm_rs_request;
  6. If a significantly large number of requests have been deleted, vacuum the entire database:
    VACUUM FULL verbose analyze;
  7. Exit the DB:
    \q
  8. Start the engine service:
    systemctl start vrlcm-server
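
The count checks in steps 4 and 5 can be scripted so that each result is compared against the threshold automatically. The sketch below is an illustration only, assuming the psql path, database name, and queries shown in the procedure; the function names count_rows, needs_cleanup, and check_queues are hypothetical. It only reads from the database and deletes nothing.

```shell
#!/bin/sh
PSQL=/opt/vmware/vpostgres/11/bin/psql   # path taken from step 3

# Run one count query and print the bare number
# (-A: unaligned output, -t: rows only, no header or footer).
count_rows() {
    "$PSQL" -U postgres -d vrlcm -At -c "$1" </dev/null
}

# Succeed (exit status 0) when a count meets the threshold from step 5.
needs_cleanup() {
    [ "$1" -ge 50 ]
}

# Report each query from step 4 whose count is 50 or more.
check_queues() {
    while IFS= read -r query; do
        n=$(count_rows "$query") || return 1
        if needs_cleanup "$n"; then
            echo "cleanup suggested: $n rows for: $query"
        fi
    done <<'EOF'
select count(*) from vm_rs_request where requestname='lcmgenricsetting';
select count(*) from vm_engine_execution_request where enginestatus='INITIATED';
select count(*) from vm_engine_statemachine_instance where status='CREATED';
select count(*) from vm_engine_event where status='IN_PROGRESS';
EOF
}
```

Calling check_queues as root on the appliance prints one line per query that reaches the threshold and prints nothing when all counts are below 50. The delete statements in step 5 are left as a manual action so that they are run only after the backups or snapshots from the prerequisites are confirmed.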