Resolving VM Metrics Flow Issues After 2.1.1 Upgrade

Article ID: 376402

Updated On: 09-11-2024

Products

VMware Data Services Manager

Issue/Introduction

This guide provides instructions for resolving an issue where VM metrics are not reported after upgrading to DSM version 2.1.1.

Symptoms

VM metrics (such as CPU, Memory, and Data Disk usage) appear empty in the monitoring dashboard.
The APMS service fails to start.

To confirm, navigate to Databases → Select any database → Monitoring → Metrics panel. The charts for CPU Usage, Memory Usage, and Data Disk Usage are empty.

When you SSH to the Provider VM and inspect the APMS service log (/var/log/tdm/provider/apms.log), you see messages like:

failed to find parent tuple for heap-only tuple at <ctid>.

Environment

VCF and Data Services Manager 2.1.1

Cause

The Provider PostgreSQL database was corrupted.

Resolution

To resolve this known issue, proceed with the following steps:

SSH into the Provider VM and check /var/log/tdm/provider/apms.log to find the ctid reported in the error message.
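
As a minimal sketch of the log check above, the following helper lists each distinct ctid that appears in the error messages. The function name list_ctids is hypothetical and not part of DSM, and the pattern assumes the ctid is printed in the usual PostgreSQL "(block,offset)" form immediately after the error text; adjust the pattern if your log shows a different format.

```shell
# Hypothetical helper (not shipped with DSM): list distinct ctids from the APMS log.
# Assumption: the ctid follows the error text in "(block,offset)" form.
list_ctids() {
    # First grep keeps only the error fragment, second grep keeps only the ctid.
    grep -oE 'heap-only tuple at \([0-9]+,[0-9]+\)' "$1" \
        | grep -oE '\([0-9]+,[0-9]+\)' \
        | sort -u
}

LOG_FILE="${LOG_FILE:-/var/log/tdm/provider/apms.log}"
if [ -r "$LOG_FILE" ]; then
    list_ctids "$LOG_FILE"
fi
```

If the helper prints more than one ctid, the log may contain errors from earlier repair attempts, so use the most recent message in the log to decide which ctid to act on.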

Run the following commands, replacing <ctid> with the value found in the log. The time required depends on the number of rows in the vmware.vm_timeseries table:

psql -d vmware -U postgres -c "begin;delete from vmware.vm_timeseries where ctid = '<ctid>';end;"
psql -d vmware -U postgres -c "VACUUM FULL vmware.vm_timeseries"
psql -d vmware -U postgres -c "REINDEX TABLE vmware.vm_timeseries"
systemctl restart apms.service

Verify that apms.service has started successfully. If the error recurs with a different ctid, repeat the above commands using the new ctid.
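
Because the commands may need to be repeated for several ctids, one option is a small wrapper that validates the ctid and prints the exact commands for review before anything is run. This is only a sketch: print_repair_commands is a hypothetical name, the "(block,offset)" validation is an assumption about the ctid format, and the printed commands are the same four listed above.

```shell
# Hypothetical helper (not part of DSM): print the repair commands for one ctid.
# Assumption: a valid ctid looks like "(block,offset)", for example (12,3).
print_repair_commands() {
    ctid="$1"
    # Reject anything that is not "(digits,digits)" so a pasting mistake
    # never ends up inside the DELETE statement.
    if ! printf '%s\n' "$ctid" | grep -qE '^\([0-9]+,[0-9]+\)$'; then
        echo "invalid ctid: $ctid" >&2
        return 1
    fi
    # Emit the four commands from this article with the ctid filled in.
    cat <<EOF
psql -d vmware -U postgres -c "begin;delete from vmware.vm_timeseries where ctid = '$ctid';end;"
psql -d vmware -U postgres -c "VACUUM FULL vmware.vm_timeseries"
psql -d vmware -U postgres -c "REINDEX TABLE vmware.vm_timeseries"
systemctl restart apms.service
EOF
}
```

Calling print_repair_commands '(12,3)' prints the commands without executing them, so they can be reviewed and then pasted into the shell. After the restart, systemctl is-active apms.service reports the current state of the unit, which is a quick way to check whether the service came up.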

Once the apms.service starts, VM metrics will resume, and the monitoring dashboard will display the metrics after some time.
