Recurring gaps in Monitoring Chart for VM historic metric data

Article ID: 400181

Products

VMware Cloud Director

Issue/Introduction

  • VMs are monitored in the Cloud Director Tenant Portal using the Monitoring Chart, in an environment configured with a Cassandra metrics database.
  • The monitoring charts show gaps of 2 to 10 minutes where no data is available, for example when viewing the metric "cpu.usage.average" with the period "Day".
  • The StatsFeederCollectorJob takes longer than 5 minutes to complete. When the scheduled jobs on the Cells are queried with the Cell Management Tool, the interval between its Previous Start Time and Next Start Time is greater than 10 minutes. For example, the command and output below show a 15-minute interval between the previous and next start times:

/opt/vmware/vcloud-director/bin/cell-management-tool cell -i $(service vmware-vcd pid cell) -tt | grep "Currently\|+-\|Previous\|StatsFeederCollectorJob"


Currently active scheduled jobs:
+--------------------------------------+-------------------------+-------------------------+-----------------+---------------------------------------------+
| UUID                                 | Previous Start Time     | Next Start Time         | Status          | Job Name                                    |
+--------------------------------------+-------------------------+-------------------------+-----------------+---------------------------------------------+
| ########-####-####-####-############ | 2025-06-05 12:00:00.000 | 2025-06-05 12:15:00.000 | STATUS_QUEUED   | StatsFeederCollectorJob                     |
+--------------------------------------+-------------------------+-------------------------+-----------------+---------------------------------------------+
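To check this symptom without reading the timestamps by hand, the bash sketch below extracts the Previous Start Time and Next Start Time columns from one StatsFeederCollectorJob row and reports the interval in minutes. It assumes GNU date (for the -d option) and the column layout shown in the output above. The function name job_interval_minutes is hypothetical and not part of Cloud Director.

```shell
#!/usr/bin/env bash
# Hypothetical helper: minutes between the Previous Start Time and Next Start Time
# columns of one row of the scheduled jobs table. Requires GNU date (-d).
job_interval_minutes() {
  local row="$1" prev next prev_s next_s
  # Split on '|': field 2 = UUID, field 3 = Previous Start Time, field 4 = Next Start Time.
  prev=$(awk -F'|' '{ gsub(/^ +| +$/, "", $3); print $3 }' <<<"$row")
  next=$(awk -F'|' '{ gsub(/^ +| +$/, "", $4); print $4 }' <<<"$row")
  # Drop the ".mmm" milliseconds suffix, then convert both times to epoch seconds.
  prev_s=$(date -u -d "${prev%.*}" +%s)
  next_s=$(date -u -d "${next%.*}" +%s)
  echo $(( (next_s - prev_s) / 60 ))
}

# Row copied from the example output above.
row='| ########-####-####-####-############ | 2025-06-05 12:00:00.000 | 2025-06-05 12:15:00.000 | STATUS_QUEUED   | StatsFeederCollectorJob                     |'
minutes=$(job_interval_minutes "$row")
if (( minutes > 10 )); then
  echo "StatsFeederCollectorJob interval is ${minutes} minutes (more than 10)"
fi
```

On a Cell, the row can instead be taken from the live output, for example row=$(/opt/vmware/vcloud-director/bin/cell-management-tool cell -i $(service vmware-vcd pid cell) -tt | grep StatsFeederCollectorJob).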

Environment

VMware Cloud Director 10.5.x

Cause

This issue occurs when Cloud Director cannot complete each VM metrics collection run soon enough for its time window to overlap with the previous collection. The result is a time gap between the last metric data collected in the previous run and the first data in the latest set.
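As a rough illustration of the timing, the bash sketch below assumes that each metric sample covers the vCenter Server real-time statistics interval of 20 seconds. This interval is an assumption and is not stated in this article. Under that assumption, the default of 30 samples covers 10 minutes of data, so a job that starts 15 minutes after the previous run, as in the example output above, leaves a gap of about 5 minutes. The function name coverage_gap_seconds is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes each sample covers a fixed 20-second interval.
SAMPLE_INTERVAL_S=20  # assumed vCenter Server real-time statistics interval

# Hypothetical helper: seconds of metric data missed between two runs.
# $1 = number of samples retrieved per run
# $2 = seconds between consecutive StatsFeederCollectorJob start times
coverage_gap_seconds() {
  local samples="$1" job_interval_s="$2"
  local covered_s=$(( samples * SAMPLE_INTERVAL_S ))
  local gap_s=$(( job_interval_s - covered_s ))
  # No gap when each run covers at least the whole interval since the previous run.
  (( gap_s < 0 )) && gap_s=0
  echo "$gap_s"
}

coverage_gap_seconds 30 900   # default 30 samples, 15-minute interval: prints 300
coverage_gap_seconds 50 900   # 50 samples cover 1000 s: prints 0
```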

Resolution

To resolve this issue, ensure that the Cloud Director instance is correctly sized. For appliance Cells, follow the recommendations in the VMware Cloud Director Appliance Sizing Guidelines documentation.

Also ensure that the vCenter Server from which Cloud Director is retrieving the VM metrics data is not experiencing any resource constraints.

As a workaround, configure Cloud Director to retrieve metric data covering a longer time period on each run of the StatsFeederCollectorJob by increasing the maximum sample size:

  1. Before making changes to the configuration, back up Cloud Director as described in the documentation.
  2. SSH to any of the Cloud Director Cells as root.
  3. Increase the maximum sample size to extend the time period covered by each run of the StatsFeederCollectorJob.
    The default value is 30 samples; increase it in increments until the issue is resolved, for example to 50 samples.
    Set this custom configuration using the Cell Management Tool:

    /opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n statsFeeder.metrics.max.samples -v 50

    NOTE: To reset Cloud Director to the default, either set the value to 30 again using the command above, or use the delete option below:

    /opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n statsFeeder.metrics.max.samples -d

  4. Stop and start the Cloud Director service on all Cells so that they apply the configuration change:

    Stop the service:

    /opt/vmware/vcloud-director/bin/cell-management-tool cell -i $(service vmware-vcd pid cell) -s

    or

    systemctl stop vmware-vcd

    Start the service:

    systemctl start vmware-vcd

  5. Allow the StatsFeederCollectorJob to run for multiple iterations over 30 to 40 minutes, then confirm whether the issue is resolved.
    NOTE: If the gaps shrink but are not completely removed, increase the maximum sample size further.
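A first value for step 3 can be estimated from the interval between the Previous Start Time and Next Start Time of the StatsFeederCollectorJob. The bash sketch below assumes that each sample covers a 20-second vCenter Server real-time statistics interval (an assumption, not stated in this article). It rounds up to a hypothetical increment of 10 samples and never goes below the default of 30. The function name suggest_max_samples and the STEP value are hypothetical. The script only prints the manage-config command from step 3 and does not run it, so the backup in step 1 and the service restart in step 4 still apply.

```shell
#!/usr/bin/env bash
# Sketch only: assumes each sample covers a fixed 20-second interval.
SAMPLE_INTERVAL_S=20   # assumed vCenter Server real-time statistics interval
DEFAULT_SAMPLES=30     # default value of statsFeeder.metrics.max.samples
STEP=10                # hypothetical increment size

# Hypothetical helper: smallest sample count, rounded up to a multiple of STEP
# and never below the default, whose coverage spans the whole job interval.
# $1 = seconds between consecutive StatsFeederCollectorJob start times
suggest_max_samples() {
  local job_interval_s="$1"
  # Integer ceiling division: samples needed to cover the whole interval.
  local needed=$(( (job_interval_s + SAMPLE_INTERVAL_S - 1) / SAMPLE_INTERVAL_S ))
  # Round up to the next multiple of STEP.
  needed=$(( (needed + STEP - 1) / STEP * STEP ))
  (( needed < DEFAULT_SAMPLES )) && needed=$DEFAULT_SAMPLES
  echo "$needed"
}

samples=$(suggest_max_samples 900)   # 15-minute interval: 45 needed, rounded to 50
# Print (do not run) the step 3 command with the suggested value.
echo "/opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n statsFeeder.metrics.max.samples -v ${samples}"
```

With the 15-minute interval from the example output, this suggests 50 samples, the same example value used in step 3.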

Additional Information