Apps Manager is slow when querying from Metric Store in Tanzu Application Service for VMs 2.9+

Article ID: 298120

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

Symptoms

  • Apps Manager takes a long time to load, especially when loading Key Metrics.
  • Some commands, such as restage, can fail due to timeouts.

Affected versions

  • Tanzu Application Service for VMs (TAS for VMs) 2.9+
  • Note: The Metric Store tile needs to be installed as well.

Root cause

A new feature called Key Metrics was introduced for Apps Manager in TAS for VMs 2.9. For more information, refer to the following documentation: View Key Metrics in Apps Manager (Beta)

This feature calls the Metric Store API to retrieve the metrics displayed in the Apps Manager UI. Sometimes it waits a long time for the Metric Store to respond, which slows down page rendering.

There are a couple of reasons why the Metric Store can take a long time to process the request:

1. Prior to Metric Store 1.5.1, there was no replication: the metrics for each source ID were stored on only one node, so every query that Apps Manager made had to check each Metric Store node until the metric was found. For more information, see the article "Metric Store query failed, metrics may not display for one or more charts" error in Apps Metrics.

Note that most customers have experienced improvements with the default replication factor of 2 nodes. However, for a large foundation with a lot of calls to the Metric Store, replication creates additional network traffic between the nodes, which could cause slowness.

2. When Apps Manager makes a call to the Metric Store, it requests all source IDs from the Cloud Controller API (CAPI). If the doppler.firehose scope is not granted to the user, the Metric Store can spend a long time waiting for CAPI, which results in slowness.

 

Environment

Product Version: 2.9+

Resolution

Workaround

To work around this issue, disable the Key Metrics feature by removing the Metric Store URL from the Apps Manager application environment variables. Follow these steps:

1. Run 'cf login' and use 'cf target' to target the system org and system space.

2. Run 'cf apps' and look for the app with the route for 'apps manager'.

3. Get the Apps Manager environment variables by running 'cf env apps-manager-js-green' (or 'cf env apps-manager-js-blue').

4. Identify the variable named 'FOUNDATIONS' and copy its JSON string.

5. Paste the JSON string into an editor and remove only the value of the 'metricStoreUrl' key. For example, it should look like this after removing the value: "metricStoreUrl":" ".

6. Copy the newly edited JSON string.

7. Set the new variable for both the apps-manager-js-green and apps-manager-js-blue instances. For example, run cf set-env apps-manager-js-green/blue FOUNDATIONS '{"home":.."metricStoreUrl":" ",...}}'

8. Run 'cf restart apps-manager-js-green/blue'. Note that you only need to restart the currently running application, not both.

9. Reload the Apps Manager UI and confirm that you can no longer see Key Metrics. 
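
Steps 5 to 7 can be sketched as a small shell helper. This is a minimal sketch, not a supported tool: the function names blank_metric_store_url and apply_foundations and the file name foundations.json are hypothetical, and it assumes that the FOUNDATIONS JSON copied in step 4 has been saved to that file and that the metricStoreUrl value is a plain quoted string.

```shell
#!/usr/bin/env bash
# Sketch only. Function and file names here are hypothetical.

# Read a FOUNDATIONS JSON string on stdin and write it to stdout with the
# value of every "metricStoreUrl" key replaced by " " (as in step 5).
blank_metric_store_url() {
  sed -E 's/("metricStoreUrl"[[:space:]]*:[[:space:]]*)"[^"]*"/\1" "/g'
}

# Set the edited JSON on one Apps Manager app (step 7).
#   $1 = app name, for example apps-manager-js-green
#   $2 = file holding the JSON copied from 'cf env' (assumed to exist)
apply_foundations() {
  local app="$1" file="$2" new_json
  new_json="$(blank_metric_store_url < "$file")" || return 1
  cf set-env "$app" FOUNDATIONS "$new_json"
}
```

For example, after saving the copied JSON to foundations.json, you could run 'apply_foundations apps-manager-js-green foundations.json' and 'apply_foundations apps-manager-js-blue foundations.json', then continue with step 8. Review the edited JSON before setting it, because the sed expression is a simple text substitution and does not validate the JSON.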

 
If Apps Manager loads considerably faster after making this change, then the issue is related to the calls to Metric Store. Otherwise, there could be other factors affecting Apps Manager loading times.
 

Additional Information

Note that you can make the above workaround permanent by pasting the same new JSON string into the 'Ops Manager > Apps Manager tab > Multi-foundation configuration' field and then running 'Apply Changes'.
 

Depending on the size of your foundation, it might be worth considering several factors:
  • Are your Metric Store VMs running with high CPU or memory usage? If so, scale up CPU and memory.
  • Increase Max Concurrent Queries in the Metric Store tile configuration.
  • Increase the replication factor. This helps reduce the amount of network traffic between nodes and provides increased high availability for your metrics should one or two nodes go down.
IMPORTANT: Increasing the replication factor could lead to historical metrics being lost.

Overall, you can either grant the user the doppler.firehose scope or use the workaround above to disable Key Metrics altogether.
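
One way to grant the scope is with the UAA command line client (uaac). The following is a sketch under assumptions, not a procedure taken from this article: it presumes that uaac is installed, already targeted at your UAA, and holding an admin client token. The function name grant_firehose_scope and the DRY_RUN switch are hypothetical helpers for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes uaac is already targeted and authenticated as an admin client.

# Add a user to the doppler.firehose group.
#   $1 = UAA user name
# Set DRY_RUN=1 to print the command instead of running it.
grant_firehose_scope() {
  local user="$1"
  if [ -z "$user" ]; then
    echo "usage: grant_firehose_scope USERNAME" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "uaac member add doppler.firehose $user"
  else
    uaac member add doppler.firehose "$user"
  fi
}
```

Running it first with DRY_RUN=1 lets you review the command before it changes anything. The user typically needs to log out and back in to receive a token that carries the new scope.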