"Metric Store query failed, metrics may not display for one or more charts" error in Apps Metrics

Article ID: 298100

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

Before proceeding with this article, confirm that all of the following are true:

1. You recently upgraded to Tanzu Application Service (TAS) for VMs 2.9 or higher, or you are already running that version of TAS for VMs but recently installed App Metrics v2 and Metric Store.

2. When you try to view metrics in the App Metrics UI, the page loads for about a minute and then shows the following error:
"Metric Store query failed, metrics may not display for one or more charts"

3. You might also notice that Apps Manager does not load container metrics.

This error in the UI can have many causes. If all of the above are true, the most likely cause is a change to Apps Manager in TAS for VMs 2.9. Starting with this release, Apps Manager retrieves metrics from the Metric Store, if it is installed on the foundation, instead of from CAPI endpoints. See the following release notes for more information: https://docs.pivotal.io/application-service/2-9/overview/release-notes/runtime-rn.html#view-key-metrics.

By default, Apps Manager makes several PromQL queries to the Metric Store every 10 seconds to update the metrics on the dashboard. Depending on how many developers are using Apps Manager, this can considerably increase traffic to the Metric Store API.

Another contributing factor is that Metric Store does not have replication enabled as of version 1.4.4. Every query for an application therefore goes to the single node that stores the metrics for that application. Replication will be enabled in a future release, which will make the product more resilient and help avoid overloading a single node.

The combination of increased traffic and no replication leads to long wait times in the Metric Store query queue. A query that waits longer than 60 seconds times out and returns an error to the user.

Environment

Product Version: 2.9

Resolution

Troubleshooting

1. Ensure that the issue is not specific to App Metrics. If Apps Manager is also not loading metrics, that indicates Metric Store is having trouble with queries.

You can also query the Metric Store API directly with curl and check for a timeout error. For example, with SYS-DOMAIN and APP-GUID replaced by your own values:
curl -vk -H "Authorization: $(cf oauth-token)" -G "https://metric-store.SYS-DOMAIN/api/v1/query" --data-urlencode "query=avg(avg_over_time(cpu{source_id='APP-GUID'}[60s])) by (process_type)" | jq
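
To make the timeout easier to spot, the same query can be timed. The following is a minimal sketch, not part of the product: the helper classify_query_time, the 55-second threshold, and the SYS_DOMAIN and APP_GUID environment variables are assumptions chosen for illustration. It only reports whether the elapsed time is close to the 60-second queue timeout described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label a query by how long it took.
# $1 = elapsed seconds, as reported by curl's %{time_total}.
# The 55-second threshold is an assumption, chosen to sit just below
# the 60-second queue timeout described in this article.
classify_query_time() {
  if awk -v t="$1" 'BEGIN { exit !(t >= 55) }'; then
    echo "slow"
  else
    echo "ok"
  fi
}

# Runs only when both placeholders are set (assumed variable names).
if [ -n "${SYS_DOMAIN:-}" ] && [ -n "${APP_GUID:-}" ]; then
  # Discard the response body and print only the total request time.
  elapsed=$(curl -sk -o /dev/null -w '%{time_total}' --max-time 90 \
    -H "Authorization: $(cf oauth-token)" \
    -G "https://metric-store.${SYS_DOMAIN}/api/v1/query" \
    --data-urlencode "query=avg(avg_over_time(cpu{source_id='${APP_GUID}'}[60s])) by (process_type)")
  echo "elapsed=${elapsed}s -> $(classify_query_time "$elapsed")"
fi
```

A result of "slow" is consistent with the query queue being backed up, but it is not proof. Check the response body from the original command as well.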

2. You can try restarting the metric-store job across all the instances, for example:
bosh -d metric-store-GUID ssh metric-store -c "sudo /var/vcap/bosh/bin/monit restart metric-store"
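
If you do not know the full deployment name (metric-store-GUID above), it can be looked up first. The sketch below assumes the bosh CLI v2 with its global --column option and an already configured BOSH_ENVIRONMENT. The helper first_metric_store_deployment is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first deployment name on stdin that
# starts with "metric-store-".
first_metric_store_deployment() {
  awk '$1 ~ /^metric-store-/ { print $1; exit }'
}

# Runs only when the bosh CLI has been pointed at a director.
if [ -n "${BOSH_ENVIRONMENT:-}" ]; then
  deployment=$(bosh deployments --column=name | first_metric_store_deployment)
  if [ -n "$deployment" ]; then
    # List instances and their processes to confirm that the
    # metric-store job is running again after the restart.
    bosh -d "$deployment" instances --ps
  fi
fi
```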

3. Lastly, if restarting the job doesn't help, you can switch Apps Manager to using CAPI endpoints as a workaround.

Do the following:
 
a. Run cf login, then cf target the system org and the system space.
b. Run cf apps and look for the app that currently has the route for Apps Manager.
c. Look at that app's environment variables by running cf env with the app name.
d. Identify the variable FOUNDATIONS and copy the JSON string.
e. Go to Ops Manager > Apps Manager tab.
f. Paste the JSON string into the Multi-foundation configuration field.
g. Look for metricStoreUrl and delete ONLY the URL, leaving an empty value. For example: "metricStoreUrl": ""
h. Save and run Apply Changes. Ensure the TAS tile is selected; the other tiles can be deselected.


Note: If you are unable to run an Apply Changes at this time and need the issue resolved with urgency, you can also use the cf set-env command to remove the metricStoreUrl from the JSON string in Apps Manager env variables. Then simply cf restart Apps Manager. Keep in mind that a future Apply Changes could undo this change.
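
The cf set-env route mentioned in the note could look like the following. This is a rough sketch under stated assumptions, not an official procedure: the helper blank_metric_store_url and the variables APPS_MANAGER_APP (the app found in step b) and FOUNDATIONS_FILE (a file holding the JSON copied in step d) are hypothetical names. The helper uses a plain text substitution, so it assumes the URL value contains no escaped quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: blank every metricStoreUrl value in the JSON
# read from stdin, leaving "metricStoreUrl": "" in place.
blank_metric_store_url() {
  sed -E 's/("metricStoreUrl"[[:space:]]*:[[:space:]]*")[^"]*"/\1"/g'
}

# Runs only when both placeholders are provided (assumed names).
if [ -n "${APPS_MANAGER_APP:-}" ] && [ -f "${FOUNDATIONS_FILE:-}" ]; then
  new_json=$(blank_metric_store_url < "$FOUNDATIONS_FILE")
  # Review the edited JSON before applying it.
  echo "$new_json"
  cf set-env "$APPS_MANAGER_APP" FOUNDATIONS "$new_json"
  cf restart "$APPS_MANAGER_APP"
fi
```

Keep a copy of the original FOUNDATIONS value so the change can be reverted by hand if needed.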

After the Apply Changes completes, the container metrics dashboard in Apps Manager should go back to the way it looked in previous TAS versions. This will allow developers to use Apps Manager right away without timeouts.

Please note that it could take a couple of hours for the errors in App Metrics to go away.


Important Update

Metric Store 1.5.1 is now available. It enables 2x replication by default, and the replication setting can also be configured in the tile. Upgrading to this version should help resolve this issue.

If you are already running Metric Store 1.5.1 and continue to experience slowness with Apps Manager, please open a case with VMware Tanzu support.