"Data Retention (Metric Store Maintenance) service is unhealthy error" is seen on Platform Node1

Article ID: 413829

Products

VCF Operations for Networks

Issue/Introduction

  1. The error "Data Retention (Metric Store Maintenance) service is unhealthy" is displayed in the Aria Operations for Networks GUI.

  2. The error can appear on multiple platform nodes in an Aria Operations for Networks cluster setup, and it can also appear on a non-clustered platform node.

Environment

Aria Operations for Networks 6.13.0
Aria Operations for Networks 6.14.0
Aria Operations for Networks 6.14.1

Cause

If the HRegionServer service is running and healthy but shows a shorter uptime than the appliance node, the service had a problem at some point, and that problem causes the Metric Store updater cron job to stop.

This cron job must run once every 24 hours. If it stops for any reason, this alert is shown on the platform node(s).
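
The uptime comparison described above can be sketched in shell. This is a minimal illustration, not a supported diagnostic: the function names and the 600-second tolerance are hypothetical, and it assumes a Linux node where `/proc/uptime` and the procps `ps -o etimes=` column are available, and where the HRegionServer process can be located with `pgrep -f HRegionServer`.

```shell
# Sketch: compare how long a service process has been running with how long
# the node itself has been up. All names and the tolerance are illustrative.

# Whole seconds since the node booted, read from /proc/uptime (Linux).
node_uptime_seconds() {
    cut -d' ' -f1 /proc/uptime | cut -d. -f1
}

# Whole seconds since the given PID started (procps "etimes" column).
process_uptime_seconds() {
    ps -o etimes= -p "$1" | tr -d ' '
}

# started_late NODE_SECS PROC_SECS [TOLERANCE_SECS]
# Succeeds (exit status 0) when the process uptime is shorter than the node
# uptime by more than the tolerance (default 600 seconds, an assumption).
started_late() {
    [ $(( $1 - $2 )) -gt "${3:-600}" ]
}

# Example (assumption: pgrep -f HRegionServer finds the process):
#   pid=$(pgrep -f HRegionServer | head -n 1)
#   if started_late "$(node_uptime_seconds)" "$(process_uptime_seconds "$pid")"; then
#       echo "HRegionServer uptime is shorter than the node uptime"
#   fi
```

A shorter service uptime only indicates that the process started after the node booted. It does not identify the underlying problem, so the details requested in the Resolution section are still needed.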

Resolution

If you encounter this issue, DO NOT manually shut down or reboot the platform node(s).

  1. Open a support case with Broadcom Support to review your Aria Operations for Networks deployment. For more information, see Creating and managing Broadcom support cases. 
     
  2. Capture the following details:
  1.  In the Aria Operations for Networks GUI, navigate to Settings>Infrastructure and support>Infrastructure and Updates and take 1-2 screenshots that cover the entire page. If any problems are listed, click on them and take additional screenshots showing all the problems.

  2. If the platform nodes are in a clustered deployment, open an SSH (PuTTY) session to VMware Aria Operations for Networks Platform Node1 and log in with the username support.

    Execute the following commands:

    ub
    ./run_all.sh uptime
    ./run_all.sh df -h
    ./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d
    sudo -u hbase hbase hbck
    sudo cat /home/ubuntu/build-target/deployment/patch.txt
    sudo cat /home/ubuntu/build-target/deployment/appliance.status
    sudo grep id: /etc/vnera/deployment/deployment.def
    sudo -u hdfs hdfs dfsadmin -safemode get
    sudo -u hdfs hdfs dfsadmin -report
    sudo -u hdfs hdfs fsck /
    sudo -u hbase hbase hbck 


    Note: The output of these commands is expected to be long. Copy and paste it into a text file, save the file, and upload it or send it as an email attachment to the open case.

  3. If there is only one platform node, open an SSH (PuTTY) session to the VMware Aria Operations for Networks platform node and log in with the username support.

    Execute the following commands:

    ub
    uptime
    df -h
    ./check-service-health.sh -p -d
    sudo -u hbase hbase hbck
    sudo cat /home/ubuntu/build-target/deployment/patch.txt
    sudo cat /home/ubuntu/build-target/deployment/appliance.status
    sudo grep id: /etc/vnera/deployment/deployment.def
    sudo -u hbase hbase hbck

    Note: The output of these commands is expected to be long. Copy and paste it into a text file, save the file, and upload it or send it as an email attachment to the open case.

     3. Generate a support bundle for the Aria Operations for Networks platform appliances on which the error message is shown in the GUI. For the steps, see Broadcom Knowledge Base Article 343485.
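
Because the command output requested in step 2 is long, it may be easier to write it straight to a file than to copy it from the terminal. The following is a minimal sketch under stated assumptions: the helper name `capture` and the output path are hypothetical, and the helper is meant to be pasted into the interactive shell after `ub` has been run, not used as a replacement for any step above.

```shell
# Sketch: run a command and append its output, under a header, to one file.
# The helper name and the output file path are illustrative assumptions.

# capture OUTFILE COMMAND [ARGS...]
# Appends a header line naming the command, then its stdout and stderr.
capture() {
    outfile=$1
    shift
    {
        printf '===== %s =====\n' "$*"
        "$@" 2>&1
        printf '\n'
    } >> "$outfile"
}

# Example usage on the platform node (path is an assumption):
#   out=/tmp/case-output.txt
#   capture "$out" uptime
#   capture "$out" df -h
#   capture "$out" sudo -u hbase hbase hbck
```

The resulting file can then be attached to the support case in place of the manually copied output.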