HRegionServer running but not healthy and Data Retention (Metric Store Maintenance) service unhealthy in the Aria Operations for Networks GUI

Article ID: 324414

Products

VCF Operations for Networks

Issue/Introduction


Aria Operations for Networks platform nodes may show any of the following service errors in the GUI:

HRegionServer is running but not healthy. 
Data Retention (Metric Store Maintenance) service is unhealthy.
One or more essential services are not healthy.
TSDB Server failed to flush data to HBase.

Environment

Aria Operations for Networks 6.12.0
Aria Operations for Networks 6.12.1
Aria Operations for Networks 6.13.0
Aria Operations for Networks 6.14.0

Cause

One or a combination of the following has been identified as the cause of this issue:

  1. An unexpected shutdown or reboot of the Platform Node, which causes HBase/HDFS database inconsistencies that leave services unhealthy or not running.

  2. A manual shutdown of Platform Node(s) in a clustered deployment.

  3. A shutdown of Platform Node(s) in a clustered deployment through Aria Suite Lifecycle Manager to take snapshots of Aria Operations for Networks.

Resolution

If you encounter this issue, DO NOT manually shut down or reboot the Platform Node(s).

  1. Open a support case with Broadcom Support to review your Aria Operations for Networks deployment. For more information, see Creating and managing Broadcom support cases. 
     
  2. Capture the following details:

     1. In the Aria Operations for Networks GUI, navigate to Settings > Infrastructure and support > Infrastructure and Updates and take one or two screenshots covering the entire page. If any Problems are listed, click them and capture additional screenshots showing all of the problems.

     2. If the Platform nodes are in a clustered deployment, open an SSH/PuTTY session to VMware Aria Operations for Networks Platform Node1 and log in with the username support.

    Run the following commands:

    ub
    ./run_all.sh uptime
    ./run_all.sh df -h
    ./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d
    sudo -u hbase hbase hbck
    sudo cat /home/ubuntu/build-target/deployment/patch.txt
    sudo cat /home/ubuntu/build-target/deployment/appliance.status
    sudo grep id: /etc/vnera/deployment/deployment.def

    Note: The output of these commands can be long. Copy and paste it into a text file, save it, and upload it to the support case or send it as an email attachment.

     3. If there is only one Platform node, open an SSH/PuTTY session to VMware Aria Operations for Networks Platform Node1 and log in with the username support.

    Run the following commands:

    ub
    uptime
    df -h
    ./check-service-health.sh -p -d
    sudo -u hbase hbase hbck
    sudo cat /home/ubuntu/build-target/deployment/patch.txt
    sudo cat /home/ubuntu/build-target/deployment/appliance.status
    sudo grep id: /etc/vnera/deployment/deployment.def

    Note: The output of these commands can be long. Copy and paste it into a text file, save it, and upload it to the support case or send it as an email attachment.
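
Because the output has to be copied into a file by hand, it may be more convenient to let the shell write it to one file directly. The following is a minimal sketch, not part of the documented procedure: the helper names `run_and_log` and `collect_diagnostics` and the output file layout are assumptions made for illustration. The commands are the ones listed in the steps above. The sketch also assumes that the functions are pasted into the session after running `ub`, so that the relative paths `./run_all.sh` and `./check-service-health.sh` resolve as they do in the manual steps.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names and the output file layout are assumptions,
# not part of the product or of the support procedure.

# run_and_log OUTFILE COMMAND [ARGS...]
# Appends a header line, the command's combined stdout/stderr, and its
# exit code to OUTFILE. A failing command does not stop the collection.
run_and_log() {
    local outfile=$1 status=0
    shift
    {
        printf '===== %s =====\n' "$*"
        "$@" 2>&1 || status=$?
        printf '[exit code: %s]\n\n' "$status"
    } >> "$outfile"
}

# collect_diagnostics MODE OUTFILE
# MODE is "cluster" (commands fanned out with ./run_all.sh) or "single".
collect_diagnostics() {
    local mode=$1 outfile=$2
    case $mode in
        cluster)
            run_and_log "$outfile" ./run_all.sh uptime
            run_and_log "$outfile" ./run_all.sh df -h
            run_and_log "$outfile" ./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d
            ;;
        single)
            run_and_log "$outfile" uptime
            run_and_log "$outfile" df -h
            run_and_log "$outfile" ./check-service-health.sh -p -d
            ;;
        *)
            echo "usage: collect_diagnostics cluster|single OUTFILE" >&2
            return 2
            ;;
    esac
    # These four commands are identical in both procedures.
    run_and_log "$outfile" sudo -u hbase hbase hbck
    run_and_log "$outfile" sudo cat /home/ubuntu/build-target/deployment/patch.txt
    run_and_log "$outfile" sudo cat /home/ubuntu/build-target/deployment/appliance.status
    run_and_log "$outfile" sudo grep id: /etc/vnera/deployment/deployment.def
}
```

For example, `collect_diagnostics cluster aon-diagnostics.txt` on Platform Node1 of a clustered deployment, or `collect_diagnostics single aon-diagnostics.txt` on a single-node deployment, would append every command's output to `aon-diagnostics.txt` (a hypothetical file name), which can then be attached to the support case.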

Additional Information

Whenever a platform node cluster must be shut down, for example to take cold snapshots, follow Best practices to shutdown Aria Operations for Networks Clustered deployments to avoid this issue.

Attachments

hbase_repair_script.sh.txt