"HRegion Service running but not healthy" in a Simple Deployment (one Platform node) of VCF Operations for Networks

Article ID: 433022

Updated On:

Products

VCF Operations for Networks

Issue/Introduction

NOTE:  This KB article is appropriate only for simple deployments of VCF Operations for Networks.  

  • Simple deployments of VCF Operations for Networks are deployments where there is only a single Platform node, regardless of how many Collector node(s) have been deployed.

OBSERVED SYMPTOMS:

While logged in to the VCF Operations for Networks GUI, under Settings --> Infrastructure and Support --> Infrastructure and Updates, you observe one or more "Problem(s)" alerts.

The principal alert of concern regarding this KB is "HRegionServer is running but not healthy."

There may be other alerts that appear as well, including examples like:

  • Data Retention (Metric Store Maintenance) service is unhealthy.

  • TSDB Server failed to flush data to HBase.

 

NOTE:  VCF Operations for Networks was formerly named Aria Operations for Networks (AON), and prior to that was named vRealize Network Insight (vRNI).

 

 

Environment

VCF Operations for Networks

Cause

The precise root cause is indeterminate; however, the symptoms indicate an HBase/HDFS database inconsistency.

This state is typically triggered by an unexpected power event on the Platform Node, or by an abrupt, non-graceful shutdown (for example, powering off the VM instead of using Power --> Shut Down Guest O/S in vCenter).

It can also be caused by an improper scale-up operation to brick sizes that are larger than originally deployed.  

 

Resolution

If you have encountered this issue, ensure you DO NOT perform any manual shutdown or reboot procedure of Platform Node(s).

  1. Open a support case with Broadcom Support using the directions at KB 142884 - Creating and managing Broadcom cases to review your VCF Operations for Networks deployment. 

  2. In the VCF Operations for Networks GUI, navigate to Settings --> Infrastructure and Support --> Infrastructure and Updates, and take enough screenshots to capture the entire page.

    • Additionally, if any Problem(s) are displayed, click on each one and capture enough screenshots to show the full detail of the alert.

  3. Open an SSH/PuTTY session to the VCF Operations for Networks Platform Node using the support user.

    • Start logging for the SSH session by selecting "Printable Output" and directing the log to a file with a name like "<Date>Case_#######_Putty_Log_Platform.log" (where ####### is the Broadcom Support Case number).

    • Execute the following commands:

      • ub
      • cd /home/ubuntu/
      • uptime
      • df -h
      • ./check-service-health.sh -p -d
      • sudo -u hbase hbase hbck
      • sudo cat /home/ubuntu/build-target/deployment/patch.txt
      • sudo cat /home/ubuntu/build-target/deployment/appliance.status
      • sudo grep id: /etc/vnera/deployment/deployment.def

  4. In the VCF Operations for Networks GUI, navigate to Settings --> Infrastructure and Support --> Support, select the Platform node and any Collector node(s), and click "Create Support Bundle".

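  5. Attach the materials gathered above (the screenshots from step 2, the SSH session log from step 3, and the support bundle from step 4) to the Support Case using the instructions at KB 140731 - Uploading files to cases on the Broadcom Support Portal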
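
As an optional aid for step 3, the read-only commands listed there could be captured into a single file with a small wrapper. This is a minimal sketch, not a Broadcom-provided tool. The helper names `build_log_name` and `run_and_log`, the `RUN_DIAGNOSTICS` and `CASE_NUMBER` variables, and the YYYY-MM-DD date format are all assumptions made for illustration. It assumes you have already run `ub` and `cd /home/ubuntu/` interactively, and it does not replace the PuTTY session log requested in step 3.

```shell
# Sketch only: build_log_name and run_and_log are hypothetical helpers,
# not part of VCF Operations for Networks.

# Build a log file name following the pattern suggested in step 3.
# $1 = Broadcom Support Case number. The date format is an assumption.
build_log_name() {
    printf '%sCase_%s_Putty_Log_Platform.log' "$(date +%Y-%m-%d)" "$1"
}

# Run one command and append a header, its output (stdout and stderr),
# and its exit status to the log file named in $1.
run_and_log() {
    _log="$1"
    shift
    _status=0
    {
        printf '===== %s =====\n' "$*"
        "$@" 2>&1 || _status=$?
        printf '[exit status: %d]\n\n' "$_status"
    } >> "$_log"
}

# Runs only when explicitly requested, so sourcing this file is harmless.
# The commands below are exactly the ones listed in step 3 (after ub and cd).
if [ "${RUN_DIAGNOSTICS:-0}" = "1" ]; then
    log="$(build_log_name "${CASE_NUMBER:?set CASE_NUMBER first}")"
    run_and_log "$log" uptime
    run_and_log "$log" df -h
    run_and_log "$log" ./check-service-health.sh -p -d
    run_and_log "$log" sudo -u hbase hbase hbck
    run_and_log "$log" sudo cat /home/ubuntu/build-target/deployment/patch.txt
    run_and_log "$log" sudo cat /home/ubuntu/build-target/deployment/appliance.status
    run_and_log "$log" sudo grep id: /etc/vnera/deployment/deployment.def
fi
```

Recording the exit status after each command lets the support engineer see at a glance which checks failed, and a failing command does not stop the remaining ones from running.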