UIM Hub frequently restarts (every 20-30 minutes)

Article ID: 439882

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

A regional Hub is unstable and restarts, or flaps, approximately every 20-30 minutes. During these episodes, the hub probe falls behind in processing queued data from its associated robots, and the Hub status alternates between green and red in Infrastructure Manager. The logs show that the controller probe is restarting the hub probe because it does not respond to controller requests.

 

Environment

  • DX Unified Infrastructure Management (UIM) 23.4.3
  • Hub 23.4.3
  • Robot 23.4.3

Cause

The root cause has not been definitively determined, but the instability is linked to a high volume of persistent connections on Hub port 48001. A specific robot machine creates dozens of established connections that do not close promptly, eventually overwhelming the Hub's ability to process QoS data and causing the hub probe to stop responding to the controller probe. When the controller does not receive a response, it restarts the hub probe.

Resolution

You can stabilize the Hub by identifying the problematic robot and tuning the Hub's session handling parameters.

  1. **Identify the Source:**
    Run `netstat -an` on the Hub server and check whether a specific robot IP has a high count of established connections to port 48001.

  2. **Isolate the Robot:**
    Stop the Nimsoft Robot Watcher service on the suspect robot machine (`####`) to clear the connections and observe if Hub stability returns.

  3. **Adjust Hub Configuration:**
    Open the Hub probe configuration (Raw Configure) and navigate to the `<hub>` section. Add or modify the following parameters to better manage high traffic loads:
    • passive_session_timeout = 150
    • spooler_inbound_threads = 50

  4. **Restart Services:**
    Apply the changes and restart the Nimsoft Robot Watcher service on the Hub server.

  5. **Verify Monitoring:**
    Monitor ports 48001 and 48002 to confirm that connections are closing promptly and that the Hub remains in a steady green state.
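
The connection check in steps 1 and 5 can be tedious to do by eye on a busy Hub. The following is a minimal sketch, not part of the product, that tallies established connections per remote IP from `netstat -an` output. The function names `count_established` and `top_talkers` are hypothetical, and the parser assumes the colon-separated `address:port` layout printed by `netstat` on Linux and Windows; other platforms may need adjustments.

```python
import subprocess
from collections import Counter


def count_established(netstat_output: str, port: int = 48001) -> Counter:
    """Count ESTABLISHED TCP connections to local `port`, keyed by remote IP."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Need at least: protocol, local address, foreign address, state.
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        # On both Linux and Windows the last three columns are
        # local address, foreign address, and connection state.
        local, remote, state = fields[-3], fields[-2], fields[-1]
        if state != "ESTABLISHED":
            continue
        # Split on the last colon so IPv6 addresses keep their own colons.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


def top_talkers(port: int = 48001, limit: int = 10) -> list:
    """Run `netstat -an` locally and return the busiest remote IPs."""
    output = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    ).stdout
    return count_established(output, port).most_common(limit)
```

Calling `top_talkers()` on the Hub server returns `(ip, count)` pairs sorted by connection count, so a robot holding dozens of connections appears at the top of the list. Running it again with `port=48002` after the configuration change gives a quick way to compare both ports over time.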