
Unable to reach controller on Linux robots but QOS works


Article ID: 258379



Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

We have an issue on two Linux servers: every installed probe returns the same error, "Unable to reach controller".

Full Error: 
"Unable to reach controller, node: /uim/primary_hub/hub/cdm, error message: communication error."

 

However, QOS is flowing correctly from the robots to the hub. 

 

•  There is no firewall between UIM and the Linux servers, and the local firewall is disabled on both sides.

•  I can telnet to port 48000 from both sides.

•  I have reinstalled the robots, then removed them and performed a fresh install.

•  I have deleted the cache.

•  The robot version is 9.35, and all probes are green in Infrastructure Manager.

 

We have already tried the resolutions in the following articles:

error upon trying to open the controller Unable to reach controller node communication error (broadcom.com)

Linux robot communication error - Unable to reach controller, error message communication error (broadcom.com)

controller communication error - Unable to reach controller in UIM 20.3 with hub & robot 9.33 (broadcom.com)

Repeated "unable to reach controller" errors (broadcom.com)

Communication Error When Double-Clicking a Probe (broadcom.com)

Environment

Release : UIM 20.x - Robot 9.x

Cause

On the affected machines, the controller "get_info" callback is likely taking too long to execute, so the unreachable controller comes down to a slow networking stack: the error appears because the hostname takes too long to resolve. The get_info callback, which takes 40 seconds to respond here, queries the underlying OS for network-related information.

The logs show that the section of code that calls "gethostname" and "gethostbyname" takes 40 seconds.

These OS-level calls interact with the underlying networking stack to resolve the local hostname. If that stack is slow, the callback takes correspondingly longer.

This explains why the controller GUI is not reachable while the robot itself works (QOS flow is correct).
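To see whether the delay can be reproduced outside UIM, the two lookups can be timed directly on an affected server. The following is a minimal sketch, assuming Python 3 is available there. It is not part of UIM, and the function name time_local_hostname_lookup is made up for this illustration. Python's socket.gethostname and socket.gethostbyname wrap the OS functions of the same names.

```python
import socket
import time


def time_local_hostname_lookup():
    """Time gethostname and gethostbyname for the local host.

    Hypothetical helper for illustration; not a UIM API.
    """
    start = time.perf_counter()
    hostname = socket.gethostname()
    after_name = time.perf_counter()
    try:
        # Resolves the local hostname through the OS resolver
        # (hosts file, DNS, or whatever the system is configured to use).
        address = socket.gethostbyname(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        address = f"lookup failed: {exc}"
    end = time.perf_counter()
    return {
        "hostname": hostname,
        "address": address,
        "gethostname_seconds": after_name - start,
        "gethostbyname_seconds": end - after_name,
    }


if __name__ == "__main__":
    for key, value in time_local_hostname_lookup().items():
        print(f"{key}: {value}")
```

If the gethostbyname time on an affected server is in the range of tens of seconds while a healthy server answers almost immediately, that would be consistent with the delay seen in the controller logs.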

 

 

 

Resolution

Check how hostnames are resolved on the affected servers. If DNS is used for hostname resolution, check the response time of the DNS server.
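One way to check how hostnames are resolved is to read the resolver configuration on the server. The sketch below assumes a typical glibc-based Linux system where the lookup order is on the "hosts:" line of /etc/nsswitch.conf and DNS servers are listed in /etc/resolv.conf. These paths are assumptions, since the article does not name them, and the function names are made up for this illustration.

```python
from pathlib import Path


def hosts_lookup_order(nsswitch_text):
    """Return the entries on the 'hosts:' line of nsswitch.conf, in order.

    Action items such as [NOTFOUND=return] are returned as-is.
    """
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def configured_nameservers(resolv_text):
    """Return the addresses on 'nameserver' lines of resolv.conf."""
    servers = []
    for line in resolv_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def read_if_present(path):
    """Read a file, or return an empty string if it does not exist."""
    p = Path(path)
    return p.read_text() if p.is_file() else ""


if __name__ == "__main__":
    # Assumed standard locations; adjust if the system differs.
    print("hosts lookup order:",
          hosts_lookup_order(read_if_present("/etc/nsswitch.conf")))
    print("DNS servers:",
          configured_nameservers(read_if_present("/etc/resolv.conf")))
```

If "dns" appears in the lookup order, the listed name servers are the ones whose response time is worth checking. If "files" comes before "dns", a name that has an entry in /etc/hosts is answered from that file without a DNS query.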
