error upon trying to open the controller Unable to reach controller node communication error

Article ID: 34328



Products

- DX Unified Infrastructure Management (Nimsoft / UIM)
- CA Unified Infrastructure Management SaaS (Nimsoft / UIM)
- Unified Infrastructure Management for Mainframe

Issue/Introduction

When trying to open the robot controller GUI, the following error is displayed:

error upon trying to open the controller:

Unable to reach controller,
node:
/NimsoftDevDom/NimsoftDevHub/xxxplknxnimump01/controller
error message: communication error


- Local probes stopped working after the upgrade.
- All other probes on the machine display as red with a lock icon.
- Ping from the robot to the hub succeeds.
- Telnet from the robot to the hub on port 48002 succeeds.
- Telnet from the hub to the robot on port 48000 succeeds.
- Infrastructure probes such as controller, hdb, and spooler show a port and PID and are green, but opening ANY probe returns a communication error.
- A manual reset of security (magic keys) did not help.
- All probes were working fine before the upgrade.
- netstat -an shows that the robot is listening.
- Tools->Connect from the IM hub connects to the robot and retrieves INFO without problems.
- There is no local or remote firewall.
- All other robots are working fine.
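Where telnet is not installed, the port checks above can be approximated with bash's built-in /dev/tcp redirection. This is a minimal sketch, not a UIM tool: check_port is a hypothetical helper name, the addresses in the usage comments are placeholders, and it assumes bash and the coreutils timeout command are available.

```shell
# Sketch: test TCP reachability without telnet.
# check_port is a hypothetical helper, not part of UIM.
check_port() {
  local host="$1" port="$2"
  # timeout bounds the attempt to 3 seconds; the child bash closes the
  # socket when it exits. /dev/tcp/<host>/<port> is a bash feature.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open ${host}:${port}"
  else
    echo "closed ${host}:${port}"
    return 1
  fi
}

# Example usage (placeholder addresses, substitute your own):
#   on the robot:  check_port <hub-ip> 48002
#   on the hub:    check_port <robot-ip> 48000
#   check_port <ip-from-controller.log> 48001   # port named in controller.log
```

A successful connect only shows that something is listening on the port. As the symptoms above illustrate, the ports can be reachable while the controller still returns a communication error.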

dashboard_engine.log shows:

Mar 23 11:05:56:254 ERROR [OnValidateSubscription-229, Utilities:135] Error occurred during probe status verification : Probe Name =/NimsoftDevDom/NimsoftDevHub/xxxlknxnimnms01/nas
Mar 23 11:05:58:892 INFO [OnValidateSubscription-229, ThreadMethods:366] ************ Hub subscription failure detected, need restarting dashboard engine ****


controller.log shows:

Mar 23 10:58:12:407 [140079136823040] Controller: nimSession - failed to connect session to 10.164.x.xxx:48001, error code 111
Mar 23 10:58:12:407 [140079136823040] Controller: nimPostMessage: sockConnect failed
Mar 23 10:58:12:408 [140079136823040] Controller: send_internal_message - failed to flush message (permission denied)
Mar 23 10:58:15:668 [140079136823040] Controller: send_internal_alarm: sockConnect failed
Mar 23 10:58:15:678 [140079136823040] Controller: verify login - cmd=probe_list frm=10.164.x.xxx/47141 failed
Mar 23 10:58:17:314 [140079136823040] Controller: send_internal_alarm: sockConnect failed


hub.log on the hub shows the following for the UMP machine (10.164.x.xxx):

Mar 23 12:28:45:024 [140240368928512] hub: SSL - SSL_accept error (1) on new SSL connection
Mar 23 12:28:45:040 [140240368928512] hub: SSL - accept failed for 10.164.x.xxx/44894
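To check whether another robot shows the same pattern, the log signatures quoted above can be counted with grep. The sketch below is only an illustration: count_signature and report_signatures are hypothetical helper names, and the log paths in the usage comments are assumptions about a default Linux install. On Linux, errno 111 is ECONNREFUSED, so if the controller logs the raw OS error code, "error code 111" would mean the target port refused the connection.

```shell
# Sketch: count occurrences of the failure signatures in a UIM log file.
# Both helpers are hypothetical, not part of UIM.
count_signature() {
  # $1 = fixed-string pattern, $2 = log file; prints the number of matching lines.
  # grep -c exits non-zero when there are no matches, so mask that status.
  grep -cF -- "$1" "$2" || true
}

report_signatures() {
  file="$1"
  for pattern in 'error code 111' 'sockConnect failed' 'verify login' \
                 'SSL_accept error' 'accept failed'; do
    printf '%-20s %s\n' "$pattern" "$(count_signature "$pattern" "$file")"
  done
}

# Example usage (paths are assumptions, adjust to your install):
#   report_signatures /opt/nimsoft/robot/controller.log
#   report_signatures /opt/nimsoft/hub/hub.log
```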

Environment

- Robot 7.62

Resolution

  1. Stop the robot on the primary hub:
     cd /opt/nimsoft/bin
     ./niminit stop
  2. Change to the hub directory (on Windows, ...\Program Files (x86)\Nimsoft\hub).
  3. Rename or delete the robot.sds file. This allows the robot to re-register with the hub as if it were new.
  4. Start the hub (from /opt/nimsoft/bin):
     ./niminit start
  5. Recycle the robot on the UMP/OC machine:
     ./niminit stop
     ./niminit start

Try to open the controller GUI again. It should now open. Any other probes still displayed as red with a lock icon can then be validated by selecting the probe, right-clicking, and choosing Security->Validate.
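On a Linux hub, the resolution steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, /opt/nimsoft/hub is assumed to be the Linux counterpart of the Windows hub directory named in the steps, and robot.sds is renamed rather than deleted so the change can be undone. Nothing runs until one of the functions is called explicitly.

```shell
# Sketch of the steps above for Linux. All function names are hypothetical.
NIM_BIN="${NIM_BIN:-/opt/nimsoft/bin}"   # path taken from the steps above
HUB_DIR="${HUB_DIR:-/opt/nimsoft/hub}"   # ASSUMPTION: Linux hub directory

# Rename robot.sds in the given directory (overwrites an older .bak copy).
backup_sds() {
  dir="$1"
  if [ -f "$dir/robot.sds" ]; then
    mv "$dir/robot.sds" "$dir/robot.sds.bak"
    echo "renamed $dir/robot.sds"
  else
    echo "no robot.sds in $dir"
    return 1
  fi
}

# Run on the primary hub: stop the robot, rename robot.sds, start again.
reset_registration() {
  (cd "$NIM_BIN" && ./niminit stop) || return 1
  backup_sds "$HUB_DIR" || echo "continuing without rename"
  (cd "$NIM_BIN" && ./niminit start)
}

# Run on the UMP/OC machine: recycle the robot.
recycle_robot() {
  (cd "$NIM_BIN" && ./niminit stop) || return 1
  (cd "$NIM_BIN" && ./niminit start)
}
```

Run reset_registration on the hub first, then recycle_robot on the UMP/OC machine, matching the order of the steps. Both need the same privileges as running niminit by hand.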

Potential cause: something about the robot changed, such as its IP address or OS description. For reasons that were not determined, this appears to have prevented the robot from registering with the hub, probably because the hub treated it as a duplicate or rogue robot.