Unstable hub intermittently losing connectivity

Article ID: 241317

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

A UIM hub repeatedly turns red and loses connectivity to its robots, while its CPU and memory usage run above normal levels. The controller log shows errors such as the following:

Apr 6 14:09:56:616 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:09:58:617 [2800] 0 Controller: failed to send alive to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:09:58:617 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(X.X.X.X) - communication error
Apr 6 14:10:00:618 [2800] 0 Controller: failed to send alive to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:00:618 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:02:620 [2800] 0 Controller: failed to send alive to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:02:620 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:04:616 [2800] 0 Controller: failed to send alive to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:04:616 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:04:617 [2800] 0 Controller: Port dropped: hub 48002
Apr 6 14:10:06:651 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:10:08:651 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:10:10:652 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:10:12:653 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:10:12:710 [2800] 0 Controller: _ProcStart - Probe 'hub' - starting
Apr 6 14:10:13:748 [2800] 0 Controller: send_internal_alarm: sockConnect failed
Apr 6 14:10:14:859 [2800] 0 Controller: Hub localhost(10.XX.XX.XX) contact established
Apr 6 14:10:26:863 [2800] 0 Controller: Failed to send set_hub to spooler (communication error)
Apr 6 14:19:11:436 [2800] 0 Controller: Port dropped: hub 48002
Apr 6 14:19:14:437 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:19:18:510 [2800] 0 Controller: _ProcStart - Probe 'hub' - starting
Apr 6 14:19:19:553 [2800] 0 Controller: send_internal_alarm: sockConnect failed
Apr 6 14:19:24:031 [2800] 0 Controller: Hub localhost(10.XX.XX.XX) contact established
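
The excerpt shows a repeating cycle: "failed to send alive" messages, then "Port dropped: hub 48002", "NO CONTACT", a restart of the hub probe ("_ProcStart - Probe 'hub' - starting"), and finally "contact established". To gauge how often and for how long this happens across a longer controller log, a minimal Python sketch like the following could be used. It assumes the line layout shown above (timestamp with milliseconds, a bracketed thread ID, and a "Controller:" prefix). The function name find_outages and the choice of which messages open and close an outage are illustrative assumptions, not part of UIM.

```python
import re
from datetime import datetime

# Line layout assumed from the excerpt above:
# "Apr 6 14:10:04:617 [2800] 0 Controller: <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}):(?P<ms>\d{3})"
    r" \[\d+\] \d+ Controller: (?P<msg>.*)$"
)


def find_outages(lines, year=2000):
    """Return (start, end, seconds) tuples for each hub outage in the log.

    Hypothetical helper: an outage opens at the first "Port dropped: hub" or
    "NO CONTACT" message and closes at the next "contact established" message.
    The log carries no year, so a fixed leap year is assumed (lets Feb 29 parse).
    """
    outages = []
    start = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a controller entry in the expected format
        ts = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        ts = ts.replace(microsecond=int(m["ms"]) * 1000)
        msg = m["msg"]
        if start is None and ("Port dropped: hub" in msg or "NO CONTACT" in msg):
            start = ts
        elif start is not None and "contact established" in msg:
            outages.append((start, ts, (ts - start).total_seconds()))
            start = None
    return outages
```

Passing an open log file (or any iterable of lines) to find_outages returns one tuple per outage. Applied to the excerpt above, it would report two outages of roughly 10.2 and 12.6 seconds, about nine minutes apart.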


Cause

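The load on the hub exceeded its resource capacity. With 1224 robots attached, the demand on the hub pushed it beyond what it could sustain, which matches the elevated CPU and memory usage.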

Environment

Release : 20.4

Component : UIM - HUB

Resolution

Moved robots from this hub to other hubs, reducing its load from the initial 1224 robots to fewer than 1000.
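
Going from 1224 robots to fewer than 1000 means moving at least 1224 - 999 = 225 robots off the hub, provided the receiving hubs do not become overloaded in turn. The Python sketch below plans only that arithmetic. It treats "fewer than 1000" (a cap of 999) as the threshold observed in this case, not as a documented product limit. The function plan_moves, the hub names, and the other hubs' counts are hypothetical, and the code does not move any robots itself.

```python
def plan_moves(robot_counts, overloaded, cap=999):
    """Plan how many robots to move off `overloaded` so it holds at most `cap`.

    Hypothetical helper. robot_counts maps hub name -> current robot count.
    Returns a dict mapping each target hub to the number of robots to move
    there. Raises ValueError if the other hubs lack enough headroom under
    the same cap.
    """
    excess = robot_counts[overloaded] - cap
    moves = {}
    if excess <= 0:
        return moves  # already at or under the cap
    # Visit the least-loaded hubs first, filling each one up to the cap.
    targets = sorted(
        (hub for hub in robot_counts if hub != overloaded),
        key=lambda hub: robot_counts[hub],
    )
    for hub in targets:
        headroom = cap - robot_counts[hub]
        if headroom <= 0:
            continue
        n = min(headroom, excess)
        moves[hub] = n
        excess -= n
        if excess == 0:
            return moves
    raise ValueError(f"{excess} robots cannot be placed without exceeding the cap")


# Hypothetical hub names and counts; only the 1224 figure comes from this article.
print(plan_moves({"hub_a": 1224, "hub_b": 800, "hub_c": 900}, "hub_a"))
# → {'hub_b': 199, 'hub_c': 26}
```

Filling the least-loaded hub first keeps the plan simple; spreading the moved robots more evenly across the receiving hubs would leave each of them more headroom.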
