Unstable Hub losing connectivity intermittently

Article ID: 241317

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

A UIM hub keeps going red and intermittently losing connectivity to its robots, while CPU and memory usage on the hub run above normal levels. The controller log shows the following errors:

Apr 6 14:09:56:616 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:09:58:617 [2800] 0 Controller: failed to send alive to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:09:58:617 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(X.X.X.X) - communication error
Apr 6 14:10:00:618 [2800] 0 Controller: failed to send alive to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:00:618 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:02:620 [2800] 0 Controller: failed to send alive to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:02:620 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:04:616 [2800] 0 Controller: failed to send alive to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:04:616 [2800] 0 Controller: failed to send alive (async) to hub naXXXX-sal(10.XX.XX.XX) - communication error
Apr 6 14:10:04:617 [2800] 0 Controller: Port dropped: hub 48002
Apr 6 14:10:06:651 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:10:08:651 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:10:10:652 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:10:12:653 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:10:12:710 [2800] 0 Controller: _ProcStart - Probe 'hub' - starting
Apr 6 14:10:13:748 [2800] 0 Controller: send_internal_alarm: sockConnect failed
Apr 6 14:10:14:859 [2800] 0 Controller: Hub localhost(10.XX.XX.XX) contact established
Apr 6 14:10:26:863 [2800] 0 Controller: Failed to send set_hub to spooler (communication error)
Apr 6 14:19:11:436 [2800] 0 Controller: Port dropped: hub 48002
Apr 6 14:19:14:437 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)
Apr 6 14:19:18:510 [2800] 0 Controller: _ProcStart - Probe 'hub' - starting
Apr 6 14:19:19:553 [2800] 0 Controller: send_internal_alarm: sockConnect failed
Apr 6 14:19:24:031 [2800] 0 Controller: Hub localhost(10.XX.XX.XX) contact established
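
To gauge how often the hub is dropping and restarting, the messages in the excerpt above can be tallied. The following is a minimal sketch, not a supported tool: the function name and the labels are hypothetical, and the match strings are copied from the log lines shown in this article, so they may need adjusting for other releases or log levels.

```python
from collections import Counter

# Substrings copied from the controller log excerpt in this article.
# The labels on the left are illustrative names, not UIM terminology.
PATTERNS = {
    "alive_failed": "failed to send alive",
    "port_dropped": "Port dropped: hub",
    "no_contact": "NO CONTACT",
    "hub_restart": "_ProcStart - Probe 'hub' - starting",
    "contact_established": "contact established",
}


def summarize_controller_log(lines):
    """Count how many log lines contain each known message fragment."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 6 14:19:11:436 [2800] 0 Controller: Port dropped: hub 48002",
        "Apr 6 14:19:14:437 [2800] 0 Controller: hub localhost(10.XX.XX.XX) NO CONTACT (communication error)",
        "Apr 6 14:19:18:510 [2800] 0 Controller: _ProcStart - Probe 'hub' - starting",
    ]
    print(dict(summarize_controller_log(sample)))
```

Repeated `port_dropped` and `hub_restart` counts within a short period suggest that the hub probe is being restarted by the controller rather than suffering a single network interruption.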

Environment

Release : 20.4

Component : UIM - HUB

Cause

The hub was overloaded. The number of robots attached to it placed more demand on CPU and memory than the hub could sustain.

Resolution

Redistribute robots from the affected hub to other hubs to reduce its load. In this case, the hub initially had 1224 robots, and moving robots to other hubs brought the count below 1000.
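
The arithmetic behind the resolution above can be sketched as follows. This is only a planning aid under stated assumptions: the function name and the hub names are hypothetical, the default limit of 999 simply reflects the "below 1000" figure from this case rather than a documented product limit, and the code does not move any robots in UIM.

```python
def plan_rebalance(overloaded_count, other_hubs, limit=999):
    """Return {hub_name: robots_to_receive} so the overloaded hub ends at `limit` or fewer.

    other_hubs maps hypothetical hub names to their current robot counts.
    The least-loaded hubs are filled first, and none is pushed above `limit`.
    """
    excess = max(0, overloaded_count - limit)
    plan = {}
    for name, count in sorted(other_hubs.items(), key=lambda kv: kv[1]):
        if excess == 0:
            break
        headroom = max(0, limit - count)
        moved = min(headroom, excess)
        if moved:
            plan[name] = moved
            excess -= moved
    if excess:
        raise ValueError(f"not enough headroom: {excess} robots left unplaced")
    return plan


if __name__ == "__main__":
    # 1224 robots on the affected hub, as in this article; the other hubs are made up.
    print(plan_rebalance(1224, {"hub-b": 850, "hub-c": 800}))
```

With 1224 robots and a target of 999, at least 225 robots have to move. If the remaining hubs cannot absorb that many without exceeding the same limit, the sketch raises an error, which would indicate that an additional hub is needed.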
