Controller on HA Hub keeps restarting

Article ID: 195585

Products

NIMSOFT PROBES
DX Infrastructure Management

Issue/Introduction

The controller on the HA Hub keeps restarting.
The issue started yesterday, and the cause has not yet been determined.
The robot has been stopped and restarted.
The VM hosting the HA Hub was also restarted as part of initial troubleshooting.
The HA probe was uninstalled and its folder deleted. After the HA probe was redeployed, the controller went back into a restart loop.

Sample controller log extract:

Jul 16 15:11:42:575 [5540] 0 Controller:  Running as user SYSTEM
Jul 16 15:11:42:575 [5540] 0 Controller: ----- 
Jul 16 15:11:42:575 [5540] 3 Controller: Controller - fetch expire information
Jul 16 15:11:42:575 [5540] 2 Controller: expire/fetch_expire - find full path to expire.cfg
Jul 16 15:11:42:575 [5540] 2 Controller: expire/fetch_expire - allocate space for D:\CA\UIM\robot/expire.cfg
Jul 16 15:11:42:575 [5540] 2 Controller: expire/fetch_expire - check access to D:\CA\UIM\robot/expire.cfg
Jul 16 15:11:42:575 [5540] 2 Controller: expire/fetch_expire - CRC check
Jul 16 15:11:42:575 [5540] 2 Controller: expire/VerifyCrc - D:\CA\UIM\robot/expire.cfg, D:\CA\UIM\robot/expire.crc
Jul 16 15:11:42:575 [5540] 2 Controller: expire/VerifyCrc - generate checksum based on robot name
Jul 16 15:11:42:575 [5540] 2 Controller: expire/gen_key D:\CA\UIM\robot/expire.cfg
Jul 16 15:11:42:575 [5540] 2 Controller: expire/gen_key 14Jyd021JOOT/U1CzNPWDA==
Jul 16 15:11:42:575 [5540] 2 Controller: expire/gen_key MKHP-UIM-HUB01
Jul 16 15:11:42:575 [5540] 2 Controller: expire/gen_key GKSvp7w56RMTjGHAk7GEdqhkAStgSS4Wfliskl+jCvW5vToWhFrlzGLQcIOX1BDS
Jul 16 15:11:42:575 [5540] 2 Controller: expire/VerifyCrc - open crc file and compare
Jul 16 15:11:42:575 [5540] 3 Controller: expire/VerifyCrc - found: 1
Jul 16 15:11:42:575 [5540] 2 Controller: expire/fetch_expire - ok; read
Jul 16 15:11:42:575 [5540] 4 Controller: cfgReader file open: D:\CA\UIM\robot/expire.cfg
Jul 16 15:11:42:575 [5540] 4 Controller: cfgReader file close: D:\CA\UIM\robot/expire.cfg
Jul 16 15:11:42:575 [5540] 2 Controller: expire/fetch_expire - done
Jul 16 15:11:42:575 [5540] 2 Controller: Change directory to D:\CA\UIM
Jul 16 15:11:42:575 [5540] 3 Controller: validating character encoding of config file: D:\CA\UIM\robot\controller.cfg
Jul 16 15:11:42:575 [5540] 4 Controller: nimCharsetValidateFile: D:\CA\UIM\robot\controller.cfg: no target charset
Jul 16 15:11:42:575 [5540] 4 Controller: cfgReader file open: D:\CA\UIM/pids/nimbus-0.pids
Jul 16 15:11:42:575 [5540] 4 Controller: cfgReader file close: D:\CA\UIM/pids/nimbus-0.pids
Jul 16 15:11:42:575 [5540] 0 Controller: Stopping processes from previous run
Jul 16 15:11:42:575 [5540] 0 Controller: ProcessControl: Sending ^C signal to hub (3704)...
Jul 16 15:11:49:542 [5540] 1 Controller: ProcessControl: Child exit code: 0x0
Jul 16 15:11:49:542 [5540] 0 Controller: ProcessControl: Sending ^C signal to distsrv (2708)...
Jul 16 15:11:51:919 [5540] 1 Controller: ProcessControl: Child exit code: 0x0
Jul 16 15:11:51:919 [5540] 0 Controller: ProcessControl: Sending ^C signal to hdb (5452)...
Jul 16 15:11:54:434 [5540] 1 Controller: ProcessControl: Child exit code: 0x0
Jul 16 15:11:54:434 [5540] 0 Controller: ProcessControl: Sending ^C signal to cdm (6028)...
Jul 16 15:12:00:979 [5540] 1 Controller: ProcessControl: Child exit code: 0x0
Jul 16 15:12:00:979 [5540] 0 Controller: ProcessControl: Sending ^C signal to HA (4212)...
Jul 16 15:12:06:155 [5540] 1 Controller: ProcessControl: Child exit code: 0x0
Jul 16 15:12:06:155 [5540] 0 Controller: ProcessControl: Sending ^C signal to emailgtw (7080)...
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Process emailgtw (7080) still running - terminating
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Sending ^C signal to automated_deployment_engine (6080)...
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Unable to send stop signal to process automated_deployment_engine (6080)
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Process 6080 not found
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Sending ^C signal to baseline_engine (6888)...
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Unable to send stop signal to process baseline_engine (6888)
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Process 6888 not found
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Sending ^C signal to discovery_agent (6968)...
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Unable to send stop signal to process discovery_agent (6968)
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Process 6968 not found
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Sending ^C signal to alarm_enrichment (3732)...
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Unable to send stop signal to process alarm_enrichment (3732)
Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Process 3732 not found
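
When an extract like the one above runs to many lines, it can help to reduce the ProcessControl entries to one outcome per probe. The following Python sketch does that. It is not part of UIM: the function name `summarize_shutdown` and the outcome strings are hypothetical, and the regular expressions assume the message wording shown in this extract, which may differ in other controller versions.

```python
import re

# Patterns assume the ProcessControl message wording seen in the extract above.
SIGNAL = re.compile(r"Sending \^C signal to (\S+) \((\d+)\)")
EXIT = re.compile(r"Child exit code: (0x[0-9A-Fa-f]+)")
STILL = re.compile(r"Process (\S+) \((\d+)\) still running - terminating")
MISSING = re.compile(r"Process (\d+) not found")


def summarize_shutdown(lines):
    """Return {probe_name: outcome} for each probe the controller tried to stop.

    Hypothetical helper for reading the log; the outcome strings are our own.
    """
    outcomes = {}
    pid_to_name = {}
    current = None  # probe named by the most recent "Sending ^C" line
    for line in lines:
        m = SIGNAL.search(line)
        if m:
            current = m.group(1)
            pid_to_name[m.group(2)] = current
            outcomes[current] = "signal sent, no result logged"
            continue
        m = EXIT.search(line)
        if m and current:
            outcomes[current] = f"exited ({m.group(1)})"
            continue
        m = STILL.search(line)
        if m:
            outcomes[m.group(1)] = "did not stop, terminated"
            continue
        m = MISSING.search(line)
        if m and m.group(1) in pid_to_name:
            outcomes[pid_to_name[m.group(1)]] = "process not found"
    return outcomes


# A few lines copied from the extract above.
SAMPLE = [
    "Jul 16 15:11:42:575 [5540] 0 Controller: ProcessControl: Sending ^C signal to hub (3704)...",
    "Jul 16 15:11:49:542 [5540] 1 Controller: ProcessControl: Child exit code: 0x0",
    "Jul 16 15:12:06:155 [5540] 0 Controller: ProcessControl: Sending ^C signal to emailgtw (7080)...",
    "Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Process emailgtw (7080) still running - terminating",
    "Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Sending ^C signal to automated_deployment_engine (6080)...",
    "Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Unable to send stop signal to process automated_deployment_engine (6080)",
    "Jul 16 15:12:16:156 [5540] 0 Controller: ProcessControl: Process 6080 not found",
]

if __name__ == "__main__":
    for probe, outcome in summarize_shutdown(SAMPLE).items():
        print(f"{probe}: {outcome}")
```

Run against the full extract, a summary of this kind separates the probes that exited on the ^C signal (hub, distsrv, hdb, cdm, HA) from emailgtw, which had to be terminated, and from the four probes whose PIDs from the previous run were no longer present.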

Cause

A Windows patch is the suspected cause: either the patch was corrupt, or it changed something on the server that locked down communication.

Environment

Release: 20.1

Component: UIM - HA

Resolution

Restored the server to its state from before the Windows patches were applied.