The services aren't starting in the Primary hub after VMware snapshot of Primary hub is restored

Article ID: 221629

Products

DX Infrastructure Management
NIMSOFT PROBES
CA Unified Infrastructure Management for z Systems
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

No errors are displayed.

The processes start, but it is not possible to log in to Infrastructure Manager (IM) with the administrator account. The Domain and Hub do not appear in the IM login window.

controller.log shows many entries indicating that processes cannot be stopped:

Aug  7 23:45:30:791 [15204] 0 Controller: ProcessControl: Sending ^C signal to distsrv (5220)...
Aug  7 23:45:30:791 [15204] 0 Controller: ProcessControl: Unable to send stop signal to process distsrv (5220)
Aug  7 23:45:31:792 [15204] 0 Controller: ProcessControl: Process distsrv (5220) still running - terminating
Aug  7 23:45:31:798 [15204] 0 Controller: ProcessControl: Sending ^C signal to hdb (1648)...
Aug  7 23:45:31:798 [15204] 0 Controller: ProcessControl: Unable to send stop signal to process hdb (1648)
Aug  7 23:45:32:798 [15204] 0 Controller: ProcessControl: Process hdb (1648) still running - terminating
Aug  7 23:45:32:898 [15204] 0 Controller: ProcessControl: Sending ^C signal to emailgtw (2512)...
Aug  7 23:45:32:898 [15204] 0 Controller: ProcessControl: Unable to send stop signal to process emailgtw (2512)
Aug  7 23:45:33:898 [15204] 0 Controller: ProcessControl: Process emailgtw (2512) still running - terminating
Aug  7 23:45:33:998 [15204] 0 Controller: ProcessControl: Sending ^C signal to configdeployer (4440)...
Aug  7 23:45:33:998 [15204] 0 Controller: ProcessControl: Unable to send stop signal to process configdeployer (4440)
Aug  7 23:45:34:999 [15204] 0 Controller: ProcessControl: Process configdeployer (4440) still running - terminating
Aug  7 23:45:35:099 [15204] 0 Controller: ProcessControl: Sending ^C signal to adogtw (3628)...
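To see at a glance which probes the controller could not stop, the entries above can be tallied per process. The following is a minimal sketch, assuming the log lines follow the format quoted above; the function name failed_stops and the command-line usage are illustrative and are not part of UIM.

```python
import re
import sys
from collections import Counter

# Pattern derived from the controller.log lines quoted in this article.
STOP_FAILED = re.compile(
    r"Unable to send stop signal to process (?P<name>\S+) \((?P<pid>\d+)\)"
)


def failed_stops(lines):
    """Count 'Unable to send stop signal' entries per process name."""
    counts = Counter()
    for line in lines:
        match = STOP_FAILED.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Hypothetical usage: python failed_stops.py <path to controller.log>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for name, count in failed_stops(handle).most_common():
            print(f"{name}: {count}")
```

A long list of affected processes is only a symptom here; it does not identify the cause by itself.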

The Nimsoft Robot Watcher service appears to start, but controller.log shows the following entry repeating over and over:

Aug  8 01:07:14:503 [1564] 0 Controller: Selecting robotip from configuration. config_robotip = , cglob robotip = 10.xxx.xx.10, local_ip_validation = 1, validate_ip_suggestion = 0, strict_ip_binding = 0
Aug  8 01:07:21:531 [2024] 0 Controller: Selecting robotip from configuration. config_robotip = , cglob robotip = 10.xxx.xx.10, local_ip_validation = 1, validate_ip_suggestion = 0, strict_ip_binding = 0
Aug  8 01:07:29:582 [3256] 0 Controller: Selecting robotip from configuration. config_robotip = , cglob robotip = 10.xxx.xx.10, local_ip_validation = 1, validate_ip_suggestion = 0, strict_ip_binding = 0
etc...
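The repeating entries above each carry a different bracketed ID a few seconds apart. A minimal sketch for spotting this pattern in a larger log follows. It assumes that a new bracketed ID on each "Selecting robotip" entry means the controller is starting again; the function names and the threshold of 3 are hypothetical choices, not UIM settings.

```python
import re

# Matches the bracketed ID on "Selecting robotip" entries, as quoted in this article.
ROBOTIP = re.compile(
    r"\[(?P<id>\d+)\] \d+ Controller: Selecting robotip from configuration"
)


def robotip_selection_ids(lines):
    """Return the bracketed IDs, in log order, of each 'Selecting robotip' entry."""
    ids = []
    for line in lines:
        match = ROBOTIP.search(line)
        if match:
            ids.append(match.group("id"))
    return ids


def looks_like_restart_loop(lines, threshold=3):
    """Assumption: several distinct IDs suggest the controller keeps restarting."""
    return len(set(robotip_selection_ids(lines))) >= threshold
```

Passing the lines of controller.log to looks_like_restart_loop returns True for an excerpt like the one above, where three entries show three different IDs.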

Cause

- A corrupt VMware snapshot was restored.

Environment

Release : 20.3

Component : UIM - HUB

Resolution

- Restore the last known good VMware snapshot. In this case, a different 'good' VMware snapshot from earlier in the past week was restored, and then ONLY the robot folder was restored from the backup. The Primary hub robot then started up without issue. mpse had to be redeployed.

After examining the restored robot folder, which allowed the hub and services to start up without issue, these were the findings:

Bad versus good robot - summary of main differences

1. cfgs folder was completely missing from the 'bad' robot.

2. robot.cfg in the 'bad' robot was severely truncated: it contained 23 lines versus 61 lines in the good robot.cfg.

3. robot.cfg contained a different location for the cryptkey/certificate.pem:

   cryptkey = U:\Nimsoft\robot\certs\certificate.pem

     versus

   cryptkey = U:\Nimsoft\security\certificate.pem
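The three checks above can be repeated on any pair of robot folders, for example a backup copy and the restored one. The following is a minimal sketch, assuming the cfgs folder and robot.cfg sit directly under the robot folder and that robot.cfg uses "key = value" lines as shown above; the function names and the example paths are hypothetical.

```python
from pathlib import Path


def summarize_robot(robot_dir):
    """Collect the three properties compared in this article for one robot folder."""
    robot = Path(robot_dir)
    cfg = robot / "robot.cfg"
    summary = {
        "has_cfgs": (robot / "cfgs").is_dir(),  # assumed location of the cfgs folder
        "cfg_lines": 0,
        "cryptkey": None,
    }
    if cfg.is_file():
        lines = cfg.read_text(encoding="utf-8", errors="replace").splitlines()
        summary["cfg_lines"] = len(lines)
        for line in lines:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "cryptkey":
                summary["cryptkey"] = value.strip()
                break
    return summary


def compare_robots(good_dir, bad_dir):
    """Return only the properties that differ, as (good, bad) pairs."""
    good = summarize_robot(good_dir)
    bad = summarize_robot(bad_dir)
    return {name: (good[name], bad[name]) for name in good if good[name] != bad[name]}


# Hypothetical usage (both paths are examples only):
# print(compare_robots(r"D:\backup\Nimsoft\robot", r"U:\Nimsoft\robot"))
```

An empty result means the two folders agree on all three properties; it does not prove the folders are otherwise identical.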

Additional Information

  • Customers may want to ask their VMware admin to check vmware.log and/or hostd.log for errors during the period in which the snapshot was being taken.
  • Make sure no AV/security scans are running while the snapshot is being taken.