Could not get connection string from the Data Engine

Article ID: 245765

Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

The primary hub restarted unexpectedly.

Windows Event Viewer shows a controller.exe crash.

After the restart, many probes fail to start because they cannot get the connection string from data_engine.

Example from the audit probe log (all affected probes report the same sequence):

Jul  8 10:35:40:524 [6320] Controller: Max. restarts reached for probe 'audit' (command = audit.exe) 
Jul  8 10:39:37:510 [8836] audit: ##################### START ##################### 
Jul  8 10:39:37:510 [8836] audit: audit 9.04 [Dec  4 2018] 
Jul  8 10:39:37:510 [8836] audit: Copyright 2018, CA. All rights reserved. 
Jul  8 10:39:37:510 [8836] audit: Base_Database::init_lib - Platform: Windows 
Jul  8 10:39:37:510 [8836] audit: Base_Database::init_lib CoInitialize() OK 
Jul  8 10:39:37:542 [8836] audit: Connection Interface OK - ADO Version: 6.3 
Jul  8 10:39:37:542 [8836] audit: Recordset Interface OK 
Jul  8 10:39:37:542 [8836] audit: Command Interface OK 
Jul  8 10:39:37:542 [8836] audit: Base_Database::init_lib TestInterfaces << OK, version: 6.3 
Jul  8 10:39:37:542 [8836] audit: Base_Database::init_lib - Platform: MySQL 
Jul  8 10:39:37:542 [8836] audit: Database_global_lock LOCK 
Jul  8 10:39:37:557 [8836] audit: Database_global_lock UNLOCK 
Jul  8 10:39:37:557 [8836] audit: Database init library OK 
Jul  8 10:39:37:557 [8836] audit: Database_global_lock LOCK 
Jul  8 10:39:37:557 [8836] audit: Base_Database::init_lib - Result: true 
Jul  8 10:39:37:573 [8836] audit: SREQUEST: probe_checkin ->10.244.x.x/48000 
Jul  8 10:39:37:573 [8836] audit: RREPLY: status=OK(0) <-10.244.x.x/48000  h=37 d=449 
Jul  8 10:39:37:573 [8836] audit: SREQUEST: _close ->10.244.x.x/48000 
Jul  8 10:39:37:573 [8836] audit: nimSessionServer - port = 0 
Jul  8 10:39:38:010 [8836] audit: X509 NAME add entry stateOrProvinceName=/n/a/n/a/n/a 
Jul  8 10:39:38:010 [8836] audit: X509 NAME add entry organizationName=n/a 
Jul  8 10:39:38:010 [8836] audit: X509 NAME add entry organizationalUnitName=n/a 
Jul  8 10:39:38:010 [8836] audit: X509 NAME add entry commonName=10.244.x.x
Jul  8 10:39:38:026 [8836] audit: X509 NAME add entry stateOrProvinceName=/n/a/n/a/n/a 
Jul  8 10:39:38:026 [8836] audit: X509 NAME add entry organizationName=n/a 
Jul  8 10:39:38:026 [8836] audit: X509 NAME add entry organizationalUnitName=n/a 
Jul  8 10:39:38:026 [8836] audit: X509 NAME add entry commonName=10.244.x.x
Jul  8 10:39:38:026 [8836] audit: X509 EXTENSION add basicConstraints=critical,CA:FALSE 
Jul  8 10:39:38:026 [8836] audit: X509 EXTENSION add nsComment=NMS Robot Generated Certificate (Generated by: OpenSSL 1.0.2p  14 Aug 2018) 
Jul  8 10:39:38:026 [8836] audit: SSL - create intermediate certificate: OK (455 ms) 
Jul  8 10:39:38:026 [8836] audit: sockGetFreePortStartingFrom: 48021: using fam=2 strict=0 
Jul  8 10:39:38:026 [8836] audit: sockServer: next available port is 48021 
Jul  8 10:39:38:026 [8836] audit: port=48021 PID=6972 debug=3 
Jul  8 10:39:38:026 [8836] audit: DataEngineConnectionString called (timeout = 300) 
Jul  8 10:39:38:026 [8836] audit: Use configured data engine [/Domain/hub/robot/data_engine] 
Jul  8 10:39:38:026 [8836] audit: DataEngineConnectionString - data_engine=/Domain/hub/robot/data_engine 
Jul  8 10:39:41:046 [8836] audit: SREQUEST: nametoip ->10.244.x.x/48000 
Jul  8 10:39:41:046 [8836] audit: RREPLY: status=OK(0) <-10.244.x.x/48000  h=37 d=36 
Jul  8 10:39:41:046 [8836] audit: SREQUEST: _close ->10.244.x.x/48000 
Jul  8 10:39:41:046 [8836] audit: data_engine= 10.244.x.x/48010 
Jul  8 10:39:41:046 [8836] audit: SREQUEST: get_connection_string ->10.244.x.x/48010 
Jul  8 10:39:41:046 [8836] audit: RREPLY: status=command not found(11) <-10.244.x.x/48010  h=38 d=0 
Jul  8 10:39:41:046 [8836] audit: SREQUEST: _close ->10.244.x.x/48010 
Jul  8 10:39:41:046 [8836] audit: Could not get connection string from the Data Engine, check that it is running (command not found) 
Jul  8 10:39:41:046 [8836] audit: Database_global_lock LOCK 
Jul  8 10:39:41:046 [8836] audit: Database_global_lock UNLOCK 
Jul  8 10:39:41:046 [8836] audit: Database library - program is exiting through exit()... 
Jul  8 10:39:41:046 [8836] audit: Database library unloaded 
Jul  8 10:39:41:046 [8836] audit: Database_global_lock DESTOY LOCK 
Jul  8 10:39:42:171 [6320] Controller: Max. restarts reached for probe 'audit' (command = audit.exe) 
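
The failing request/reply pair in the log above can be picked out of a probe log with a short script. The following is a minimal Python sketch, not a UIM tool: the function name and the regular expressions are assumptions based only on the line format shown in this excerpt, and other probe versions may log differently.

```python
import re

# Patterns derived from the log excerpt above (assumed, not an official format).
REQUEST = re.compile(r"SREQUEST: get_connection_string ->(\S+)/(\d+)")
REPLY = re.compile(r"RREPLY: status=(.+?)\((\d+)\) <-(\S+)/(\d+)")


def find_failed_connection_string_requests(lines):
    """Return (address, port, status) for each failed get_connection_string call."""
    failures = []
    pending = None  # (address, port) of the most recent unanswered request
    for line in lines:
        request = REQUEST.search(line)
        if request:
            pending = (request.group(1), int(request.group(2)))
            continue
        if "SREQUEST:" in line:
            # A different request was sent, so stop waiting for a reply.
            pending = None
            continue
        reply = REPLY.search(line)
        if reply and pending:
            status, code = reply.group(1), int(reply.group(2))
            if code != 0:  # 0 is the OK status in the excerpt
                failures.append((pending[0], pending[1], status))
            pending = None
    return failures


if __name__ == "__main__":
    sample = [
        "Jul  8 10:39:41:046 [8836] audit: SREQUEST: get_connection_string ->10.244.x.x/48010 ",
        "Jul  8 10:39:41:046 [8836] audit: RREPLY: status=command not found(11) <-10.244.x.x/48010  h=38 d=0 ",
    ]
    for address, port, status in find_failed_connection_string_requests(sample):
        print(f"{address}:{port} replied '{status}'")
```

The port it reports is the one the probe contacted as data_engine, which is the port to compare against the other probes on the robot.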

Environment

  • Release : 20.4
  • Component : UIM - ROBOT

Cause

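After the crash, another probe may be assigned the same port as data_engine. The get_connection_string request is then answered by a probe that does not implement that command, which matches the "command not found(11)" reply in the log.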

Resolution

  1. Stop the Robot Watcher Service.
  2. Wait a full 5 minutes so that all ports and PIDs are released.
  3. Start the Robot Watcher Service again.
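
As a supplementary check during the wait in step 2, a short script can confirm that nothing is still listening on the ports in question. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the host is assumed to be the hub itself, and the port numbers are simply those that appear in the log excerpt, so they may differ in other environments.

```python
import socket


def port_in_use(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def ports_still_in_use(host, ports):
    """Return the subset of ports that still have a listener."""
    return [port for port in ports if port_in_use(host, port)]


if __name__ == "__main__":
    # Assumed values: ports seen in the log excerpt, checked on the local machine.
    busy = ports_still_in_use("127.0.0.1", [48000, 48010, 48021])
    if busy:
        print("Still in use:", busy)
    else:
        print("No listeners found on the checked ports.")
```

The check only detects active listeners. It does not show lingering processes or sockets that the operating system has not yet fully released, so it complements the 5-minute wait rather than replacing it.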
