Cannot connect to hub and controller.cfg is missing probes/truncated


Article ID: 35028


Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

If the primary hub cannot start and you cannot connect to it at all, even though the services appear to be up and running, the cause may be missing probe definitions in controller.cfg in the $NIMROOT/robot directory.

This can happen if, for example, the server crashed while controller.cfg was still locked by the Nimsoft Robot Watcher process, and the file became corrupted or truncated. On reboot, the Robot Watcher service recreated the file, but with only the controller probe entry. As a result, none of the other core probes start, and the hub appears not to have started.

Symptoms Include (but are not limited to):

  • unable to connect to hub with Infrastructure Manager - "no communication with hub"
  • hub.exe not running in Task Manager / nimbus(hub) not present in "ps" command output
  • cannot access any of the basic interfaces (OC/AC/IM)
  • telnet to port 48002 fails on localhost
  • error in controller.log: "sockConnect - connect to 127.0.0.1 48002 failed 10061"
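The last symptom can be checked from a shell on a Linux hub. The following is a minimal sketch, not part of the product: the function name count_hub_connect_failures is hypothetical, and the log path is passed in because its location depends on the installation. The numeric code at the end of the message (10061 in the example above) is deliberately left out of the pattern in case it differs between platforms.

```shell
# Count hub connection failures recorded in a controller.log.
# $1 = path to controller.log (assumed to be a plain text file).
count_hub_connect_failures() {
  if [ ! -r "$1" ]; then
    echo "cannot read log file: $1" >&2
    return 2
  fi
  # grep -c prints the number of matching lines; it exits 1 when that
  # number is 0, so the status is neutralised to keep callers simple.
  grep -c 'sockConnect - connect to 127.0.0.1 48002 failed' "$1" || true
}
```

For example, `count_hub_connect_failures /opt/nimsoft/robot/controller.log` (path assumed from the default Linux install location used elsewhere in this article) prints a count; any value above zero is consistent with the hub not listening on port 48002.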

 

On examining controller.cfg, you will find that one or more probes are not listed as expected. The file should contain one entry for each probe installed on the robot, but in this case it may contain only 1-3 probes, and the <hub> entry will be missing along with most other probes.
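A quick way to see which entries survived is to list the section names in the file. This sketch assumes each probe entry opens with a tag on its own line, such as <hub>, and closes with a matching </hub>; the function names list_cfg_sections and cfg_has_section are hypothetical helpers, not Nimsoft tools. Any nested sections, if present, are listed as well.

```shell
# Print the name of every opening <section> tag found in a cfg file.
# $1 = path to controller.cfg
list_cfg_sections() {
  # Match lines that hold only an opening tag; closing tags (</name>)
  # are skipped because the first character after '<' may not be '/'.
  sed -n 's/^[[:space:]]*<\([^/>][^>]*\)>[[:space:]]*$/\1/p' "$1"
}

# Succeed if the cfg file contains a section with the given name.
# $1 = path to controller.cfg, $2 = section name, e.g. hub
cfg_has_section() {
  list_cfg_sections "$1" | grep -qxF "$2"
}
```

On a truncated file, `list_cfg_sections /opt/nimsoft/robot/controller.cfg` would be expected to print only a handful of names, and `cfg_has_section /opt/nimsoft/robot/controller.cfg hub` would fail.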

Environment

Release: DX UIM 20.4 or higher
Component: Robot (controller)

Cause

  • corruption/truncation of controller.cfg

Resolution

If there is a backup copy of controller.cfg available, replace the current controller.cfg with the backup and restart the robot service.
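On Linux, the replacement can be scripted so that the damaged file is kept for later inspection. This is a sketch under stated assumptions: the function name restore_controller_cfg and the .truncated suffix are hypothetical, and the robot service still has to be restarted afterwards using whatever method your installation normally uses.

```shell
# Replace controller.cfg with a backup, keeping the damaged file.
# $1 = path to the backup copy, $2 = robot directory
restore_controller_cfg() {
  if [ ! -s "$1" ]; then
    echo "backup is missing or empty: $1" >&2
    return 1
  fi
  # Keep the damaged file under a different (hypothetical) name.
  if [ -f "$2/controller.cfg" ]; then
    cp -p "$2/controller.cfg" "$2/controller.cfg.truncated" || return 1
  fi
  cp -p "$1" "$2/controller.cfg"
}
```

A call might look like `restore_controller_cfg /path/to/backup/controller.cfg /opt/nimsoft/robot`, where the backup path is a placeholder.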

If there is no backup copy of controller.cfg available, then follow these steps:
  1. Replace the corrupted controller.cfg with a copy from any other hub. This should be sufficient to get the hub started even if the probes are not 100% identical.

    (In the zip file attached to this document you will find a sample_controller.cfg which may be used as the bare minimum to get the hub started - you may use this if you do not have a backup copy. The file contains a sample for a Primary Hub and non-Primary hub for both Linux and Windows.)

  2. If you have used a backup copy, go through it and remove every line that starts with 'magic_key' (remove the entire line that contains the magic_key entry). Most probes have a magic_key entry associated with them, and all of these lines must be removed from controller.cfg.

    After you remove the magic_key entries, save the file. Then move (do not copy) controller.cfg to the $NIMROOT\robot\changes folder.

    On Linux, an easy way to accomplish this step is to write the filtered file directly into the changes folder with the following command:

    grep -v magic_key /opt/nimsoft/robot/controller.cfg > /opt/nimsoft/robot/changes/controller.cfg

    After you execute this command, delete the controller.cfg from /opt/nimsoft/robot.

    On Windows, you could use a text editor like Notepad++ with advanced search-and-replace features to delete all the magic_key entries/lines.

  3. Restart the robot watcher service.

The file from the \robot\changes folder will be processed, and a new controller.cfg should then appear in the $NIMROOT\robot folder.

Now you should be able to log in, and observe that at least controller/hdb and hub probes should be active and green in Infrastructure Manager.
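The Linux part of step 2 above (filter out the magic_key lines, write the result into the changes folder, then delete the original) can be wrapped in one function that only deletes the original once the filtered copy exists. This is a minimal sketch: the function name strip_magic_keys is hypothetical, and like the grep command above it drops any line containing the text magic_key.

```shell
# Write controller.cfg without magic_key lines into the changes folder,
# then remove the original so that it is effectively moved.
# $1 = robot directory, e.g. /opt/nimsoft/robot
strip_magic_keys() {
  src="$1/controller.cfg"
  dst="$1/changes/controller.cfg"
  if [ ! -f "$src" ] || [ ! -d "$1/changes" ]; then
    echo "expected $src and $1/changes to exist" >&2
    return 1
  fi
  # grep -v exits 1 when it outputs nothing; only a status above 1
  # indicates a real error such as an unreadable file.
  grep -v 'magic_key' "$src" > "$dst" || [ $? -eq 1 ] || return 1
  if [ ! -s "$dst" ]; then
    echo "filtered file is empty, keeping $src" >&2
    rm -f "$dst"
    return 1
  fi
  rm -f "$src"
}
```

After `strip_magic_keys /opt/nimsoft/robot` succeeds, the robot directory no longer contains controller.cfg, and the robot watcher service restart in step 3 still has to be done separately.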


Follow the instructions below to generate new magic keys.

If there is at least ONE directly connected hub on the same network as the primary hub:
  1. Login to the secondary hub using Infrastructure Manager

  2. On the primary hub, only the controller will be up, and because it is running as a robot only, it will automatically attach to the nearest (secondary) hub. If it does not show up attached to the secondary hub, use the Connect Robot tool in Infrastructure Manager to attach the primary hub robot to the secondary hub.

  3. Once the primary hub robot is attached to the secondary hub, validate the hub probe by right-clicking it and selecting Security->Validate. After the hub probe becomes active, the robot will detach from the secondary hub and take the hub role. Validate the other probes in the same way.

If there is NO directly connected hub on the same network as the primary hub:

If the hub probe is not running: 
    1. Launch a command prompt and navigate to the $NIMROOT\hub folder and execute this command: hub.exe -d3 -lstdout

    2. Leave this running in the command window (this will launch the hub so you can log in), do not close the command prompt.

    3. Launch Infrastructure Manager and log in to the hub.

    4. Right click on the hub probe, choose "Security" then "Validate".

You will see a prompt asking you if you wish to activate the probe.
 
Click Yes.

Once the operation completes, you will see that the hub is up and running.
 
If the hub is already running, or after you have launched the hub as above, you will need to locate any additional "red" probes, right click each of them, and choose "Security" and then "Validate" as above.
 
Finalization:
 
This step is needed to make sure that all probes which are actually installed on the system are in sync with what you see in the Infrastructure Manager UI.
  1. Login to Infrastructure Manager

  2. Validate any probes on the hub that show a red lock icon.

  3. Go into the $NIMROOT\probes folder, where you will see several subfolders (e.g. application, system, slm, etc).

    Go into each subfolder and look through the folder names one by one; each represents a probe that is already installed on the system.

    For each probe which you find under each subfolder, make sure the corresponding probe is displayed in Infrastructure Manager.

    If a probe is missing, locate that probe in the Archive, and re-deploy it to the hub.  The existing configuration will be preserved.

    If there are any probes which show up in Infrastructure Manager but do NOT have a corresponding folder, you can right-click and delete them in IM.
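On a Linux hub, a first pass of this comparison can be made against controller.cfg itself, since that file is expected to hold one entry per installed probe. The sketch below assumes that a probe's folder name matches its section name in controller.cfg, which may not hold for every folder; the function name probes_missing_from_cfg is hypothetical. Treat its output as a list of candidates to check in Infrastructure Manager, not as a definitive result.

```shell
# List probe folders that have no matching <name> section in the cfg.
# $1 = probes directory, $2 = path to controller.cfg
probes_missing_from_cfg() {
  # Probe folders sit two levels below the probes directory
  # (category/probe), as in the listing scripts in this article.
  find "$1" -mindepth 2 -maxdepth 2 -type d | while IFS= read -r dir; do
    name=$(basename "$dir")
    # -F treats the tag as a fixed string, not a regular expression.
    grep -qF "<$name>" "$2" || printf '%s\n' "$name"
  done | sort
}
```

For the default paths used in this article the call would be `probes_missing_from_cfg /opt/nimsoft/probes /opt/nimsoft/robot/controller.cfg`.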

 

Additional Information

The following PowerShell script can be used on Windows to list all the probes which are already installed under the probes folder. Compare its output with the probes shown in Infrastructure Manager to identify the ones which need to be re-deployed.

# change the below line if your installation path is different
$baseDir = "C:\Program Files (x86)\Nimsoft\probes"

$subDirs = Get-ChildItem -Path $baseDir -Directory | ForEach-Object {
    Get-ChildItem -Path $_.FullName -Directory | ForEach-Object {
        $_.Name
    }
}

$subDirs | ForEach-Object { Write-Output $_ }

 

The following is a Linux/bash script which will accomplish the same thing for a Linux hub installation:

#!/bin/bash

# change the below line if your installation path is different
base_dir="/opt/nimsoft/probes/"

# Find directories two levels below base_dir (category/probe)
find "$base_dir" -mindepth 2 -maxdepth 2 -type d | while IFS= read -r dir; do
  # Print only the last path component, i.e. the probe folder name
  basename "$dir"
done

 

Attachments

sample_cfgs.zip