What are the causes of, and solutions for, the "Probe failed to start" alarm (e.g., "Probe cdm failed to start", "Probe ntevl failed to start")? We are getting top-talker alerts for these types of issues.
Probe failed to start issues/errors
Controller: Probe '<probe_name>' FAILED to start (command = <probe_name>.exe) error = <number_or_error_message>
The root cause of "Probe failed to start" errors can be one or more of the following:
- Anti-virus blocking/interference (check the Application and System logs in the Windows Event Viewer)
- no valid probe license (check the probe license). In some cases, a probe license issue is easy to resolve with the following steps:
1. Enable forwarding on the Primary hub distsrv probe.
2. Set up a forwarding record to forward/copy 'All' currently valid licenses from the Primary hub to the remote hub.
3. Acknowledge the old license error.
- probe software or hardware requirements not met (check probe Help docs)
- probe package dependencies (see the Dependencies tab of the probe package in the Primary hub's local archive). For example, the ntevl and ntservices probe packages depend on the vs2017_vcredist_x64 package, so make sure that package is installed on the robot where those probes are deployed. If it is already installed, try reinstalling it.
- probe startup dependencies
- OS support: the probe is running on an untested or unsupported OS or OS version (check the probe support matrix)
- probe dependency: one or more probes that this probe depends on have not been started
- insufficient memory
- the Windows .NET version required by the probe is not present on the robot
- the Java min/max memory settings were increased and the probe won't start because not enough physical memory is available on the system (lower the min/max values)
- the robot's Java environment is incorrectly configured (e.g., JAVA_HOME, classpath)
- there are multiple versions of Java on the robot and the probe is using the wrong version/instance
- required libraries for the probe are not present on a Linux/UNIX OS (check with the ldd <probe_name> command)
For example, on Linux, to check for the libraries required by the cdm probe, run ldd cdm from the probe directory. Any missing library is listed as "<library> => not found". In the example below, no libraries are missing.
# cd /opt/nimsoft/probes/system/cdm
# ldd cdm
linux-vdso.so.1 => (0x00007ffe9e33f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f54b6de5000)
librt.so.1 => /lib64/librt.so.1 (0x00007f54b6bdd000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f54b69d9000)
libm.so.6 => /lib64/libm.so.6 (0x00007f54b66d7000)
libc.so.6 => /lib64/libc.so.6 (0x00007f54b630a000)
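When several probes or robots need the same library check, it can be scripted. The following is a minimal POSIX shell sketch, assuming the standard ldd output format in which an unresolved library appears as "<library> => not found". missing_libs and check_probe_libs are hypothetical helper names for illustration, not UIM tools.

```shell
#!/bin/sh
# Filter ldd output (read on stdin) down to the libraries it reports as missing.
# ldd marks an unresolved library as: "<library> => not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Run ldd against one probe binary and report any missing libraries.
# Returns 0 if nothing is missing, 1 if libraries are missing, 2 if the file is absent.
check_probe_libs() {
    probe_bin=$1
    if [ ! -f "$probe_bin" ]; then
        echo "probe binary not found: $probe_bin" >&2
        return 2
    fi
    missing=$(ldd "$probe_bin" 2>&1 | missing_libs)
    if [ -n "$missing" ]; then
        echo "Missing libraries for $probe_bin:"
        echo "$missing"
        return 1
    fi
    echo "No missing libraries for $probe_bin"
    return 0
}
```

For the cdm example above, this would be run as check_probe_libs /opt/nimsoft/probes/system/cdm/cdm. A non-zero return makes it easy to loop over several probe directories and report only the failures.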
- a response-type probe is incorrectly configured (e.g., its connection profile)
- not enough free disk space on the local filesystem for the probe to write to
- a product defect (a hotfix may be available)
- corruption of the probe's configuration or database folder (where applicable)
- the wrong Java version is installed
- probe has been deprecated (as of the UIM version you're running)
- probe cannot get a port (first_probe_port is not set to 48000 in robot.cfg, or the port is blocked by a local firewall). Check the controller log and the local and remote firewalls.
- insufficient resources, e.g., handle or thread exhaustion
- hub/robot version mismatch (the hub and robot should be the same version)
- probe needs a cold start or restart
- the robot name or IP address was changed
- the probe needs security validation (this can be automated with a nas auto-operator (AO) Lua script that runs pu commands)
- after a robot restart, system reboot, UIM upgrade, etc., the required probe startup order/probe dependencies were violated. Check controller.cfg for 'start after' settings.
- AS400/iSeries systems: a DNS resolution issue or incorrect domain name settings on the AS400 OS
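For the port cause in the list above, a quick check is to read first_probe_port from robot.cfg and flag any value other than 48000. This is a minimal POSIX shell sketch under stated assumptions: it assumes robot.cfg stores settings as plain "key = value" lines and lives at /opt/nimsoft/robot/robot.cfg on a default Linux robot (adjust for your install). cfg_value and check_first_probe_port are hypothetical helper names, and cfg_value is not section-aware; it returns the first matching key in the file.

```shell
#!/bin/sh
# Print the value of the first "key = value" line whose key matches $1.
# Reads a .cfg file on stdin; lines without "=" (such as <section> tags) are skipped.
cfg_value() {
    key=$1
    awk -v k="$key" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)           # drop leading indentation
            n = index(line, "=")
            if (n == 0) next                    # not a key = value line
            name = substr(line, 1, n - 1)
            sub(/[ \t]+$/, "", name)
            if (name == k) {
                val = substr(line, n + 1)
                sub(/^[ \t]+/, "", val)
                sub(/[ \t]+$/, "", val)
                print val
                exit                            # first match only
            }
        }'
}

# Check that first_probe_port in the given robot.cfg is set to 48000.
# Returns 0 if it is, 1 if it is missing or different, 2 if the file is absent.
check_first_probe_port() {
    cfg=$1
    if [ ! -f "$cfg" ]; then
        echo "config file not found: $cfg" >&2
        return 2
    fi
    port=$(cfg_value first_probe_port < "$cfg")
    if [ -z "$port" ]; then
        echo "first_probe_port is not set in $cfg"
        return 1
    elif [ "$port" != "48000" ]; then
        echo "first_probe_port is $port in $cfg (expected 48000)"
        return 1
    fi
    echo "first_probe_port = 48000"
    return 0
}
```

Usage: check_first_probe_port /opt/nimsoft/robot/robot.cfg. To see whether other processes are already listening in that port range, ss -ltn (Linux) or netstat -an (Windows) lists the listening sockets.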
In general, it is best to open a support case for each occurrence of this issue. Attach the affected probe's logs captured at loglevel 5 with logsize 5000, and describe the UIM environment and the versions of the probes generating the alarm.
One of the most common causes of a probe 'failed to start' alarm is anti-virus software.
The following tech tip explains an example and its solution:
Tech Tip: UIM "Probe 'cdm' FAILED to start (command = cdm.exe) error = (5) Access is denied"
The controller.log or the Windows event logs (Application/System) may reveal such an issue.
In general, the root cause can be found in the controller log and/or the event logs on the robot machine.
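To see which probes are failing and how often (useful when the alarms show up as top talkers), controller.log entries in the "Probe '<probe_name>' FAILED to start ... error = ..." format shown at the top of this article can be summarized with a short POSIX shell sketch. failed_probes and count_failures are hypothetical helper names, and the sketch assumes each failure is logged on a single line in that format.

```shell
#!/bin/sh
# Extract "FAILED to start" entries from a controller log read on stdin.
# Prints one line per failure as "<probe>|<error text>", e.g. "cdm|(5) Access is denied".
# Carriage returns are stripped first in case the log was copied from a Windows robot.
failed_probes() {
    tr -d '\r' | sed -n "s/.*Probe '\([^']*\)' FAILED to start.*error = \(.*\)/\1|\2/p"
}

# Count failures per probe, most frequent first.
count_failures() {
    failed_probes | cut -d'|' -f1 | sort | uniq -c | sort -rn
}
```

On a Linux robot this might be run as count_failures < /opt/nimsoft/robot/controller.log (the path is an assumption for a default install). On a Windows robot, findstr /C:"FAILED to start" controller.log, run from the robot directory, shows the raw matching lines.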