Basic troubleshooting steps for all DX UIM (Nimsoft) probes

Article ID: 221769

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • Unified Infrastructure Management for Mainframe
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

This article covers basic troubleshooting for all UIM/Nimsoft probes and details the troubleshooting steps that should be taken.

The list below describes some of the symptoms you may notice when a probe is in need of troubleshooting.
  • Max. restarts errors in the probe log
  • Probe failed to start alarms
  • Multiple probes fail to start
  • Probe process not starting
  • Probe is unresponsive or hung
  • Probe is 'down'
  • Probe is red and has no port or PID
  • Probe will not open/cannot be configured
  • Probe is green but not behaving/running normally

Environment

  • Release: UIM 20.1 or higher

Cause

  • There can be many different causes/factors that prompt the need for probe troubleshooting.

Resolution

There are some good KB articles for troubleshooting individual probes, but there is currently no dedicated KB article on basic troubleshooting for each and every UIM probe. New KB articles are added to the Knowledge Base over time.
 
In the meantime, we recommend the probe troubleshooting steps described below.


 


Search the Web

Enter the exact error, for example: Max. restarts reached for probe 'ntevl' (command = ntevl.exe). Alternatively, enter the product name, probe name, and the specific error.
 

Search UIM Documentation (Tech Docs)

The tech docs are the first place to look for guidance on troubleshooting a given probe; the sngtw probe documentation is one example.
 
In some cases, troubleshooting information is included in the Help docs (tech docs); the ppm probe documentation is another example.
 

Search the probe technical documentation

It is also highly recommended to review the probe prerequisites and the probe compatibility matrix to make sure the environment supports the probe. For example, for the ntevl probe, refer to the probe tech doc/release notes and check its hardware and, especially, software requirements.
 
Depending on the type of probe, the techdocs may also contain the probe requirements for connectivity and access/permissions.
 

Search the DX Infrastructure Manager (UIM) Community

You can search the community by entering your search terms, probe name, errors, error string, etc.
 

Probe Version

Make sure you are using the latest GA version of the probe, which can be downloaded from http://support.nimsoft.com
 

Probe Hotfixes

If you encounter any issues/errors, navigate to the UIM hotfix site and check for the latest hotfix version, which may resolve the issue. Check the associated probe's release notes for more information.
 

Help Documentation (Probe techdocs)

The probe techdocs normally include a Contents section that lists 'Known Issues and Workarounds', as in this example for cdm:
 

Probes Support Matrix

 
The UIM probes support matrix shows which OS versions are currently supported:
 
 

 


UIM probes - Basic Troubleshooting Steps

In general, the following steps are required to perform basic troubleshooting for any UIM probe.

Step 1: Check and Test Probe Health

  1. In IM or the Admin Console, check that the probe is green and has a port and PID.
  2. Check whether the probe is writing to its log file. If it is not, the probe might be in a hung state.
  3. Restart the probe to see if it starts writing to the log and responding.
  4. When a probe does not start, or appears red or hung, the following two steps resolve the issue roughly 50% of the time:
    Right-click the probe -> Security -> Set access -> Click OK
    Right-click the probe -> Security -> Validate -> Select 'Yes All'

Step 2: Enable Debug Mode

  1. Select the probe
  2. Hold down the SHIFT key, right-click the probe, and open it in 'Raw Configure' mode
  3. Under the probe <setup> section, set the probe loglevel to 5
  4. Add a logsize key with a value of 100000 (the value is in KB) if it is not present; if the key already exists, set its value to 100000
  5. Deactivate the probe and wait until the probe loses its port and PID and the probe icon turns grey
  6. Activate the probe

Note that some probes such as the hub and controller can be set to loglevel 6 for more detail. This may yield more debug-level information in the event that the probe logs do not include enough detail.
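
For cases where the same change has to be prepared outside the GUI, the loglevel and logsize edit can be sketched in Python. This is only an illustration: it assumes the probe .cfg file is plain text with a <setup> ... </setup> section containing "key = value" lines, and the function name set_setup_keys is hypothetical, not part of any UIM tooling. The probe still needs to be deactivated and activated as described above for the change to take effect.

```python
import re

def set_setup_keys(cfg_text, updates):
    """Return cfg_text with each key in `updates` set in the top-level <setup> section.

    Assumption: sections look like <name> ... </name> and keys look like
    'key = value'. Keys inside nested subsections are left untouched.
    """
    pending = dict(updates)
    stack = []  # names of the sections that are currently open
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        closing = re.fullmatch(r"</(.+)>", stripped)
        opening = re.fullmatch(r"<([^/].*)>", stripped)
        if closing:
            if stack == ["setup"]:
                # Add keys that were not already present before closing <setup>.
                for key, value in pending.items():
                    out.append(f"   {key} = {value}")
                pending = {}
            if stack:
                stack.pop()
        elif opening:
            stack.append(opening.group(1))
        elif stack == ["setup"]:
            match = re.match(r"^(\s*)([A-Za-z0-9_]+)\s*=", line)
            if match and match.group(2) in pending:
                key = match.group(2)
                out.append(f"{match.group(1)}{key} = {pending.pop(key)}")
                continue
        out.append(line)
    return "\n".join(out) + "\n"
```

A call such as set_setup_keys(text, {"loglevel": 5, "logsize": 100000}) returns the modified text; writing it back to the probe .cfg file is left out on purpose, so the result can be reviewed first.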

Step 3: Reproduce, capture the issue, provide logs

  1. Reproduce the issue and let the probe run through its monitoring interval at least once to cause the issue, capture the error, and document the time frame in which it occurred.
  2. Examine the output in the probe logs using IM or the Admin Console, or view the FULL logs on the file system itself, preferably using an editor such as Notepad++.
  3. When viewing the log in IM, press the F4 key and enter a search string such as "error", "exception", "failed", "fail", or "OutofMemory". Matching entries are highlighted in red, which makes them easier to spot in the log viewer.
  4. If you believe you found the key error/exception, perform a search of the KB Article database and/or conduct a web search.
  5. If the probe only throws a "Max. restarts" error in the log, redeploy or update the probe to see if you can force more information into the log.
  6. Attach the probe logs to the support case, e.g., <probe_name>.log and _<probe_name>.log
  7. Attach the <probe_name>.cfg file to the case
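
When the full log is read from the file system rather than in IM, the keyword search from the steps above can be approximated with a short script. The sketch below is a hedged example, not a UIM utility: the function name find_suspect_lines and the keyword list are assumptions based on the search strings suggested in this article, and matching is done case-insensitively.

```python
import sys

# Search strings suggested in this article, lower-cased for case-insensitive matching.
KEYWORDS = ("error", "exception", "fail", "outofmemory", "max. restarts")

def find_suspect_lines(log_lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits

if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python scan_log.py <probe_name>.log
    with open(sys.argv[1], errors="replace") as handle:
        for number, text in find_suspect_lines(handle):
            print(f"{number}: {text}")
```

The line numbers in the output make it easier to document the time frame of the error and to quote the relevant entries in a support case.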

The probe .cfg and log files are usually located in the following file system location unless a customer has installed a robot/probe in a different location:

Windows: <drive>:\Program Files (x86)\Nimsoft\probes\<probe_category>\<probe_name>
For example: C:\Program Files (x86)\Nimsoft\probes\system\cdm

UNIX/Linux: /opt/nimsoft/probes/<probe_category>/<probe_name>
For example: /opt/nimsoft/probes/system/cdm

If the problem seems to be more difficult to analyze, it is best to attach the entire probe folder to the support case.
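
One way to package the entire probe folder for upload is to zip it. The following is a minimal sketch using only the Python standard library; the function name archive_probe_folder and the archive naming are hypothetical choices, and the probe path passed in is whatever location applies to your installation.

```python
import shutil
from pathlib import Path

def archive_probe_folder(probe_dir, dest_dir):
    """Zip the whole probe folder and return the path of the archive.

    The archive keeps the probe folder name as its top-level directory,
    so cdm/cdm.cfg and cdm/cdm.log stay grouped together.
    """
    probe_dir = Path(probe_dir)
    if not probe_dir.is_dir():
        raise FileNotFoundError(f"probe folder not found: {probe_dir}")
    base_name = Path(dest_dir) / f"{probe_dir.name}_probe_folder"
    return shutil.make_archive(
        str(base_name),
        "zip",
        root_dir=probe_dir.parent,
        base_dir=probe_dir.name,
    )
```

For example, archive_probe_folder("/opt/nimsoft/probes/system/cdm", "/tmp") would produce /tmp/cdm_probe_folder.zip, assuming the probe is installed in that location and /tmp is writable.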

Step 4: Check Probe System Environment

  1. On Windows, examine the Event Log for ANY type of error/failure/crash in the Application and/or System Log. Note that some AV products log alarms in the Informational category even though they are interfering with the startup or execution of a probe or process.
  2. On Linux/UNIX machines there may be a core dump file. However, note that core dumps may not currently be enabled on the OS. 
    Ask the Linux/UNIX Systems Administrator if core dumps are enabled. See: https://access.redhat.com/solutions/4896
  3. Set the robot (controller) loglevel to 5 (or 6) and logsize to 100000 in the robot.cfg and restart the robot.
  4. Check the controller.log for any errors related to the probe that is not starting.
  5. Is the issue isolated to one probe? If the probe is a Java probe, check whether other Java probes are behaving the same way. If so, check whether the java_jre package was recently updated on the robot and whether it is the correct Java version per the probe requirements.
  6. Run the ldd command (Linux/Unix systems)
    On Linux/Unix systems, run the ldd command on the probe binary in the probe directory, e.g., ldd <probe_name>, to see if there are any missing libraries.
    For example, if the cluster probe cannot start on a Linux OS, run the ldd command and examine the output. In the output below, there are no missing libraries.

    # ldd nimcluster
            linux-vdso.so.1 (0x00007ffc51d73000)
            libdl.so.2 => /lib64/libdl.so.2 (0x00007f7c7d5e7000)
            librt.so.1 => /lib64/librt.so.1 (0x00007f7c7d5e2000)
            libm.so.6 => /lib64/libm.so.6 (0x00007f7c7d507000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7c7d502000)
            libc.so.6 => /lib64/libc.so.6 (0x00007f7c7d2f9000)
            /lib64/ld-linux-x86-64.so.2 (0x00007f7c7d5f4000)
    [cluster]#
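
When a library cannot be resolved, ldd reports it with "=> not found" instead of a path. The sketch below shows one way to pick those lines out of ldd output; the function names missing_libraries and run_ldd are hypothetical, and libexample.so.1 in the usage note is a made-up library name used only for illustration.

```python
import subprocess

def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            # The library name is the part before '=>'.
            missing.append(line.split("=>")[0].strip())
    return missing

def run_ldd(binary_path):
    """Run ldd on a binary and return its standard output (Linux/Unix only)."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=False
    )
    return result.stdout
```

With the nimcluster output shown above, missing_libraries returns an empty list. If the output contained a line such as "libexample.so.1 => not found", the function would return that library name, which points to a dependency that has to be installed before the probe can start.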

Step 5: Run the probe manually

  1. Run the probe manually from the command line to see if more descriptive errors occur.
    Please refer to the 'Additional Information' section below on how to run the probe manually from the command line for Windows or Linux/Unix systems

 


Troubleshooting Components

Troubleshooting CORE Components

 
 
 
 
 
 
 

Troubleshooting Key Components, Communication or Connectivity

 

Troubleshooting Hub Tunnels

Troubleshooting Hub Issues

Troubleshooting Hub Queues

Troubleshooting LDAP Authentication Issues

 


Monitoring Probes - Troubleshooting Videos

There are also videos available on how to troubleshoot specific UIM probes.
 
 
 



Additional Information

 
In some cases, it is helpful to run the probe manually to see what's preventing it from starting or causing it to crash, such as missing libs/dependencies, anti-virus, etc.
 
 
 

Customer guidance on opening a support case

If the issue persists, please open a case at support.broadcom.com or click on this link:

DX Infrastructure Management

Providing the information listed below will help to expedite your case and will help the support engineer/development to reach a speedier resolution.

  • DX UIM Version and cumulative update (CU) version

  • Probe name and version that is having the issue

  • Detailed description of the problem

  • Business impact of the issue

  • Any error messages that show up/display before, during, or right after the issue occurs

  • Detailed steps on how to reproduce the issue (if tested and/or reproduced)

  • List any/all changes that were implemented before the issue occurred, e.g., probe upgrade, UIM upgrade, reboot, security/network changes, etc.

  • How long the issue has been occurring and approximately when it started

  • How many devices/users are affected

  • Upload the .cfg files for the affected probe(s)

  • Upload the <probe_name>.log files, the <probe_name>.cfg file, errors, screenshots, and/or a short recording of the problem