Basic troubleshooting steps for all UIM probes


Article ID: 221769


Products

CA Unified Infrastructure Management On-Premise (Nimsoft / UIM), CA Unified Infrastructure Management for z Systems, CA Unified Infrastructure Management SaaS (Nimsoft / UIM), NIMSOFT PROBES, DX Infrastructure Management

Issue/Introduction

Please provide basic troubleshooting guidance for all UIM/Nimsoft probes, including details on the troubleshooting steps that should be taken.

Here is a list of some of the symptoms you may notice when a probe is in need of troubleshooting.
 
- Max. restarts errors in probe log
- Probe failed to start alarms
- Multiple probes fail to start
- Probe process not starting
- Probe is unresponsive or hung
- Probe is red and has no port or PID
- Probe will not open/cannot be configured
- Probe is green but not behaving/running normally

Cause

- There can be many different causes that prompt the need for probe troubleshooting

Environment

Release: UIM 20.1 or higher

Resolution

Several KB articles cover troubleshooting for individual probes, but there is currently no KB article on basic troubleshooting for each and every UIM probe. New KB articles are added to the Knowledge Base over time. In the meantime, we recommend the probe troubleshooting steps described below.
 
 

 

Search the Web

Enter the exact error, for example: Max. restarts reached for probe 'ntevl' (command = ntevl.exe). Alternatively, enter the product name, probe name, and the specific error.
 

Search UIM Documentation (Tech Docs)

The tech docs are the first place you should look for how to troubleshoot a given probe; the sngtw probe documentation is one example.
 
In some cases, this type of information has been included in the Help docs (tech docs); the ppm probe documentation is another example.
 

Search the probe technical documentation

We also highly recommend reviewing the probe prerequisites and the probe compatibility matrix to make sure the environment supports the probe. For example, for the ntevl probe, review the probe tech doc/release notes and check its hardware and, especially, software requirements.
 
Depending on the type of probe, the techdocs may also list the probe's connectivity and access/permission requirements.
 

Search the DX Infrastructure Manager (UIM) Community

You can search the community by entering your search terms, probe name, errors, error string, etc.
 

Probe Version

Make sure you are using the latest GA version of the probe, which is downloadable from http://support.nimsoft.com
 

Probe Hotfixes

If you encounter any issues/errors, go to the UIM hotfix site and check for the latest hotfix version, which may resolve the issue. Check the associated probe's release notes for more information.
 

Help Documentation (Probe techdocs)

The probe techdocs normally include a Contents section that also lists 'Known issues and Workarounds'; the cdm probe techdocs are one example.
 

 

Probes Support Matrix

 
The UIM probes support matrix shows which operating systems and OS versions are currently supported.
 
 

 

UIM probes - Basic Troubleshooting Steps

In general, the following steps are required to perform basic troubleshooting for any UIM probe.

Step 1: Check and Test Probe Health

  1. In IM or the Admin Console, check to make sure that the probe is green and has a port and PID.
  2. Check the probe log to see whether the probe is still writing to it. If it is not, the probe might be hung.
  3. Restart the probe to see if it starts writing to the log and responding again.
  4. When a probe does not start, appears red, or is hung, you may find that about 50% of the time the following two steps resolve the issue:
    Right-click the probe -> Security -> Set access -> Click OK
    Right-click the probe -> Security -> Validate -> Select 'Yes All'
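Checking whether the log is still being written (step 2 above) can be scripted. The Python sketch below is a hypothetical helper, not a UIM tool: it reports the log as stale if the file's modification time is older than a threshold you choose. The threshold is an assumption; a probe at a low loglevel may legitimately write infrequently, so pick a value comfortably longer than the probe's monitoring interval.

```python
import os
import time
from typing import Optional


def log_is_stale(log_path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the log has not been modified within max_age_seconds.

    A missing log file is also reported as stale. The threshold is a
    heuristic chosen by the caller, not a UIM setting.
    """
    if not os.path.exists(log_path):
        return True
    if now is None:
        now = time.time()
    # Seconds since the probe last wrote to its log file
    age_seconds = now - os.path.getmtime(log_path)
    return age_seconds > max_age_seconds
```

For example, `log_is_stale(r"C:\Program Files (x86)\Nimsoft\probes\system\cdm\cdm.log", 900)` returning True would suggest checking the probe for a hang as described above.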

Step 2: Enable Debug Mode

  1. Select the probe
  2. Hold down the SHIFT key, right-click the probe, and open it in 'Raw Configure' mode
  3. Under the probe <setup> section, set the probe loglevel to 5
  4. Set logsize to 100000 (the value is in KB): change the value if the logsize key already exists, or add the key if it is not present
  5. Deactivate the probe and wait until the probe loses its port and PID and the probe icon turns grey
  6. Activate the probe
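If the same change has to be made on several robots, steps 3 and 4 can be sketched in Python. This is a minimal sketch, assuming the .cfg file uses the plain `<section>` ... `</section>` layout with `key = value` lines; the function name `set_setup_keys` is hypothetical. Raw Configure remains the normal way to make this change, so treat the script as a convenience, work on a backed-up copy, and keep the probe deactivated while the file is edited.

```python
import re

# "<name>" opens a section, "</name>" closes it.
SECTION_RE = re.compile(r"^<(/?)([^<>/][^<>]*)>$")
# "   key = value" (leading whitespace captured in group 1, key in group 2)
KEY_RE = re.compile(r"^(\s*)([^=\s<][^=]*?)\s*=")


def set_setup_keys(cfg_text: str, values: dict) -> str:
    """Set keys directly under the top-level <setup> section.

    Existing keys are updated in place; missing keys are added just before
    </setup>. Keys in nested subsections and other sections are untouched.
    """
    out = []
    depth = 0          # current section nesting depth
    in_setup = False   # True while inside the top-level <setup> section
    pending = dict(values)
    for line in cfg_text.splitlines():
        tag = SECTION_RE.match(line.strip())
        if tag:
            closing, name = tag.group(1) == "/", tag.group(2)
            if closing:
                if in_setup and depth == 1 and name == "setup":
                    # Add any keys that were not already present.
                    for key, val in pending.items():
                        out.append(f"   {key} = {val}")
                    pending.clear()
                    in_setup = False
                depth -= 1
            else:
                depth += 1
                if depth == 1 and name == "setup":
                    in_setup = True
            out.append(line)
            continue
        key_match = KEY_RE.match(line)
        if in_setup and depth == 1 and key_match and key_match.group(2) in pending:
            key = key_match.group(2)
            out.append(f"{key_match.group(1)}{key} = {pending.pop(key)}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

For example, `set_setup_keys(text, {"loglevel": "5", "logsize": "100000"})` returns the updated file contents, which would then be written back over the .cfg before the probe is activated again.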

Note that some probes, such as the hub and controller, can be set to loglevel 6, which may yield more debug-level detail if loglevel 5 does not provide enough.

Step 3: Reproduce and capture the issue

  1. Reproduce the issue, letting the probe run through at least one monitoring interval so that the issue occurs. Capture the error and document the time frame in which it occurred.
  2. Examine the probe log output in IM or the Admin Console, or view the FULL log files on the file system, preferably with an editor such as Notepad++.
  3. When viewing the log in IM, press the F4 key and enter a search string such as "error", "exception", "failed", "fail", "OutOfMemory", or another helpful string. The log viewer highlights matching entries in red, making them easier to spot.
  4. If you believe you found the key error/exception, perform a search of the KB Article database and/or conduct a web search.
  5. If the probe only throws a "Max. restarts" error in the log, redeploy or update the probe to see if you can force more information into the log.
  6. Attach the probe logs to the case, e.g., <probe_name>.log and _<probe_name>.log
  7. Attach the <probe_name>.cfg file to the case
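The F4 search in step 3 can also be approximated on a log file copied off the robot. The hypothetical helper below performs a case-insensitive search for the same strings and returns the matching line numbers; "Max. restarts" is added from step 5. It is a convenience sketch, not a replacement for reading the log around each hit.

```python
import re

# Search strings from step 3, plus "Max. restarts" from step 5.
DEFAULT_TERMS = ("error", "exception", "failed", "fail", "OutOfMemory", "Max. restarts")


def find_suspect_lines(log_text, terms=DEFAULT_TERMS):
    """Return (line_number, line) pairs containing any term, ignoring case."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((lineno, line))
    return hits


def scan_log_file(path, terms=DEFAULT_TERMS):
    """Scan a log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_suspect_lines(fh.read(), terms)
```

Running `scan_log_file` on both log files named in step 6 lists every candidate line with its line number, which makes it easier to match errors to the time frame documented in step 1.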

The probe .cfg and log files are usually located in the following file system location unless a customer has installed a robot/probe in a different location:

Windows: <drive>:\Program Files (x86)\Nimsoft\probes\<probe_category>\<probe_name>

For example: C:\Program Files (x86)\Nimsoft\probes\system\cdm

UNIX/Linux: /opt/nimsoft/probes/<probe_category>/<probe_name>

For example: /opt/nimsoft/probes/system/cdm

If the problem is difficult to analyze, attach the entire probe folder to the support case.
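The sketch below builds the default probe folder path from the locations listed above and zips the whole folder for upload to the case. Both function names are hypothetical, and the default paths only hold for standard installs; pass the real folder if the robot was installed elsewhere. If the archive step fails on a file the running probe holds open, deactivating the probe and retrying may help.

```python
import ntpath
import os
import posixpath
import shutil


def default_probe_dir(os_family, category, probe_name, drive="C:"):
    """Default probe folder for a standard install, per the locations above."""
    if os_family.lower() == "windows":
        return ntpath.join(drive + "\\", "Program Files (x86)", "Nimsoft",
                           "probes", category, probe_name)
    # UNIX/Linux default install root
    return posixpath.join("/opt/nimsoft/probes", category, probe_name)


def archive_probe_dir(probe_dir, dest_base):
    """Zip the entire probe folder; returns the path of the created .zip file."""
    parent, name = os.path.split(os.path.normpath(probe_dir))
    # base_dir keeps the probe folder name as the top-level entry in the archive
    return shutil.make_archive(dest_base, "zip", root_dir=parent, base_dir=name)
```

For example, `archive_probe_dir(default_probe_dir("linux", "system", "cdm"), "/tmp/cdm_case")` would create `/tmp/cdm_case.zip` containing the `cdm` folder.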

Step 4: Check Probe System Environment

  1. On Windows, examine the Event Log for ANY type of error, failure, or crash in the Application and/or System Log. Note that some AV products log their events in the Informational category even when they are interfering with the startup or execution of a probe or process.
  2. On Linux/UNIX machines there may be a core dump file. However, core dumps may not be enabled on the OS.
    Ask the Linux/UNIX Systems Administrator whether core dumps are enabled. See: https://access.redhat.com/solutions/4896
  3. Set the robot (controller) loglevel to 5 (or 6) and logsize to 100000 in the robot.cfg and restart the robot.
  4. Check the controller.log for any errors related to the probe that is not starting.
  5. Is the issue isolated to one probe? If the probe is a Java probe, check whether other Java probes are behaving the same way. If so, check whether the java_jre package was recently updated on the robot and whether it is the correct Java version per the probe requirements.
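For step 2, a Linux administrator will usually look at the core file size limit and the kernel's core_pattern setting. The Python sketch below reads both from /proc, so it applies to Linux only; other UNIX variants differ. The function names are hypothetical. Passing the PID shown for the probe or controller in IM or the Admin Console checks the limits of that running process rather than those of the script itself. A soft limit of 0 typically prevents core files from being written to disk, while a core_pattern starting with `|` means dumps are piped to a handler program whose own configuration then applies.

```python
LIMIT_LABEL = "Max core file size"


def parse_core_limit(limits_text):
    """Return (soft, hard) for the core file size from /proc/<pid>/limits text, or None."""
    for line in limits_text.splitlines():
        if line.startswith(LIMIT_LABEL):
            # Remaining fields: soft limit, hard limit, units
            fields = line[len(LIMIT_LABEL):].split()
            return fields[0], fields[1]
    return None


def core_dump_settings(pid="self"):
    """Read core dump settings for a process on Linux (default: this script)."""
    with open(f"/proc/{pid}/limits") as fh:
        limits = parse_core_limit(fh.read())
    with open("/proc/sys/kernel/core_pattern") as fh:
        pattern = fh.read().strip()
    soft, hard = limits if limits else (None, None)
    return {
        "soft_limit": soft,                # "0" usually means no core file on disk
        "hard_limit": hard,
        "core_pattern": pattern,
        "piped": pattern.startswith("|"),  # dumps handed to a helper program
    }
```

For example, `core_dump_settings("1234")`, where 1234 stands for the PID shown for the probe, returns the settings that apply to that process.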

Step 5: Run the probe manually

  1. Run the probe manually from the command line to see if more descriptive errors occur.
    Please refer to the Additional Information section below on how to run the probe manually from the command line for Windows or Linux/Unix systems
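When running the probe manually, it helps to capture everything it prints. The sketch below wraps Python's `subprocess.run`; the command list and working folder are placeholders (assumptions) to be replaced with the probe executable and folder for your probe. Because a probe that starts successfully keeps running, a timeout is treated as a normal outcome and returns whatever output was collected. To avoid two copies of the probe running at once, it is likely safer to deactivate the probe in IM or the Admin Console first.

```python
import subprocess


def _decode(data):
    """Normalize captured output to str (it may be bytes or None after a timeout)."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def run_probe_manually(command, cwd, timeout_seconds=120):
    """Run a probe command from its folder; return (exit_code, stdout, stderr).

    exit_code is None if the process was still running at the timeout and was
    killed. FileNotFoundError is raised if the executable does not exist.
    """
    try:
        result = subprocess.run(
            command,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
        return result.returncode, result.stdout, result.stderr
    except subprocess.TimeoutExpired as exc:
        # Output captured before the timeout is bytes even with text=True.
        return None, _decode(exc.stdout), _decode(exc.stderr)
```

An early non-zero exit code together with stderr text about missing libraries or access errors is the kind of descriptive error this step is looking for.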

 

Probe Troubleshooting Videos

There are also some videos available on how to troubleshoot specific UIM probes.
 
 

Additional Information

In some cases, it is helpful to run the probe manually to see what's preventing it from starting or causing it to crash, such as missing libs/dependencies, anti-virus, etc.
 
 
 
Other Articles related to probe troubleshooting
 
 
Customer guidance on opening a support case

If the issue persists, please open a DX Infrastructure Management case at support.broadcom.com.

Providing the information listed below will help expedite your case and help the support engineer/development team reach a resolution sooner.
 
- UIM Version
- Name and version of the probe that is having the issue
- Detailed description of the problem
- Any error messages that appear before, during, or after the issue occurs
- Detailed steps to reproduce the issue
- Any changes made before the issue occurred, e.g., probe upgrade, UIM upgrade, reboot, security/network changes
- How long the issue has been occurring and when it started
- How many devices/users are affected
- The .cfg files for the affected probe(s)
- Probe log files, errors, screenshots, or a short recording of the problem
