net_connect alarms: Profile failed to execute in scheduled time interval

Products

- DX Unified Infrastructure Management (Nimsoft / UIM)
- CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
- CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

The net_connect probe generates alarms with the message: Profile failed to execute in scheduled time interval
Hundreds of these alarms may be generated, either at seemingly random times or after a probe restart.

Environment

net_connect probe 2.93 or higher

Cause

The defined monitoring interval is not long enough to execute all of the defined monitoring profiles.
This alarm was added to the probe in version 2.93.
The net_connect probe may be overloaded by the number of profiles it has to run.

Resolution

To mitigate this issue, follow the net_connect probe troubleshooting steps below. The problem occurs when the monitoring interval is too short for the probe to execute all of its monitoring profiles.

Reduce the number of monitoring profiles configured in net_connect on the robot.
Change the net_connect probe monitoring parameters to reduce the processing time of each monitoring profile, for example:
- Reduce the number of Retry attempts used to check the ICMP connectivity of the host. Broadcom recommends a value of 3; if the problem persists, decrease it to 1.
- Increase the Max Ping Threads value if sufficient machine resources are available.
- Reduce the number of ICMP burst messages. Broadcom recommends a value of 3; if the problem persists, decrease it to 1.
- Reduce the timeout value of the monitoring profile. Broadcom recommends a value of 2 seconds.
- Increase the monitoring interval of the profile. Broadcom recommends a minimum of 5 minutes.

Note that these recommended values were tested with 6500 profiles on Windows and 5048 profiles on Unix/Linux platforms, with an average profile response time of no more than 10 ms and failures on no more than 10 percent of profiles. If the number of profiles, the average response time, or the number of failures is higher, adjust the monitoring parameters accordingly.
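The note above ties the recommended values to the profile count, the response time and the failure rate. The following Lua sketch turns that relationship into a rough back-of-the-envelope estimate of how long one polling cycle takes compared with the interval. The cost model (each failing profile waits the full timeout for every burst packet of every retry, and work is spread evenly over the ping threads), the function name estimate_cycle_seconds, and the Max Ping Threads value of 100 are all assumptions for illustration, not documented net_connect internals.

```lua
-- Rough, hypothetical model of one net_connect polling cycle.
-- Assumptions (not documented net_connect behavior):
--   * a responding profile costs its average response time;
--   * a failing profile waits the full timeout for every burst
--     packet of every retry, one after another;
--   * work is spread evenly across the ping threads.
local function estimate_cycle_seconds(p)
  local ok_cost = (1 - p.fail_fraction) * p.avg_response_s
  local fail_cost = p.fail_fraction * p.retries * p.burst * p.timeout_s
  return p.profiles * (ok_cost + fail_cost) / p.threads
end

local params = {
  profiles = 6500,        -- profile count from the tested Windows scenario
  avg_response_s = 0.010, -- 10 ms average response time
  fail_fraction = 0.10,   -- 10 percent of profiles failing
  retries = 3,            -- recommended Retry attempts
  burst = 3,              -- recommended ICMP burst messages
  timeout_s = 2,          -- recommended timeout in seconds
  threads = 100,          -- hypothetical Max Ping Threads value
}

local interval_s = 5 * 60 -- recommended minimum interval (5 minutes)
local cycle = estimate_cycle_seconds(params)
print(string.format("estimated cycle: %.1f s of %d s interval", cycle, interval_s))
if cycle > interval_s then
  print("profiles probably will not finish within the interval")
else
  print("profiles probably fit within the interval")
end
```

With these inputs the sketch estimates about 117.6 s, well inside a 300 s interval. Raising fail_fraction to 0.5 pushes the estimate to roughly 585 s, which illustrates why a higher failure rate calls for fewer retries, fewer burst messages, a shorter timeout, more threads or a longer interval.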

Message Filtering

You can also filter out the alarms or reduce their severity.

Reducing alarm severity

net_connect v3.03 or higher

- Use the message pool manager in the probe configuration GUI to change the severity of the alarm to the desired level.

net_connect v2.93 or higher

- Create a nas pre-processing rule that runs the following script to change the severity:

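-- Lower the alarm severity to 2 (warning)
event.level = 2
-- Replace the alarm text with a fixed string (any profile name in the original text is lost)
event.message = "Profile failed to execute in scheduled time interval"
-- Hand the modified event back to nas
return event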

Define the 'Message string' of the nas pre-processing rule as:

Profile * failed to execute in scheduled time interval
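To see what the rule and the script above do together, the following standalone Lua sketch mimics them outside nas. The apply_rule and script function names, the mock event table, its starting level of 5 and the sample profile name router01 are hypothetical, and the Lua pattern only approximates how nas expands the * wildcard in the message string. Inside nas, the event table is supplied by nas itself.

```lua
-- Approximation of the message string "Profile * failed to execute ...":
-- in Lua patterns, "." matches any character and "-" is a lazy repeat,
-- so ".-" allows an optional profile name between the words.
local RULE_PATTERN = "^Profile .-failed to execute in scheduled time interval"

-- Same body as the nas script: set warning severity and a fixed message.
local function script(event)
  event.level = 2 -- 2 = warning
  event.message = "Profile failed to execute in scheduled time interval"
  return event
end

-- Hypothetical stand-in for the nas rule: run the script only on a match.
local function apply_rule(event)
  if string.find(event.message, RULE_PATTERN) then
    return script(event)
  end
  return event -- non-matching events pass through unchanged
end

-- Mock event with a hypothetical profile name in the message.
local sample = { level = 5, message = "Profile router01 failed to execute in scheduled time interval" }
local result = apply_rule(sample)
print(result.level, result.message)
```

A matching event leaves the sketch at level 2 with the fixed message text, while an unrelated event keeps its original level.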

Filtering out the alarms

Create a nas pre-processing rule in the nas GUI with the following message filter, and then enable it:

/.*Profile.*failed to execute.*/

To test the rule, send a test alarm from the nas Status tab (right-click, then select Send test alarm).
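A message filter only catches the alarm if its pattern matches the actual alarm text, which may or may not include a profile name. The Lua sketch below compares a literal pattern with one that allows any text between "Profile" and "failed", using hypothetical sample messages with and without a profile name. The Lua patterns only approximate the nas regular expressions shown in the comments, and the matches helper is hypothetical.

```lua
-- Hypothetical sample alarm texts: one without and one with a profile name.
local samples = {
  "Profile failed to execute in scheduled time interval",
  "Profile router01 failed to execute in scheduled time interval",
}

-- Lua-pattern approximations of two regex filters:
--   /.*Profile failed to execute.*/   -> literal substring
--   /.*Profile.*failed to execute.*/  -> any text allowed between the words
local filters = {
  literal = "Profile failed to execute",
  wildcard = "Profile.*failed to execute",
}

-- string.find is unanchored, like a regex wrapped in leading/trailing ".*".
local function matches(pattern, text)
  return string.find(text, pattern) ~= nil
end

for _, text in ipairs(samples) do
  print(text)
  print("  literal:", matches(filters.literal, text))
  print("  wildcard:", matches(filters.wildcard, text))
end
```

The literal pattern misses the message that contains a profile name, while the wildcard pattern matches both forms.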

Additional Information

Check and document how many profiles (that is, hosts or devices) are being pinged or monitored.
Check the size of the net_connect.cfg file.
On Windows, check the Windows Event Viewer (Application and System logs) for any net_connect crashes.
Try splitting the profiles across two instances of the probe instead of running all of them from the single robot where net_connect is installed.
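For the first two checks, the following Lua sketch counts the profiles in net_connect.cfg and reports the file size. It assumes each profile is stored as a subsection directly under a <profiles> section, with one section tag per line; that layout, the count_profiles name and the count_profiles.lua file name are assumptions, so verify the structure against your own net_connect.cfg before relying on the count.

```lua
-- Hypothetical helper: counts sections nested directly under <profiles>
-- in UIM-style .cfg text. Assumes one <tag> or </tag> per line.
local function count_profiles(cfg_text)
  local depth, profiles_depth, count = 0, nil, 0
  for line in cfg_text:gmatch("[^\r\n]+") do
    local close = line:match("^%s*</([^>]+)>%s*$")
    local open = (not close) and line:match("^%s*<([^/>][^>]*)>%s*$")
    if open then
      depth = depth + 1
      if open == "profiles" and profiles_depth == nil then
        profiles_depth = depth -- entered the <profiles> section
      elseif profiles_depth and depth == profiles_depth + 1 then
        count = count + 1      -- a direct child section = one profile (assumed)
      end
    elseif close then
      if profiles_depth and depth == profiles_depth then
        profiles_depth = nil   -- left the <profiles> section
      end
      depth = depth - 1
    end
  end
  return count
end

-- Usage (hypothetical): lua count_profiles.lua /path/to/net_connect.cfg
local path = arg and arg[1]
if path then
  local f = assert(io.open(path, "rb"))
  local size = f:seek("end") -- file size in bytes
  f:seek("set")
  local text = f:read("*a")
  f:close()
  print(string.format("%s: %d bytes, %d profiles", path, size, count_profiles(text)))
end
```

Nested sections inside a profile are not counted, because only sections exactly one level below <profiles> increment the total.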