The exchange_monitor probe raises alerts with a message like:
exchange_monitor communication error. (services_check): Find services status - communication error
Potential causes include:
- Network connectivity issues
- Large niscache on the robot
Environment:
- UIM 8.x or higher
- exchange_monitor v5.40
- robot v9.33
Check whether you lose any functionality when this alert is raised, e.g., gaps in QoS data.
How soon does this alert clear?
It could be that the server hosting the probe is busy at that time. If there is no loss of functionality, you could block this alert using a nas pre-processing rule.
That said, exchange_monitor relies on four other probes to monitor Exchange, so work through the following checks:
1) The first thing to check is that the most current versions of these probes are deployed and running.
2) Is this a one-time alarm or is it happening frequently?
3) If the alarm is occurring frequently, support will need level 3 exchange_monitor logs captured while the services check is failing, as well as the exchange_monitor config file, to dig deeper into the issue.
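To judge whether the alarm is a one-off or frequent before contacting support, you could count how often the error shows up in the probe log. The following is a minimal sketch, not a supported tool. It assumes the log contains the phrase "communication error" as it appears in the alarm text; the function names and the log path you pass in are hypothetical.

```python
import re

# Assumption: the log line contains the same phrase as the alarm message.
ERROR_PATTERN = re.compile(r"communication error", re.IGNORECASE)


def count_communication_errors(log_lines):
    """Return how many of the given lines mention a communication error."""
    return sum(1 for line in log_lines if ERROR_PATTERN.search(line))


def summarize_log(path):
    """Count communication-error lines in the log file at the given path."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_communication_errors(handle)
```

Call summarize_log with the path to your exchange_monitor log file. A count that keeps growing between runs suggests a recurring problem rather than a one-time event.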
***The above comments aside, if similar communication issues are happening with other probes on a few robots, then you should check for potential communication issues/network connectivity.***
Try a telnet FROM the hub TO the robot on port 48000, and a telnet FROM the robot TO the hub on port 48002. If the telnet from the robot to the hub fails, try a tracert TO the hub from the robot and see whether it succeeds, is slow, etc.
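If telnet is not installed on the machine, the same reachability test can be approximated with a plain TCP connect. This is a sketch only: the helper name can_connect is hypothetical, and a successful connect only shows that the port is reachable, not that the robot or hub is healthy.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run can_connect("robot-hostname", 48000) on the hub and can_connect("hub-hostname", 48002) on the robot, replacing the placeholder host names with your own. The direction matters, so run each check from the machine named in the step above.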
You can also check the transfer rate in the Hub probe GUI under the Hubs tab, via right-click.
There is no 'magic number', but you should see a transfer rate of at least 200 KB/sec for the hub in question.
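If you only have a byte count and a duration to hand (for example from a manual file copy between the two machines), the comparison against the 200 KB/sec rule of thumb can be sketched as below. The function names are hypothetical, and the sketch assumes 1 KB = 1024 bytes.

```python
# Rule-of-thumb floor taken from this article; it is not a hard limit.
MIN_RATE_KB_PER_SEC = 200.0


def transfer_rate_kb_per_sec(bytes_transferred, elapsed_seconds):
    """Convert a byte count and a duration into KB/sec (1 KB = 1024 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_transferred / 1024.0 / elapsed_seconds


def rate_is_acceptable(rate_kb_per_sec, minimum=MIN_RATE_KB_PER_SEC):
    """Return True if the rate meets or exceeds the minimum."""
    return rate_kb_per_sec >= minimum
```

For example, 1,024,000 bytes moved in 4 seconds works out to 250 KB/sec, which is above the floor.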
4) Try setting the hostname and IP explicitly in the robot configuration. You can do this via the GUI if you can open it, via Raw Configure, or directly in the robot.cfg file if you have access. Then restart the robot and see whether it behaves better afterwards.
5) In some cases this error is caused by anti-virus or network filtering software interfering with the communication. If that is the case, configure an exception for all Nimsoft programs and the Nimsoft directory.
6) Last but not least, clear the niscache on the robot where the exchange_monitor probe is deployed. See the article:
How to clear the niscache on an individual robot
Then, after the robot has been restarted and the alarm has been acknowledged, check for any recurrence of the alarm.
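Since a large niscache is one of the potential causes, it can help to measure the cache before and after clearing it. The sketch below only reads the directory and deletes nothing. The function name is hypothetical, and the location of the niscache folder depends on your installation, so pass in the path that applies to your robot.

```python
import os


def directory_stats(path):
    """Return (file_count, total_bytes) for all files under the given path."""
    file_count = 0
    total_bytes = 0
    # os.walk also descends into any subdirectories of the cache folder.
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return file_count, total_bytes
```

Record the numbers before clearing the cache and again after the robot restart. If the file count climbs quickly again and the alarm returns, that supports the niscache as a contributing factor.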