We monitor servers through the rsp probe. Even when everything we monitor is 'OK,' the rsp profile nodes/hosts remain red, and we don't know why. We expect a node to be green when there is no issue with the monitoring.
- configuration/probe fine-tuning
- resource utilization on the robot running rsp
First, deactivate a subset of the profiles, e.g., 20-25%, or offload those profiles to another rsp probe instance on another robot to 'lighten the load,' then check whether the nodes turn green and stay green.
Normally, the probe can handle approximately 200 profiles. It may be able to handle 300 or more, but that depends on a number of factors.
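As a rough illustration of the first step, the sketch below picks a repeatable 25% subset from a list of profile names. The helper name `pick_profiles_to_deactivate` and the `hostNN` names are hypothetical; the selection only produces a list, and the profiles still have to be deactivated in the probe itself.

```python
import math


def pick_profiles_to_deactivate(profiles, fraction=0.25):
    """Return the profiles to deactivate for a load test (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    count = math.ceil(len(profiles) * fraction)
    # Sort first so the same subset is chosen on every test run.
    return sorted(profiles)[:count]


# Placeholder profile names; substitute the hosts configured in your rsp probe.
profiles = [f"host{i:02d}" for i in range(1, 9)]
print(pick_profiles_to_deactivate(profiles))
```

Choosing the same subset on every run makes it easier to compare node status before and after the change.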
Setting the following in rsp.cfg normally helps:
- Max data collection threads set to 300
- Running threads Exit timeout set to 180
- WMI Retry Delay set to 1200
- WMI Connection timeout set to 600
- SSH Connection Timeout set to 30
- Command timeout set to 120
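One way to track which of the values listed above still need changing is a small comparison helper such as the sketch below. The dictionary keys are the setting labels as written in this article, not confirmed rsp.cfg key names, and the article does not state units, so none are assumed. The `current` values are made-up examples.

```python
# Suggested values from the list above, keyed by the labels used in the article.
# The actual key names inside rsp.cfg are not given, so they are not guessed here.
SUGGESTED = {
    "Max data collection threads": 300,
    "Running threads Exit timeout": 180,
    "WMI Retry Delay": 1200,
    "WMI Connection timeout": 600,
    "SSH Connection Timeout": 30,
    "Command timeout": 120,
}


def settings_to_change(current):
    """Return {label: (current_value, suggested_value)} for every mismatch."""
    return {
        label: (current.get(label), suggested)
        for label, suggested in SUGGESTED.items()
        if current.get(label) != suggested
    }


# Hypothetical current values, transcribed by hand from one probe instance.
current = {**SUGGESTED, "WMI Retry Delay": 60, "Command timeout": 30}
for label, (now, want) in settings_to_change(current).items():
    print(f"{label}: {now} -> {want}")
```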
Then, if a single profile/machine, or a small or random set of them, fails intermittently, deactivate or delete those machines from the rsp instance where they are failing, deploy another rsp probe instance, and add them to that instance to distribute the rsp profile processing. Cold start rsp (Deactivate, then Activate) and test again.
In general, once you have tested a few rsp probe instances and determined the configuration that processes profiles reliably, you can distribute an rsp probe package to apply those settings on all machines where rsp is installed. Keep the number of profiles on each instance below the number that causes the nodes to turn red.
Listed below are the factors that affect how many profiles the rsp probe can run while still collecting data without constant failure and keeping the nodes green.
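To plan how profiles might be spread across probe instances, a sketch like the following can help. It assumes a cap of 200 profiles per instance, taken from the approximate figure given earlier; the right cap for your environment is whatever your own testing shows. The function name and host names are hypothetical.

```python
import math


def distribute_profiles(profiles, max_per_instance=200):
    """Split profiles into evenly sized groups, one group per rsp probe instance."""
    if max_per_instance < 1:
        raise ValueError("max_per_instance must be at least 1")
    if not profiles:
        return []
    instances = math.ceil(len(profiles) / max_per_instance)
    # Round-robin slicing keeps group sizes within one profile of each other,
    # and no group exceeds max_per_instance.
    return [profiles[i::instances] for i in range(instances)]


# Placeholder host names standing in for 450 monitored machines.
hosts = [f"host{i:03d}" for i in range(450)]
groups = distribute_profiles(hosts)
print([len(group) for group in groups])  # [150, 150, 150]
```

Balancing the groups, rather than filling each instance to the cap, leaves headroom on every robot for profiles that monitor more CPUs, disks, processes, or events.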
In some cases, when the number of metrics and elements being monitored is similar/consistent across profiles, the nodes may remain green with hundreds of profiles. In other cases, fewer profiles can be monitored successfully, e.g., up to 100, because monitoring more CPUs, disks, processes, and events places a higher demand on resources.
- Note that it's possible to improve performance and scalability by adding more virtual processors if using a VM, e.g., start by adding 2 more and test with the same number of initial profiles.