After the SMARTS APM (Availability and Performance Manager) service is started on a VM, monitored devices begin showing ICMP packet loss.
SMARTS generates multiple false "Device Unresponsive" or "Device Down" alarms.
When the APM service is stopped, manual pings to the same devices show no packet loss.
The issue is reproducible in new test environments with identical configurations.
All supported SMARTS releases
Although the symptoms appear linked to SMARTS service activity, the root cause is often an infrastructure or network bottleneck.
In observed cases, a live traceroute run while the service is active has shown that specific network hops fail to respond under the increased polling load, which results in dropped packets.
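As an illustration of that analysis, the following minimal Python sketch scans captured traceroute output and flags hops that did not respond, dropped some probes, or showed high latency. The function name `find_suspect_hops`, the 100 ms threshold, and the sample addresses are all hypothetical and chosen only for this example. The parser assumes the common Linux traceroute text layout, and other platforms may format output differently.

```python
import re

# A hop line starts with the hop number, e.g. " 3  10.0.0.1  1.2 ms  1.1 ms  1.3 ms"
# or " 4  * * *". The header line ("traceroute to ...") does not match.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
RTT = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def find_suspect_hops(traceroute_output, max_rtt_ms=100.0):
    """Return (hop_number, reason) tuples for hops that look unhealthy.

    max_rtt_ms is an illustrative threshold, not a SMARTS recommendation.
    """
    suspects = []
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # skip the header and blank lines
        hop = int(match.group(1))
        rest = match.group(2)
        rtts = [float(value) for value in RTT.findall(rest)]
        timeouts = rest.count("*")  # each "*" is one probe with no reply
        if not rtts:
            suspects.append((hop, "no response"))
        elif timeouts:
            suspects.append((hop, f"{timeouts} probe(s) timed out"))
        elif max(rtts) > max_rtt_ms:
            suspects.append((hop, f"high latency ({max(rtts):.1f} ms)"))
    return suspects


# Made-up sample output used only to demonstrate the parser.
SAMPLE = """\
traceroute to 10.20.30.40 (10.20.30.40), 30 hops max, 60 byte packets
 1  192.168.1.1  0.412 ms  0.398 ms  0.377 ms
 2  10.0.0.1  1.204 ms  1.150 ms  1.101 ms
 3  * * *
 4  10.20.0.1  150.331 ms  148.902 ms  151.004 ms
 5  10.20.30.40  2.1 ms  *  2.3 ms
"""

if __name__ == "__main__":
    for hop, reason in find_suspect_hops(SAMPLE):
        print(f"hop {hop}: {reason}")
```

A silent intermediate hop is not always a fault: some routers rate-limit or deprioritize the ICMP replies that traceroute depends on while still forwarding traffic normally. A hop is more suspicious when the loss also appears at every hop after it, or when it appears only while the APM service is running.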
To isolate and resolve the issue, follow these diagnostic steps:
Verify Service Impact: Confirm that packet loss occurs only when the APM service or IP domain is active. Stop the service and run manual pings to establish a baseline.
Isolate Environment: Create a test VM with the same OS and SMARTS version to determine whether the behavior is environmental or specific to the production host.
Live Traceroute Analysis: While the APM service is running and packet loss is observed, run traceroute -I <device_ip> against each affected device. The -I option makes traceroute send ICMP echo probes.
Identify Non-Responsive Hops: Look for specific hops in the traceroute that show no response or high latency.
Consult Network Team: Provide the traceroute data to the network infrastructure team to investigate suspected hardware or configuration issues at the specific non-responsive hop.
File Descriptors: Ensure the system limit for open file descriptors is appropriately tuned for SMARTS polling requirements, though infrastructure investigation should remain the priority if packet loss occurs at specific hops.
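For the file descriptor check in the last step, a minimal Python sketch such as the one below can report the current limits. It uses the standard `resource` module, which is available on Unix-like systems only. The helper name `fd_limit_ok` and the value 8192 are hypothetical: the source does not state a required limit, so substitute the figure appropriate for your deployment.

```python
import resource


def fd_limit_ok(required, limits=None):
    """Return (ok, soft, hard) for the open-file-descriptor limit.

    `limits` may be passed explicitly as (soft, hard); otherwise the limits
    of the current process are read with getrlimit.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_NOFILE)
    soft, hard = limits
    unlimited = soft == resource.RLIM_INFINITY
    return (unlimited or soft >= required), soft, hard


if __name__ == "__main__":
    # 8192 is an illustrative placeholder, not a documented SMARTS requirement.
    ok, soft, hard = fd_limit_ok(8192)
    print(f"soft={soft} hard={hard} sufficient={ok}")
```

The script reports the limits of the process that runs it, so run it as the same user and from the same kind of session that starts the SMARTS service. The limits of an already running service process may differ.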