Spectrum displays an SPM timeout error message when the test runs correctly on the Cisco device
Release: Any
Component:
Steps to troubleshoot an SPM ICMP Test:
Does the test run correctly from the Cisco device?
Does the device respond correctly to an SNMP request from the SPM test?
Are the results from the SPM test available in the MIB and, if so, does the device answer queries for them incorrectly?
If the answer to all of the above is yes, the problem may be caused by a known Cisco bug.
How to determine if the SPM test runs correctly
In this example, 22397 is the IP SLA entry for the SPM test defined in Spectrum as an icmp-echo against device xx.xx.xx.xx.
Run the command #sh ip sla conf 22397 on the device. The output resembles the following:
Round Trip Time (RTT) for Index 22397
Latest RTT: 36 milliseconds
Latest operation start time: xx.xx.xx.xx GMT Mon Nov 21 2011
Latest operation return code: OK
Over thresholds occurred: FALSE
Number of successes: 15
Number of failures: 0
Operation time to live: Forever
Operational state of entry: Active
Last time this entry was reset: Never
Notice that the test appears to be working correctly on the device. The output reports:
Number of successes: 15
even though Spectrum shows a timeout.
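The check described above can be scripted when many entries need to be reviewed. The following is a minimal Python sketch, assuming the device output has already been captured as text in the "Key: value" layout shown in this example. The function names parse_sla_statistics and operation_succeeded are hypothetical and are not part of Spectrum or IOS.

```python
from typing import Dict


def parse_sla_statistics(output: str) -> Dict[str, str]:
    """Collect the 'Key: value' pairs from captured IP SLA output text."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so time values keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def operation_succeeded(fields: Dict[str, str]) -> bool:
    """Assumed rule: return code OK, at least one success, no failures."""
    return (
        fields.get("Latest operation return code") == "OK"
        and int(fields.get("Number of successes", "0")) > 0
        and int(fields.get("Number of failures", "0")) == 0
    )


if __name__ == "__main__":
    sample = (
        "Round Trip Time (RTT) for Index 22397\n"
        "Latest RTT: 36 milliseconds\n"
        "Latest operation return code: OK\n"
        "Number of successes: 15\n"
        "Number of failures: 0\n"
    )
    print(operation_succeeded(parse_sla_statistics(sample)))
```

If this reports success while Spectrum still shows a timeout, continue with the SNMP checks that follow.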
Where does Spectrum receive the data to determine if the test ran correctly?
Spectrum reads the following OIDs from the MIB to analyze SPM test results:
1.3.6.1.4.1.9.9.42.1.3.1.1.5.instance.timestamp.1.1.1
1.3.6.1.4.1.9.9.42.1.3.1.1.11.instance.timestamp.1.1.1
1.3.6.1.4.1.9.9.42.1.3.1.1.7.instance.timestamp.1.1.1
1.3.6.1.4.1.9.9.42.1.3.1.1.10.instance.timestamp.1.1.1
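To reproduce these queries by hand, the full OIDs can be assembled from the instance and timestamp. The following Python sketch only builds the OID strings in the pattern listed above. It assumes the trailing .1.1.1 indexes are fixed, as in this example, and the helper name build_spm_oids is hypothetical.

```python
from typing import List

# Common prefix of the four OIDs listed above.
STATS_CAPTURE_PREFIX = "1.3.6.1.4.1.9.9.42.1.3.1.1"

# Column numbers in the order listed above; .5 is rttMonStatsCaptureCompletions.
COLUMNS = (5, 11, 7, 10)


def build_spm_oids(instance: int, timestamp: int) -> List[str]:
    """Return the four OIDs for one IP SLA instance and start timestamp."""
    # Assumption: the last three index values are always 1.1.1, as in the example.
    suffix = f"{instance}.{timestamp}.1.1.1"
    return [f"{STATS_CAPTURE_PREFIX}.{column}.{suffix}" for column in COLUMNS]


if __name__ == "__main__":
    for oid in build_spm_oids(22397, 760482609):
        print(oid)
```

The resulting strings can be passed to any SNMP get tool to compare its answer with what Spectrum receives.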
The first OID, 1.3.6.1.4.1.9.9.42.1.3.1.1.5, is:
rttMonStatsCaptureCompletions
After a successful read of rttMonStatsCaptureCompletions, Spectrum then tries to read the following objects using the same index:
rttMonStatsCaptureCompletionTimeMin
rttMonStatsCaptureSumCompletionTime
rttMonStatsCaptureCompletionTimeMax
The values returned by these reads let Spectrum determine whether a threshold has been exceeded.
In this example, the instance is 22397.
The timestamp is 760482609, and the device returns the following response:
Response, reqid 77917714, errstat 2, erridx 1
rttMonStatsCaptureCompletionTimeMin.22397.760482609.1.1.1 = NULL TYPE/VALUE
rttMonStatsCaptureSumCompletionTime.22397.760482609.1.1.1 = NULL TYPE/VALUE
rttMonStatsCaptureCompletionTimeMax.22397.760482609.1.1.1 = NULL TYPE/VALUE
Analyzing the sniffer trace, we see that in reqid 77917714 Spectrum queried for:
rttMonStatsCaptureCompletionTimeMin.22397.760482609.1.1.1
rttMonStatsCaptureSumCompletionTime.22397.760482609.1.1.1
rttMonStatsCaptureCompletionTimeMax.22397.760482609.1.1.1
and received a NO_SUCH_NAME answer (errstat 2), with each result shown as NULL TYPE/VALUE.
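When reviewing many trace lines, the error status can be decoded automatically. The following Python sketch assumes the trace prints responses in the "reqid N, errstat N, erridx N" layout shown above, which may differ between sniffer tools. The function name decode_response is hypothetical. The numeric codes are the standard SNMPv1 error-status values.

```python
import re
from typing import Optional

# SNMPv1 error-status codes as defined in RFC 1157.
SNMPV1_ERROR_STATUS = {
    0: "noError",
    1: "tooBig",
    2: "noSuchName",
    3: "badValue",
    4: "readOnly",
    5: "genErr",
}

# Assumed trace layout: "Response, reqid <n>, errstat <n>, erridx <n>".
RESPONSE_RE = re.compile(r"reqid\s+(\d+),\s*errstat\s+(\d+),\s*erridx\s+(\d+)")


def decode_response(line: str) -> Optional[dict]:
    """Extract the request id, error name, and error index from a trace line."""
    match = RESPONSE_RE.search(line)
    if match is None:
        return None
    reqid, errstat, erridx = (int(group) for group in match.groups())
    return {
        "reqid": reqid,
        "error": SNMPV1_ERROR_STATUS.get(errstat, f"unknown({errstat})"),
        "error_index": erridx,  # 1-based position of the failing variable
    }


if __name__ == "__main__":
    print(decode_response("Response, reqid 77917714, errstat 2, erridx 1"))
```

For the response in this example, the error decodes to noSuchName and the error index points to the first requested variable.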
When the same objects are queried with mibtools or sapwalk2, the values are seen correctly populated in the MIB.
The problem is that even though the information required by Spectrum is in the MIB, the Cisco device incorrectly answers with NO_SUCH_NAME, which indicates that the data is not available in the MIB.
In this scenario, this is a known bug with Cisco devices.
Cisco has acknowledged this as a bug (CSCsl97612: rttMonStats tables returns NULL values for snmpget) and recommends upgrading the IOS to 12.4(15)T or 12.4(15)T16.