Problem:
Errors in the Data Collector karaf.log file.
Frequent appearance of "Exception while processing Snmp4j response" error messages in the Data Collector karaf.log file.
Question:
Why are there repeated error exception messages in the Data Collector karaf.log?
How do I resolve repeated error exception messages in the Data Collector karaf.log?
What do "Exception while processing Snmp4j response" error messages in the Data Collector karaf.log file mean?
Answer:
This article addresses a specific error message in the karaf.log file on the Data Collector that polls the devices in the network. This file is found in the following directory:
/opt is the default installation home. If a different installation home was selected, substitute it in the path above.
There are several possible causes for this type of error. This solution discusses just one, a problem querying the sysUpTime.0 MIB OID, but the same approach can be used to determine the cause of other polling issues that generate the same type of message.
An example of the full error message from the Data Collector (DC) karaf.log is:
2015-11-18 16:45:07,935 | ERROR | ecutor-thread-18 | SnmpScheduledPollRequest | r.AbstractPollResponseListener$1 174 | 198 - com.ca.im.data-collection-manager.snmp - 2.5.0.RELEASE-285 | | Exception while processing Snmp4j response for pollGroupId=297 while parsing responseEvent=ResponseEvent [source=Snmp, address=127.0.0.1/161, request=GET[requestID=16029, errorStatus=Success(0), errorIndex=50, VBS[1.3.6.1.2.1.1.3.0 = Null]], response=RESPONSE[requestID=16029, errorStatus=Success(0), errorIndex=0, VBS[1.3.6.1.2.1.1.3.0 = noSuchObject]], userObject=ItemBasedRequestState[responseReceivedTimestamp=1447883107934, nextIndex=1, itemList=[170813]]
, error=null]. Exception: Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
<<<more java "at..." messages here cut for brevity>>>
at java.lang.Thread.run(Unknown Source)[:1.7.0_67]
In the environment where this was observed, there were 3837 instances of that message in the karaf.log on the Data Collector. This frequency, and the rate at which it can fill up the log when many devices are experiencing a similar problem, is what brought it to the attention of the systems administrator.
Extracting the messages and analyzing them as a whole lets us list the IP addresses from the "address=" field. In this case, only 6 individual IP addresses were involved in generating nearly 4000 log messages in a short period of time. They belonged to a variety of network devices, including routers, switches, and servers. This problem can occur with any SNMP device that should provide a valid sysUpTime.0 MIB OID value in response to requests.
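The per-address tally described above can be sketched in a few lines of Python. This is only an illustration, not a CA-supplied tool: the function name count_error_addresses is hypothetical, and the pattern assumes IPv4 addresses in the "address=<ip>/<port>" form shown in the sample log line.

```python
import re
import sys
from collections import Counter

# Text that identifies the error line, taken from the sample message.
MARKER = "Exception while processing Snmp4j response"
# Assumption: addresses appear as "address=<IPv4>/<port>", as in the sample.
ADDRESS_RE = re.compile(r"address=(\d{1,3}(?:\.\d{1,3}){3})/\d+")


def count_error_addresses(lines):
    """Count Snmp4j response errors per device IP address."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue  # skip unrelated log entries
        match = ADDRESS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python count_snmp_errors.py <path to karaf.log>
    with open(sys.argv[1], errors="replace") as log:
        for address, total in count_error_addresses(log).most_common():
            print(f"{total:6d}  {address}")
```

Sorting by count puts the devices that generate the most messages first, which is usually where troubleshooting should start.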
Broken down into its finer details, the error message tells us the following:
The noSuchObject response returned for 1.3.6.1.2.1.1.3.0 (sysUpTime.0) means the MIB object queried isn't present in the MIB. We're not getting back a bad or unexpected value; according to the device polled, it is as if the MIB OID doesn't exist at all.
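To make that breakdown concrete, the following Python sketch pulls the OID and returned value out of the "response=RESPONSE[...]" portion of a log line. It is only an illustration: the helper name parse_response_varbind is hypothetical, it reads only the first variable binding, and it assumes the three SNMPv2 exception values are printed in the log as noSuchObject, noSuchInstance, and endOfMibView.

```python
import re

# Captures "<oid> = <value>" from the first variable binding in the response.
RESPONSE_VB_RE = re.compile(
    r"response=RESPONSE\[.*?VBS\[([\d.]+) = ([^\]]*)\]"
)
# SNMPv2 exception values; their spelling in the log is an assumption.
SNMP_EXCEPTIONS = {"noSuchObject", "noSuchInstance", "endOfMibView"}


def parse_response_varbind(line):
    """Return (oid, value, is_exception), or None if no response is found."""
    match = RESPONSE_VB_RE.search(line)
    if match is None:
        return None
    oid = match.group(1)
    value = match.group(2).strip()
    return oid, value, value in SNMP_EXCEPTIONS
```

For the sample message, this returns the OID 1.3.6.1.2.1.1.3.0 with the value noSuchObject flagged as an exception, which is the condition that leaves the Data Collector with no usable value to process.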
What is the response when the following snmpget command is run in a terminal window on the Data Collector that polls the device?
For example, if the community string is "public", the command might be:
** Refer to the man page for the snmpget command for additional options for its use, such as SNMPv2c or SNMPv3, or specifying ports other than 161 for use. **
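When several devices need to be checked, the command line can be assembled programmatically. The Python sketch below is only an illustration and assumes the Net-SNMP snmpget utility with its standard -v (version) and -c (community) options. The helper name build_snmpget_command, the default version "2c", and the documentation address 192.0.2.10 are assumptions, not values from the environment described here.

```python
import shlex

# sysUpTime.0, the OID shown in the sample log message.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def build_snmpget_command(host, community="public", version="2c", port=161):
    """Build a Net-SNMP snmpget argument list that queries sysUpTime.0.

    Only community-based SNMP (v1/v2c) is covered; SNMPv3 needs
    additional options described in the snmpget man page.
    """
    return [
        "snmpget",
        "-v", version,      # SNMP version: "1" or "2c"
        "-c", community,    # community string
        f"{host}:{port}",   # agent address and UDP port
        SYS_UPTIME_OID,
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address; replace it with the device IP.
    print(shlex.join(build_snmpget_command("192.0.2.10")))
```

The returned list can be passed to subprocess.run on a host where snmpget is installed, or printed and pasted into a terminal.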
If the response is anything other than the correct sysUpTime.0 value, please have the device administrator or the vendor help determine and resolve the cause of the incorrect response. Once that is resolved, the errors should no longer appear in the log file.
If the response is the correct sysUpTime.0 value for the device, there is something wrong with the CA Performance Manager environment. Please open a new Support Case on the support.ca.com website.