We have a few (9) devices that had their polling dropped for less than 1 minute because of a network hiccup. Spectrum noticed this as well.
As a result, Device Polling Statistics alarms (Rule Name: Data Collector Dropped Poll Request) were generated for those 9 devices.
Although the issue lasted only a few seconds, polling does not resume in PM (there are no problems in Spectrum). The devices have "management agent lost" status in PM. This is incorrect as polling works fine.
We've seen this issue multiple times in the past. The only resolution is to stop and start polling manually for the devices. Sometimes these events go unnoticed for days, which causes huge data gaps.
Running snmpwalk from the CLI shows that the device works fine.
Release : 22.2
The firewall (FW) blocks the responses.
The issue was that the connections were already present in the connection table when a policy push was performed for that VSX FireWall.
The connection persistency for that VSX FireWall is set to “Rematch connections”.
As the DC server did not reinitiate the connections, there was no rematch.
For each new connection, the FireWall will evaluate the flow against our policy.
If the flow is allowed, it will be stored in a connection table.
Connections that are listed in the connection table do not need to be rematched against the policy.
By default, without keepalive, such a connection has a TTL of 3600 seconds (1 hour) in the connection table.
The SpectroServers poll every 5 minutes, if I remember correctly, which resets the TTL for the connection indefinitely (the new polls keep the connection alive, so no keepalives need to be sent to maintain it).
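The effect of polling on the TTL can be illustrated with a small sketch. This is only a toy calculation, not firewall code: the 3600-second TTL and the 5-minute interval are the values mentioned above, while the `Entry` class and the `simulate` function are hypothetical names made up for the illustration.

```python
from dataclasses import dataclass

SESSION_TTL = 3600   # seconds, default TTL mentioned above
POLL_INTERVAL = 300  # seconds, the 5-minute poll interval recalled above


@dataclass
class Entry:
    expires_at: int  # time (in seconds) at which the entry ages out


def simulate(poll_interval: int, ttl: int, duration: int) -> bool:
    """Return True if the entry is still in the table after `duration` seconds."""
    entry = Entry(expires_at=ttl)  # entry created at t=0 by the first poll
    t = poll_interval
    while t <= duration:
        if t >= entry.expires_at:
            return False  # entry aged out before this poll arrived
        entry.expires_at = t + ttl  # every poll resets the TTL
        t += poll_interval
    return duration < entry.expires_at


week = 7 * 24 * 3600
print(simulate(POLL_INTERVAL, SESSION_TTL, week))  # True: entry never ages out
print(simulate(7200, SESSION_TTL, week))           # False: a 2-hour interval exceeds the TTL
```

As long as the poll interval stays below the TTL, the same connection table entry lives on indefinitely, which is why the entries created before the policy push were still there afterwards.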
However, the FireWall that handles this traffic is set to rematch every connection against the policy whenever a new policy is pushed to it (the Persistency setting).
Only connections that have gone through the policy decision making may be allowed through the FireWall.
As the SNMP polling connection was still present in the connection table, it was still “accepted”, but it was dropped down the line because it had not been rematched.
To allow the flow to be rematched against the policy, I had to kill the sessions that were present in the connection table.
To prevent such incidents from recurring, I cloned the services for the flow and used an option to override the Persistency.
SNMP polls will no longer need to be rematched against the policy when a new policy is pushed to that FireWall, even if the Persistency setting would otherwise require a rematch.
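The sequence described above can be sketched as a toy model. It mirrors the behaviour as described in this note, not the FireWall's actual implementation: the `Firewall` class, its methods, the port-based "policy" and the host names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    allowed_ports: set                                        # toy "policy": permitted destination ports
    rematch_exempt_ports: set = field(default_factory=set)    # services that override the Persistency
    table: dict = field(default_factory=dict)                 # flow -> True if a rematch is pending

    def packet(self, flow):
        """Return 'accept' or 'drop' for a packet on flow = (src, dst, port)."""
        if flow in self.table:
            # Existing entry: the policy is not evaluated again. In this model
            # an entry with a pending rematch stays stuck, as in the incident.
            return "drop" if self.table[flow] else "accept"
        if flow[2] in self.allowed_ports:
            self.table[flow] = False  # new connection allowed and stored
            return "accept"
        return "drop"

    def push_policy(self, allowed_ports):
        """Install a policy and flag existing entries for rematch unless exempt."""
        self.allowed_ports = set(allowed_ports)
        for flow in self.table:
            if flow[2] not in self.rematch_exempt_ports:
                self.table[flow] = True

    def kill_session(self, flow):
        """Remove the entry so the next packet is evaluated as a new connection."""
        self.table.pop(flow, None)


flow = ("dc-server", "device-1", 161)  # hypothetical SNMP polling flow

fw = Firewall(allowed_ports={161})
results = [fw.packet(flow)]      # first poll: evaluated against the policy
fw.push_policy({161})            # policy push with "Rematch connections"
results.append(fw.packet(flow))  # stale entry: responses are dropped
fw.kill_session(flow)            # the manual fix
results.append(fw.packet(flow))  # evaluated again as a new connection

fw2 = Firewall(allowed_ports={161}, rematch_exempt_ports={161})
fw2.packet(flow)
fw2.push_policy({161})
after_override = fw2.packet(flow)  # cloned service overriding the Persistency

print(results, after_override)  # ['accept', 'drop', 'accept'] accept
```

The second `Firewall` instance corresponds to the permanent fix: with the service exempt from the rematch, a policy push no longer interrupts polling.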