False alarms "Connection to <host/FQDN> (ping) failed" coming from a decommissioned robot

Article ID: 252035

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • DX Operational Intelligence
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

We have removed robots from the Nimsoft Infrastructure Manager and no ping alerts are configured for them, but we are still getting incidents (INC) for failed pings. These appear to be false alarms.

invalid_ci: xxxxxxxx
event_source: xxxxxxxx Ticketing
packet_id: 5383a2d6-81ee-470e-9a34-5bcccb421543

Summary: Connection to <##.##.##.##> (ping) failed  (profile: <xxx.example.com> )

Date: 21/09/2022        Severity: Minor
ResourceId: #######
TicketGroup: XXX_XXX_XX_XXX        CustomerCode: xxx
InstanceSituation: xxxxxxxx
Node: xxxxxxxx
NodeAlias: ::1
Agent: xxx Probe on <hostname>
AlertKey: <XXXXXXX.example.com>:xxxxxxx:xxxxxxxx Connection to <XXXXXXX.example.com> (ping) failed  (profile: <XXXXXXX.xxx.example.com>):Env=
AlertGroup: xxxxxxxx
MonitoringSolution: EventIntegrator
EventKey: XXXXXXXXXX:##########:nnn

Environment

  • Release: 20.3
  • net_connect: Any version

Resolution

A false alarm was being raised for a decommissioned server:

Connection to <hostname or FQDN> (ping) failed.

The server name had been changed.

There were no current connection alarms for the decommissioned server in the nas/IM Alarm subconsole.

The nas Activity log showed the alarm history, and the last occurrence of the alarm was 2 days earlier.

If the alarm has stopped occurring, one or more of those connection alarms may have been queued and only later delivered to the nas.

If so, that may be the last of the queued alarms to reach the nas. However, we also noticed a recurrence pattern: the alarms occurred every 3 days.
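
A recurrence pattern like this can be confirmed by measuring the gaps between the occurrence times shown in the alarm history. The following is only a minimal sketch: the timestamps are made-up sample values, and the function name occurrence_gaps_in_days is hypothetical.

```python
from datetime import datetime


def occurrence_gaps_in_days(timestamps):
    """Return the gaps, in days, between consecutive alarm occurrences."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    return [
        round((later - earlier).total_seconds() / 86400, 1)
        for earlier, later in zip(times, times[1:])
    ]


# Hypothetical occurrence times copied by hand from the alarm history.
history = [
    "2022-09-12 04:10:00",
    "2022-09-15 04:10:00",
    "2022-09-18 04:10:00",
    "2022-09-21 04:10:00",
]

gaps = occurrence_gaps_in_days(history)
print(gaps)
```

Evenly spaced gaps suggest a scheduled source, such as a monitoring profile or script that is still active somewhere, rather than a one-off backlog of queued alarms.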

Run a SELECT query against the nas_transaction_log table to search for the decommissioned server.

If there are no occurrences of the alarm in the results of the nas_transaction_log query, then the alarm is no longer being generated in the UIM environment.
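
The search could look roughly like the sketch below. It is an illustration only: the table name comes from this article, but the column names (time, hostname, source, origin, hub, robot, prid, message) are assumptions that should be checked against the actual schema, and the function name find_alarms_for_host is hypothetical. An in-memory SQLite table with invented rows stands in for the real UIM database so that the example is self-contained; other database drivers may use a different parameter placeholder than "?".

```python
import sqlite3


def find_alarms_for_host(conn, host):
    """Return log rows whose hostname, source or message mentions the host."""
    pattern = f"%{host}%"
    sql = (
        "SELECT time, hostname, source, origin, hub, robot, prid, message "
        "FROM nas_transaction_log "
        "WHERE hostname LIKE ? OR source LIKE ? OR message LIKE ? "
        "ORDER BY time DESC"
    )
    return conn.execute(sql, (pattern, pattern, pattern)).fetchall()


# Stand-in table with assumed column names and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nas_transaction_log ("
    "time TEXT, hostname TEXT, source TEXT, origin TEXT, "
    "hub TEXT, robot TEXT, prid TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO nas_transaction_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("2022-09-21 04:10:00", "collector01", "10.0.0.5", "other_env_hub",
         "other_env_hub", "collector01", "net_connect",
         "Connection to oldserver.example.com (ping) failed"),
        ("2022-09-21 05:00:00", "web01", "10.0.0.9", "primary_hub",
         "primary_hub", "web01", "cdm", "CPU usage is high"),
    ],
)

rows = find_alarms_for_host(conn, "oldserver.example.com")
for row in rows:
    print(row)
```

Selecting the origin, hub, robot and probe columns alongside the message makes it easier to see which part of the environment actually sent the alarm.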

Check the following potential root causes:

    1. Check the nas_transaction_log table and take note of all the attributes of the false alarm.

    2. Check whether net_connect is deployed on another hub/robot with a net_connect profile that includes the decommissioned robot name.

    3. If the ticket generated in the ticketing system does not include the hub origin name, try to include it in the ticket so that the real source of the alarm can be traced/identified.

    4. Check whether a nas script is creating these false alarms. Any nas Auto Operator profiles that run a script should be reviewed by the script creator.

    5. Check whether the server named in the alarm is configured in net_connect on a different hub or robot in another UIM domain/environment.
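
For checks 2 and 5, one option is to search a copy of each net_connect configuration file for the decommissioned server name. The sketch below is a rough illustration under assumptions: it assumes the configuration uses nested <section> ... </section> blocks with key = value lines, the sample profile layout and key names are invented for the demonstration, and the function name find_host_in_cfg is hypothetical.

```python
def find_host_in_cfg(cfg_text, host):
    """Return (line_number, section_path, line) for each line mentioning the host."""
    hits = []
    path = []  # stack of the sections that are currently open
    needle = host.lower()
    for number, raw in enumerate(cfg_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("</") and line.endswith(">"):
            # Closing tag: leave the current section and do not report it.
            if path:
                path.pop()
            continue
        if line.startswith("<") and line.endswith(">"):
            # Opening tag: enter a new section.
            path.append(line[1:-1])
        if needle in line.lower():
            hits.append((number, "/".join(path), line))
    return hits


# Invented sample in the assumed layout; real files may differ.
SAMPLE_CFG = "\n".join([
    "<profiles>",
    "   <oldserver.example.com>",
    "      active = yes",
    "      hostname = oldserver.example.com",
    "   </oldserver.example.com>",
    "   <web01.example.com>",
    "      active = yes",
    "      hostname = web01.example.com",
    "   </web01.example.com>",
    "</profiles>",
])

hits = find_host_in_cfg(SAMPLE_CFG, "oldserver.example.com")
for number, section, line in hits:
    print(number, section, line)
```

Running the same search against the configuration from every hub and robot, including those in other UIM domains/environments, shows where a profile for the decommissioned server still exists.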

 

In this particular case, the alarms were being generated by a net_connect probe profile that belonged to another UIM environment.

The alarm message text contained the name of the decommissioned server.

Therefore, the customer disabled the profile/ping alerts coming from the other UIM domain/environment.