
Monitoring NetApp storage hosted on Azure: cdm shared disks throw errors


Article ID: 256499

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

We are having trouble monitoring NetApp shares. For a few shares (shared drives), we intermittently receive network connection errors even though the network connection is good.

The monitoring server is located in Azure.

Please help us resolve this issue so that we can monitor NetApp storage hosted in Azure.

Environment

  • Release: UIM 20.3.3
  • cdm v6.70 or higher

Cause

- Most likely transient network routing or latency problems between the monitoring server and the NetApp shares

Resolution

The issue (connection errors on NetApp shares/shared drives) is transient and, according to the cdm logs, very short-lived.

Focus on network testing and/or capturing a Wireshark trace while the issue is occurring, then look for causes such as temporary network latency or routing problems.
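As one way to approach the network testing, the following Python sketch repeatedly times a TCP connection to the share host and prints only failed or slow attempts with a timestamp, so they can be compared with the times in the cdm log. It is not part of UIM. The port (445, commonly used for SMB), the interval, and the slow threshold are assumptions to adjust for your environment.

```python
import socket
import time


def probe(host, port=445, timeout=3.0):
    """Attempt one TCP connection; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False, time.monotonic() - start


def watch(host, port=445, interval=5.0, slow_after=0.5, rounds=None):
    """Probe repeatedly; print a timestamped line for failed or slow attempts.

    rounds=None loops until interrupted. slow_after is an assumed threshold.
    """
    count = 0
    while rounds is None or count < rounds:
        ok, elapsed = probe(host, port)
        if not ok or elapsed > slow_after:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} host={host} port={port} ok={ok} elapsed={elapsed:.3f}s")
        count += 1
        time.sleep(interval)
```

Calling `watch("<share-host>")` on the monitoring server, with the real host name substituted, leaves a simple record of when connectivity degraded. A packet capture is still needed to see why.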

Regarding the possibility of monitoring Azure-hosted storage directly: as far as we know, it has never been tested that way, so probably not, due to the layers of virtualization involved.

Try to catch the issue when it happens, or try to recreate the scenario, since it occurs randomly and intermittently.

If an alarm is generated, you can use the nas to send an EMAIL when it occurs. Alternatively, use logmon to monitor the log file, parse it for the connectivity error using a substring match, e.g., /.*<errorstring>.*/, and then generate an alarm. Have that alarm emailed to you so that you are alerted when the error occurs, and then start a Wireshark trace, with the network team involved if possible.
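Before configuring logmon, the substring expression can be checked offline against a few lines copied from the log. The Python sketch below assumes that Python's `re` module behaves like the logmon matcher for a simple pattern of this shape. `ERROR_STRING` is a hypothetical placeholder, not a real cdm message.

```python
import re

# Hypothetical placeholder: replace with the exact text seen in the cdm log.
ERROR_STRING = "<errorstring>"

# Same shape as /.*<errorstring>.*/ without the enclosing slashes.
# re.escape makes the string match literally; in logmon, any regex
# metacharacters in the string would have to be escaped by hand.
PATTERN = re.compile(".*" + re.escape(ERROR_STRING) + ".*")


def matching_lines(lines, pattern=PATTERN):
    """Return the lines that contain the error string."""
    return [line for line in lines if pattern.search(line)]
```

Passing the lines of a saved log excerpt to `matching_lines` shows which entries the expression would pick up.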


As an alternative, since the connectivity loss occurs but recovers very quickly, you can use a nas preprocessing rule to manage the alarms.

You should be able to set up a nas pre-processing 'exclude' rule (which deletes the alarm before it hits the queue) using a 'Message string' REGEX that combines 3 strings with an AND operator, e.g.,

   /(.*/disk/alarm/connections/.*)(.*\\xxxxxxxxxx-dcda.ad.xxxxxx.org.*)(.*error.*)/

Test the alarm by sending it from the nas Status tab via right-click -> Send a test alarm:

   cdm: InitAlarm - /disk/alarm/connections/\\xxxxxxxxxx-dcda.ad.xxxxxx.org\xxxxxxxxxx/error

You can send the alarm before you enable the nas pre-processing rule to confirm that it appears as expected.

After you enable the nas pre-processing rule, send the alarm again:

   cdm: InitAlarm - /disk/alarm/connections/\\xxxxxxxxxx-dcda.ad.xxxxxx.org\xxxxxxxxxxxxxxxx/error

The alarm will no longer show up because it is dropped. You can do this for one or more alarms.
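The rule text can also be sanity-checked outside the nas. The sketch below is a Python approximation, assuming the nas regex engine treats this pattern the same way as Python's `re` module. It uses the masked host name from this article. The non-matching message is a made-up example.

```python
import re

HOST = "xxxxxxxxxx-dcda.ad.xxxxxx.org"  # masked host name, as in this article

# Rule text from the article without the enclosing slashes.
# In the regex, \\ matches one literal backslash.
RULE = r"(.*/disk/alarm/connections/.*)(.*\\" + HOST + r".*)(.*error.*)"

# Test alarm text: two backslashes before the host, one before the share.
TEST_ALARM = (
    r"cdm: InitAlarm - /disk/alarm/connections/\\" + HOST + r"\xxxxxxxxxx/error"
)


def would_be_dropped(message, rule=RULE):
    """Return True if the exclude rule's regex matches the alarm message."""
    return re.search(rule, message) is not None


print(would_be_dropped(TEST_ALARM))
# Hypothetical alarm for a different host; it should not match the rule.
print(would_be_dropped(r"cdm: InitAlarm - /disk/alarm/connections/\\otherhost\share/error"))
```

A match here only suggests that the expression is well formed. Sending the test alarm from the nas is still the authoritative check.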

Additional Information

The regex 'AND' operator syntax is:

/(.*<string1>.*)(.*<string2>.*)(.*<string3>.*)/

The groups are matched in sequence, so the strings must appear in the message in the order given.
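To see how this format behaves, the Python sketch below builds a pattern of the same shape from arbitrary substrings and tries it against two messages. The helper name `and_pattern` and the sample words are illustrative only, and Python's `re` module stands in for the nas regex engine.

```python
import re


def and_pattern(*strings):
    """Build (.*s1.*)(.*s2.*)... with each substring escaped to match literally."""
    return "".join("(.*" + re.escape(s) + ".*)" for s in strings)


rule = and_pattern("alpha", "beta", "gamma")

print(bool(re.search(rule, "alpha then beta then gamma")))  # True
print(bool(re.search(rule, "gamma then beta then alpha")))  # False
```

The second message contains all three words but in a different order, so it does not match. If the order of the strings in an alarm message can vary, a separate rule per ordering is one option.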
