Best Practices for Alert Management

Products

CMDB for z/OS
NetSpy Network Performance
NetMaster Network Automation
SOLVE
NetMaster Network Management for SNA
NetMaster Network Management for TCP/IP
NetMaster File Transfer Management
SOLVE:Operations Automation
SOLVE:Access Session Management
SOLVE:FTS

Issue/Introduction

Environment

Release:
Component: NMTIP

Resolution

Decide how you want to find out about faults/alerts.

You can use any combination of these approaches:

- Watch the Alert Monitor display itself, which dynamically shows alerts being created, updated, and cleared. From here you can:
  - close alerts
  - change alert severities
  - raise trouble tickets from alerts
  - customize the display to your liking
- Use the IP Node Monitor and the IP Resource Monitor and, when you see that a resource has alerts, use the AL (Alerts) command to display just those alerts.
- Send alerts to an external product, and watch for them there.
- Send alert details:
  - via email
  - to an NCL procedure (which can call a REXX procedure with the details)
  - via a TSO or NetMaster broadcast message
  - to a WTO (to be picked up by other monitors)

Some NetMaster users watch the Alert Monitor display all day, and alerts drive the work of their department.

Some NetMaster users never look at the Alert Monitor, and just use it to send alert notifications to other places.

Customizing the Alert Display

3270

See "Alert Monitor Display Format" in Chapter 7, "Setting Up the Alert Monitor", in the Administration Guide.

WebCenter

Use the Sort, Options and Filter buttons to modify the display. From the resulting dialogue, use Save Settings.

Alert History

The NetMaster Alert Monitor can optionally also keep Alert History data. This consists of details of all alerts, for the last N days. Alert History can be useful to review and isolate recurring problem areas.

Where do alerts come from?

Alert source	Description of alerts

IP Node Monitor, IP Resource Monitor	IP performance attribute alerts
NetMaster for TCP/IP samples specified attributes of IP logical and physical resources, at regular intervals. When an individual sample value activates a specified trigger or alert condition, an alert is raised.
Performance attribute alerts can be these types:
- Enumerated (text) attributes are equal to a certain value, for example NETSTATUS=TIMEOUT
- Numeric attributes are above or below a specified constant threshold, for example NoOfHops > 7
- Numeric attributes differ from a baseline (moving average) by more than a specified percentage, e.g. BytesIn for TCP port 123 is more than 50% over its average, for this hour of week
Performance attribute alerts are automatically closed when the alert condition clears.

IP Event Detectors	IP event alerts
A single event occurs, and an alert is raised. Such events include:
- A particular port listener becomes inactive
- A particular ICMP message is issued
- An FTP transfer matching specified criteria fails
- A particular z/OS console message is issued
- The z/OS TCP/IP interface changes state
- A particular Cisco Channel TN3270 event occurs
Some but not all IP event alerts can be automatically closed when a corresponding OK event occurs.

NetMaster for SNA	SNA PPO event alerts

NetMaster File Transfer Management	File Transfer Management Alerts
Many different actual and potential error conditions, including:
- file transfers were late, or look like they might be late
- a file transfer infrastructure component has a problem
- a file transfer throughput rate was abnormal
- a file transfer that matched specified rules has occurred

CA SOLVE:Operations	z/OS system automation alerts

User-written NCL procedures	User-defined alerts
Any user-written NCL procedure can raise and clear alerts, using supplied NCL API calls.

NetMaster internals	NetMaster health alerts
Serious conditions affecting the operation of the NetMaster region, such as VSAM errors on critical files

CA OPS/MVS	CA OPS/MVS-specific alerts
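
The three kinds of performance attribute trigger described in the table above can be illustrated with a short sketch. This is not NetMaster code: the function names are hypothetical, the sample values are made up, and treating "differs from a baseline" as a deviation in either direction is an assumption.

```python
def enumerated_trigger(sample: str, alert_value: str) -> bool:
    """Enumerated (text) attribute: alert when the sample equals a given value."""
    return sample == alert_value


def threshold_trigger(sample: float, threshold: float, above: bool = True) -> bool:
    """Numeric attribute: alert when the sample is above (or below) a constant."""
    return sample > threshold if above else sample < threshold


def baseline_trigger(sample: float, baseline: float, percent: float) -> bool:
    """Numeric attribute: alert when the sample differs from its baseline
    (moving average) by more than `percent` percent, in either direction."""
    if baseline == 0:
        # Assumption: with a zero baseline, any non-zero sample is a deviation.
        return sample != 0
    deviation = abs(sample - baseline) / abs(baseline) * 100.0
    return deviation > percent


# Conditions modelled on the examples in the table; sample values are invented.
checks = {
    "NETSTATUS=TIMEOUT": enumerated_trigger("TIMEOUT", "TIMEOUT"),
    "NoOfHops > 7": threshold_trigger(9, 7),
    "BytesIn 60% over average": baseline_trigger(1600, 1000, 50),
    "BytesIn 40% over average": baseline_trigger(1400, 1000, 50),
}
for name, raised in checks.items():
    print(f"{name}: {'alert' if raised else 'no alert'}")
```

Only the last check stays quiet, because a 40% deviation does not exceed the 50% limit.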

Filter alerts

Step 1 Decide how to group your alerts

If you want to use the Alert Monitor to send alert details to other products, consider which of your likely alerts belong together, and should be sent to the same place(s).

You might want to send every alert to the same place. Alternatively, you can split them up: for example, send every SNA alert to email address A, send every IP alert to Spectrum, write a WTO about certain FTP failures, and pass details of certain IP connections to NCL procedure N.

Whenever you configure anything that can generate alerts (IP Event Detectors, performance monitoring attributes), decide whether you want the resulting alerts to be sent, and where or to whom.

Use this information to decide what alert filters you need.

What uses Alert Filters?

Specify an Alert Filter name when you define an Alert Forwarding Destination, to restrict the alerts that will be sent to that destination
Specify an Alert Filter name when you watch the Alert Monitor, to restrict the alerts that are displayed

Step 2 Define Alert Filters

To define an Alert Filter

Use /ALFILT (A.A.F)

More information:

Chapter 7, "Setting Up the Alert Monitor", in the Administration Guide.

Send alert notifications

Step 3 Define Alert Forwarding Destinations

The Alert Monitor itself can automatically forward certain alerts to any or all of up to 9 separate destinations. Possible destinations are:

- Any generic SNMP manager such as CA Spectrum, IBM Tivoli NetCool, HP OpenView
- A specific product: IBM Tivoli Netview, CA NSM, CA Service Desk

When are alerts sent to a forwarding destination?

When they match the alert filter specified for a destination.

One alert may match multiple filters and be sent to multiple destinations; or it may match no filters and be sent only to unfiltered destinations (if any).

Be careful: if a destination has no filter, all alerts will be sent there.
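
The matching rules above can be sketched as follows. This only models the behavior described in this section; it is not how NetMaster is configured. The destination names, alert fields, and the `route` function are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Alert = Dict[str, str]
AlertFilter = Callable[[Alert], bool]

MAX_DESTINATIONS = 9  # limit stated in this section


def route(alert: Alert, destinations: Dict[str, Optional[AlertFilter]]) -> List[str]:
    """Return the names of the destinations that would receive the alert.

    A destination whose filter is None is unfiltered and receives every alert.
    """
    if len(destinations) > MAX_DESTINATIONS:
        raise ValueError("at most 9 forwarding destinations can be defined")
    return [
        name
        for name, alert_filter in destinations.items()
        if alert_filter is None or alert_filter(alert)
    ]


# Hypothetical setup: two filtered destinations and one unfiltered destination.
destinations = {
    "email-A": lambda a: a["source"] == "SNA",
    "spectrum": lambda a: a["source"] == "IP",
    "catch-all": None,  # no filter: receives all alerts
}

print(route({"source": "IP", "description": "Port listener inactive"}, destinations))
print(route({"source": "FTM", "description": "File transfer late"}, destinations))
```

The IP alert goes to both `spectrum` and `catch-all`, while the file transfer alert matches no filter and reaches only the unfiltered destination. This is why an unfiltered destination deserves a second look before it is activated.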

To define an Alert Forwarding Destination

Use /PARMS (A.C.P)
then select INTERFACES $AM ALERTS (Alert Monitor Interface)

Step 4 Define Alert Trouble Ticket destination

You can define only one Alert Monitor Trouble Ticket destination.
Possible destinations are:

Email (via z/OS SMTP)
A user-written NCL procedure
CA Service Desk

When are alerts sent to the trouble ticket destination?

Manually - whenever someone watching the Alert Monitor issues the TT command against an alert
Automatically - whenever you define and activate an IP Event Detector or performance attribute alert with an action of AUTO-TROUBLE-TICKET
(No filtering is applied with AUTO-TROUBLE-TICKET.)

To define the Alert Trouble Ticket Destination

Use /ALTTI (A.A.I) then enter an Interface Type and press F6
Use F1=Help to explain the specific fields required by that type

Keep Alert History

Step 5 Set alert history retention

The Alert History display shows all active and closed alerts, day by day, for the last N days.

A process is run at a specified time daily, to delete the expired alerts from the file.

You can keep alerts for up to 999 days. If you keep alerts for a long time and tend to receive large numbers of them, monitor the size of the Alert History file.
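
To get a feel for the trade-off between retention and file size, the daily purge can be modelled with a short sketch. This is an illustration only, not the NetMaster purge process: the record layout, the function names, and the exact cutoff rule (an alert dated exactly N days ago is kept) are assumptions.

```python
from datetime import date, timedelta
from typing import Dict, List

MAX_RETENTION_DAYS = 999  # upper limit stated in this section


def purge_expired(alerts: List[Dict], retention_days: int, today: date) -> List[Dict]:
    """Return only the alerts that are still within the retention period."""
    # The lower bound of 1 day is an assumption; the source gives only the maximum.
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 999")
    cutoff = today - timedelta(days=retention_days)
    return [alert for alert in alerts if alert["date"] >= cutoff]


def estimated_records(alerts_per_day: int, retention_days: int) -> int:
    """Rough steady-state number of alert records held in the history file."""
    return alerts_per_day * retention_days


# Made-up history records.
history = [
    {"date": date(2024, 3, 9), "description": "Port listener inactive"},
    {"date": date(2024, 3, 3), "description": "FTP transfer failed"},
    {"date": date(2024, 2, 1), "description": "Interface changed state"},
]

kept = purge_expired(history, retention_days=7, today=date(2024, 3, 10))
print([alert["description"] for alert in kept])
print(estimated_records(alerts_per_day=2000, retention_days=90))
```

In this example, a site that raises about 2000 alerts a day and keeps them for 90 days holds roughly 180,000 records, which is the kind of figure worth checking against the space allocated to the file.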

To specify the name of the Alert History file, how long to keep the alerts, and when to purge them

Use /PARMS (A.C.P)
then select FILES $AM ALERTHIST (Alert History File Specification)

Using Alert History

Browse your Alert History to get a quick idea of your most problematic areas.

Access Alert History with /ALHIST.B (or F4=History from /ALERTS)
This shows all alerts - active and closed - for the current day.
Enter sort resource to group alerts for the same resource name together.
Look at which resources had the most alerts.
Enter date -1 to go to the previous day.
Look at which resources had the most alerts on that day.
Repeat date -1 to see earlier days in the Alert History file.
Re-sort the display:
- by severity, to see the most serious alerts in a day (sort s)
- by description, to see the most common kind of alert in a day (sort desc)
- by elapsed, to see the longest outstanding alerts in a day (sort elap)
- by occurrence, to see the most frequently recurring alerts in a day (sort occur)
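
If alert details are also sent to an external procedure or product, the same kind of review can be done outside the Alert Monitor. The sketch below assumes the alerts for one day are already available as simple records; the field names and sample data are hypothetical and do not reflect an actual NetMaster export format.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Made-up alerts for a single day.
alerts = [
    {"resource": "TCPIP1", "description": "Port listener inactive", "elapsed_minutes": 42},
    {"resource": "OSA2", "description": "Interface changed state", "elapsed_minutes": 5},
    {"resource": "TCPIP1", "description": "Port listener inactive", "elapsed_minutes": 120},
    {"resource": "FTPD", "description": "FTP transfer failed", "elapsed_minutes": 15},
    {"resource": "TCPIP1", "description": "ICMP message issued", "elapsed_minutes": 3},
    {"resource": "OSA2", "description": "Interface changed state", "elapsed_minutes": 60},
]


def busiest_resources(alerts: List[Dict]) -> List[Tuple[str, int]]:
    """Count alerts per resource, most alerts first (like grouping by resource)."""
    return Counter(alert["resource"] for alert in alerts).most_common()


def longest_outstanding(alerts: List[Dict], top: int = 3) -> List[Dict]:
    """Return the alerts that stayed open longest (like sorting by elapsed time)."""
    return sorted(alerts, key=lambda alert: alert["elapsed_minutes"], reverse=True)[:top]


print(busiest_resources(alerts))
print([(a["resource"], a["elapsed_minutes"]) for a in longest_outstanding(alerts, top=2)])
```

Here TCPIP1 stands out with three alerts, and the longest outstanding alert (120 minutes) belongs to the same resource, which makes it the first candidate for investigation.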