zEvent probe randomly missing messages from z/OS

Article ID: 278333

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

A little over two years ago, we moved away from Common Services CCI as the method for integrating UIM with the mainframe. We used CCS Event Mgmt 15.0 on the mainframe side and deployed the zEvent v1.0 probe on the UIM side to connect to it. This has worked fairly well over the last couple of years; however, support for the probe has since been dropped. The probe is a critical component of our overall monitoring solution, and the current support status of the integration is unclear to us. As we understand it, the mainframe software that the probe connects to is still fully supported. We need either support for this integration or a new, supported solution.

The missing messages can come from any CA-7 host or X-platform job that fails, for any application the job is trying to start, so the range is wide. What we see is that the message appears to be placed on the queue on the mainframe side, but the zEvent probe records nothing about it, even though its logging is set to debug.

Environment

  • DX UIM version and CU:  20.4 CU6
  • Primary Hub OS: Windows 2019
  • Primary hub and robot version:  9.37
  • Hub version that the robot reports to:  9.37
  • Hub where the zevent probe runs (Primary or another hub): Primary
  • nas version (on Primary hub): 9.37
  • zevent probe version: 1.00
  • ems probe version:  10.31
  • Message Service Server (MSS) is deployed and running on z/OS
  • The HUB is deployed and running on z/OS
  • The ems probe is deployed and running on the same robot where zevent probe is deployed

Resolution

  • The customer disabled a nas interval-based AO profile that was placing a heavy load on the nas.exe process.

  • A flood of alarms, where the alarm load increases dramatically and very quickly, can cause unexpected behavior in the nas. Other possible causes include scripts, too many rules on the Primary hub nas, and too many rules defined with aggressive regexes.

  • 11,000 or more AO actions per hour may cause a significant delay in data being written to the nas_alarms and transaction tables.

  • We highly recommend reviewing the nas best practices guide, including its architectural considerations.
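
To put the 11,000 AO actions per hour figure in perspective, the short Python sketch below converts an hourly action count into a per-second rate and flags counts at or above that level. It is only an illustration: the function names and the constant are hypothetical, the hourly count is assumed to come from your own review of the nas, and the threshold is the figure quoted above rather than a documented product limit.

```python
# Illustrative only: names are hypothetical, and the threshold is the
# "11k+ AO actions per hour" figure quoted in this article, not a product limit.
AO_ACTIONS_PER_HOUR_WARN = 11_000
SECONDS_PER_HOUR = 3600


def ao_rate_per_second(actions_per_hour: int) -> float:
    """Convert an hourly AO action count into an average per-second rate."""
    return actions_per_hour / SECONDS_PER_HOUR


def is_heavy_ao_load(actions_per_hour: int,
                     threshold: int = AO_ACTIONS_PER_HOUR_WARN) -> bool:
    """Return True when the hourly count reaches the warning threshold."""
    return actions_per_hour >= threshold


if __name__ == "__main__":
    # Sample counts are made up for demonstration purposes.
    for count in (2_000, 11_000, 25_000):
        rate = ao_rate_per_second(count)
        status = "heavy" if is_heavy_ao_load(count) else "ok"
        print(f"{count} AO actions/hour = {rate:.2f}/second ({status})")
```

At 11,000 actions per hour the nas is executing roughly three AO actions every second on average, on top of normal alarm processing, which helps explain why a single interval-based profile can have a noticeable effect.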

Additional Information

zEvent probe: increased the Java memory settings to 2048/4096 (initial/maximum, respectively).

ems probe: increased the Java memory settings to 2048/4096 as well, since it processes the events.

Deactivate, then activate, the zEvent probe.