SysEDGE suddenly stopped working on multiple servers

Article ID: 100312

Products

SystemEDGE Agent, CA eHealth

Issue/Introduction

  • SystemEDGE agents on multiple servers suddenly stopped working and no longer respond to SNMP requests.
  • After a restart, the SystemEDGE agent hangs and does not initialize properly.

Environment

  • RedHat 6
  • RedHat 7

Cause

  • SystemEDGE agents run independently.
  • If multiple SystemEDGE agents are impacted simultaneously, the problem is most likely caused by an environmental issue.
  • After restarting the SystemEDGE agent, the last entry in sysedge.log is:
add_monitor_entry(): Self monitor index 1000005 configured with variable OID 1.3.6.1.4.1.546.12.1.1.4.1 which is not yet available.
  • OID 1.3.6.1.4.1.546.12.1.1.4.1 
diskStatsUtilization -> diskStatsEntry.4 (1.3.6.1.4.1.546.12.1.1.4) 
Type: INTEGER Access: read-only 
The utilization rate (percentage utilization) for this disk over the last measurement period. This could also be expressed as (disk busy time / elapsed time) * 100. 
  • Disabling SystemEDGE disk-based polling with the following directives in the sysedge.cf file allows the agent to fully initialize:
no_probe_disks 
no_discover_disks      
no_stat_nfs_filesystems
 
  • Running the df -h command on the affected server also hangs, which points to the file system layer rather than to SystemEDGE itself.
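
Because df -h hangs without indicating which file system is at fault, it can help to probe each mount point separately with a timeout. The following is a minimal sketch, not part of SystemEDGE: the helper names parse_mount_points and responds and the 5-second timeout are illustrative assumptions, and it assumes a Linux host that exposes /proc/mounts. Each probe runs in a child process so that a stuck statvfs() call cannot block the script itself.

```python
import subprocess
import sys


def parse_mount_points(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts encodes spaces in paths as the octal escape \040.
            entries.append((fields[1].replace("\\040", " "), fields[2]))
    return entries


def responds(path, timeout_s=5.0):
    """Return True if statvfs(path) returns (successfully or with an error)
    within timeout_s seconds, False if the call is still blocked."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        child.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # The child may be stuck in uninterruptible sleep on a hung mount,
        # so send SIGKILL but do not wait for it to exit.
        child.kill()
        return False


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            mounts = parse_mount_points(handle.read())
    except OSError:
        mounts = []  # not a Linux host, or /proc is unavailable
    for mount_point, fs_type in mounts:
        state = "ok" if responds(mount_point) else "NO RESPONSE"
        print(f"{state:12} {fs_type:10} {mount_point}")
```

Any mount point reported as NO RESPONSE is a candidate for the file system that is blocking both df -h and the SystemEDGE disk probes.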

Resolution

  • An unresponsive file system or disk array is causing SystemEDGE to become unresponsive.
  • SystemEDGE is a single-threaded application, so a call that blocks on something like a hung or stale NFS mount stalls the entire agent, including its handling of SNMP requests.
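
To illustrate why a single blocking call stalls everything else in a single-threaded program, here is a toy model. It is not SystemEDGE code: the function names are hypothetical, and a short sleep stands in for a file system call stuck on a stale NFS mount.

```python
import time


def run_single_threaded(tasks):
    """Run tasks one after another and return (name, start offset in seconds)."""
    t0 = time.monotonic()
    started = []
    for name, func in tasks:
        started.append((name, time.monotonic() - t0))
        func()  # nothing else can run until this call returns
    return started


def hung_disk_probe():
    # Stand-in for a statfs() call blocked on a stale NFS mount.
    # A real hang can last indefinitely; 0.2 s keeps the demo short.
    time.sleep(0.2)


def answer_snmp_request():
    pass  # would normally reply right away


timeline = run_single_threaded([
    ("disk probe", hung_disk_probe),
    ("SNMP request", answer_snmp_request),
])
for name, offset in timeline:
    print(f"{name} started at +{offset:.2f}s")
```

In this model the SNMP request cannot start until the disk probe returns. With a mount that never responds, the request is never serviced, which matches the symptoms described above.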

Additional Information

How to verify SysEDGE is working correctly
https://comm.support.ca.com/kb/how-to-verify-sysedge-is-working-correctly/kb000032112