CDM 8.03 still crashes (Controller: Max. restarts reached) on RHEL 9 with iostat enabled


Article ID: 434201


Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

We are running cdm 8.03. When the iostat monitor is enabled (monitor_iostat=yes), the probe crashes or fails to start. When it is disabled (monitor_iostat=no), the probe starts normally.

The issue should be fixed in cdm 7.22 and later as documented here: CDM 7.* doesn't start on RHEL9: Controller: Max. restarts reached for probe 'cdm' (command = cdm)

However, the probe still fails to start on one server, while it works on similar servers with the same OS and probe version.

Analysis of cdm.log shows that the probe crashes right after logging "calling syssyat":


Mar  4 08:09:17:102 [14###########52] cdm: calling syssyat
Mar  4 08:09:18:093 [14###########60] Controller: Max. restarts reached for probe 'cdm' (command = cdm)

Environment

  • DX UIM 23.4.*
  • cdm 8.*

Cause

The log suggests that the probe hits a segmentation fault or hangs indefinitely when it invokes the system's sysstat utility to collect iostat data.

Potential Root Causes:

- Sysstat Version/Architecture Conflict: Because deactivating iostat works around the problem, the root cause may be an incompatibility between the cdm probe's internal call and the version of sysstat installed on this specific server.

- Library Sub-versioning (Glibc): Even though "ldd" may show that the libraries are present, a slight mismatch between the glibc version on this server and the one on a working server can cause memory allocation failures during the heavy "freeing data" loops seen in the logs.
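To compare the libraries resolved on the failing server with those on a working one, the "ldd" output can be normalized before diffing, because the load addresses in parentheses differ on every run. The following is a minimal sketch: normalize_ldd is a hypothetical helper, and the probe path in the usage note is an assumption based on a default UIM installation, so adjust it to the actual install directory.

```shell
#!/bin/sh
# normalize_ldd is a hypothetical helper, not part of UIM or glibc.
# It reads ldd output on stdin and strips the trailing load address,
# e.g. " (0x00007f...)", which changes between runs and would
# otherwise show up as a false difference in a diff.
normalize_ldd() {
    sed -E 's/ \(0x[0-9a-f]+\)$//'
}
```

Usage (path assumed): on each server run "ldd /opt/nimsoft/probes/system/cdm/cdm | normalize_ldd > cdm-libs.txt", copy both files to one host, and compare them with diff. Note that ldd reports library paths only; package versions still need to be compared with rpm.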

Resolution

The issue is likely environmental. Because a similar server (same OS and same cdm version) exists where the issue does NOT occur, follow the steps below to resolve it:

 

1. Align sysstat and Kernel Versions

Review the output of the following command in both the working and the non-working environment:

 

Command: rpm -q sysstat

Check the version of sysstat on the non-working server. If it differs from the working server, align them using the native package manager.

 

Action: If they differ, update/reinstall: dnf reinstall sysstat
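The comparison can be made less error-prone by capturing the relevant versions as one line per component on each server. This is a minimal sketch, not a vendor-provided script: fingerprint_line and collect_fingerprint are hypothetical helper names, and including glibc and the kernel release is an assumption drawn from the potential root causes listed above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two servers; not part of UIM or sysstat.

# Print "label=value"; an empty value is reported as "unknown".
# $1 = label, $2 = value
fingerprint_line() {
    printf '%s=%s\n' "$1" "${2:-unknown}"
}

# Collect the package and kernel versions that are relevant to this issue.
collect_fingerprint() {
    fingerprint_line sysstat "$(rpm -q sysstat 2>/dev/null)"
    fingerprint_line glibc   "$(rpm -q glibc 2>/dev/null)"
    fingerprint_line kernel  "$(uname -r)"
}
```

Usage: run collect_fingerprint on the working and on the non-working server, redirect each result to a file, and diff the two files. Any line that differs is a candidate for alignment.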


 

2. Manual ldd Verification on sysstat

Since the probe calls sysstat externally, verify if the system utility itself is stable on this architecture:

Run the following command on the non-working server and review its output:

 

Command: iostat -xk 1 2

 

If this command fails or hangs, the issue is at the OS/Kernel level, not the probe.
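To tell a hang apart from a crash without blocking the terminal, the command can be run under a time limit. The sketch below assumes GNU coreutils "timeout" is available (it reports exit status 124 when the limit is reached) and that the shell reports 139 for a process killed by SIGSEGV. classify_status and run_with_limit are hypothetical helper names, and the 30-second limit in the usage note is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helpers; not part of UIM or sysstat.

# Translate an exit status into a short diagnosis.
classify_status() {
    case "$1" in
        0)   echo "ok" ;;
        124) echo "timed out (possible hang)" ;;
        139) echo "killed by SIGSEGV" ;;
        *)   echo "failed with status $1" ;;
    esac
}

# Run a command under a time limit and report how it ended.
# $1 = limit in seconds, remaining arguments = command to run.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    classify_status "$?"
}
```

Usage: "run_with_limit 30 iostat -xk 1 2". A result other than "ok" points to the OS/Kernel level rather than the probe.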