We are running cdm 8.03. When the iostat monitor is enabled (monitor_iostat=yes), the probe crashes or fails to start; when it is disabled (monitor_iostat=no), the probe starts normally.
The issue should be fixed in cdm 7.22 and later as documented here: CDM 7.* doesn't start on RHEL9: Controller: Max. restarts reached for probe 'cdm' (command = cdm)
However, it still fails to start on one server, while working on similar servers with the same OS and probe version.
Analysis of cdm.log shows the probe crashes right after logging "calling syssyat":
Mar 4 08:09:17:102 [14###########52] cdm: calling syssyat
Mar 4 08:09:18:093 [14###########60] Controller: Max. restarts reached for probe 'cdm' (command = cdm)
The log suggests the probe hits a segmentation fault or hangs indefinitely when it invokes the system's sysstat utility to collect iostat data.
Potential Root Causes:
- Sysstat Version/Architecture Conflict: Since the workaround is deactivating iostat, the root cause might be an incompatibility between the cdm probe's internal call and the version of sysstat installed on this specific server.
- Library Sub-versioning (Glibc): Even though "ldd" may show that the libraries are present, a slight mismatch in the glibc version on this server compared to a working one can cause memory allocation failures during the heavy "freeing data" loops seen in the logs.
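To compare the library picture on the two servers, a small report like the one below can help. It is only a sketch: lib_report is a hypothetical helper name, and the iostat binary is used as the example target; pass the path of the cdm probe binary instead if you know where it is installed. Load addresses are stripped so that the reports from two servers can be diffed.

```shell
#!/bin/sh
# Print the libc version and the shared libraries a binary resolves to.
# Usage: lib_report /path/to/binary   (lib_report is a hypothetical helper name)
lib_report() {
    bin=$1
    if ! command -v ldd >/dev/null 2>&1; then
        echo "libc: unknown (ldd not found)"
        return 1
    fi
    # On glibc systems the first line of `ldd --version` carries the glibc release.
    echo "libc: $(ldd --version 2>&1 | head -n 1)"
    # Drop the per-run load addresses and sort so two reports diff cleanly.
    ldd "$bin" 2>&1 | sed 's/ (0x[0-9a-f]*)$//' | sort
}

# Example target: the iostat binary found on PATH.
lib_report "$(command -v iostat)"
```

Run it on the working and the non-working server, save each output to a file, and compare the two files with diff.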
The issue is likely environmental. Since a similar server (same OS and same cdm version) exists where the issue is NOT occurring, use it as a reference and follow the steps below to resolve:
1. Align sysstat and Kernel Versions
Compare the output of the following command on the working and non-working servers:
Command: rpm -q sysstat
Check the version of sysstat on the non-working server. If it differs from the working server, align them using the native package manager.
Action: If they differ, update/reinstall: dnf reinstall sysstat
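The comparison in this step can be captured in a form that is easy to diff between the two servers. The snippet below is a minimal sketch; collect_versions is a hypothetical helper name, and it records only the sysstat package version and the running kernel release.

```shell
#!/bin/sh
# Print the sysstat package version and kernel release as key=value lines.
# collect_versions is a hypothetical helper name used for illustration.
collect_versions() {
    if command -v rpm >/dev/null 2>&1; then
        # rpm -q exits non-zero when the package is not installed.
        sysstat_ver=$(rpm -q sysstat 2>/dev/null) || sysstat_ver="not installed"
    else
        sysstat_ver="unknown (rpm not found)"
    fi
    printf 'sysstat=%s\n' "$sysstat_ver"
    printf 'kernel=%s\n' "$(uname -r)"
}

collect_versions
```

Redirect the output to a file on each server (for example collect_versions > versions.txt), copy both files to one place, and run diff on them; any line that differs is a candidate for alignment.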
2. Manual ldd Verification on sysstat
Since the probe calls sysstat externally, verify if the system utility itself is stable on this architecture:
What is the output of the following command on the non-working environment?
Command: iostat -xk 1 2
If this command fails or hangs, the issue is at the OS/Kernel level, not the probe.
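Because a hang would otherwise block the terminal, the check can be run under a time limit. The sketch below assumes GNU coreutils timeout is available; classify_run is a hypothetical helper name and the 15-second limit is an arbitrary choice. It relies on the usual exit-code conventions: 124 when timeout expires, and 128 plus the signal number when the command is killed by a signal (139 for SIGSEGV).

```shell
#!/bin/sh
# Run a command under a time limit and classify how it ended.
# Usage: classify_run SECONDS command [args...]
# classify_run is a hypothetical helper name; requires the timeout utility.
classify_run() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    case $rc in
        0)   echo "ok" ;;
        124) echo "hang (no exit within ${limit}s)" ;;
        127) echo "not found" ;;
        139) echo "segfault" ;;
        *)   echo "failed (exit code $rc)" ;;
    esac
}

# The same iostat invocation as above, limited to 15 seconds.
classify_run 15 iostat -xk 1 2
```

A result of "hang" or "segfault" on the non-working server, with "ok" on the working one, points at the OS-level sysstat installation rather than the probe.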