The JVM_monitor probe crashes because of a native memory allocation failure. The following error can be seen in the HotSpot error log files (`hs_err_pid<pid>.log`):
`Native memory allocation (malloc) failed to allocate 32744 bytes for ChunkPool::allocate`
When the JVM_monitor probe runs on a system with many monitored JVMs, it can exhaust **native memory** and crash with a HotSpot fatal error. The process may then hang and require a manual kill, and `jvm_monitor.cfg` can be corrupted after the crash.
This document describes the analysis and recommended mitigations.
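To check whether a given crash matches this pattern, you can search the HotSpot error logs for the malloc failure message. The Python sketch below assumes the logs keep HotSpot's default `hs_err_pid*.log` naming and sit in a single directory. By default the JVM writes them to the process's working directory unless `-XX:ErrorFile` is set. The function name `find_native_oom` is hypothetical.

```python
import glob
import os

# Message HotSpot writes when a native (malloc) allocation fails.
NATIVE_OOM_MARKER = "Native memory allocation (malloc) failed"


def find_native_oom(log_dir: str) -> dict[str, list[str]]:
    """Return {log path: [matching lines]} for every hs_err_pid*.log in log_dir."""
    hits: dict[str, list[str]] = {}
    for path in sorted(glob.glob(os.path.join(log_dir, "hs_err_pid*.log"))):
        # errors="replace" keeps going if a crash log contains stray bytes.
        with open(path, encoding="utf-8", errors="replace") as f:
            lines = [line.rstrip("\n") for line in f if NATIVE_OOM_MARKER in line]
        if lines:
            hits[path] = lines
    return hits
```

For example, `find_native_oom("/path/to/probe/dir")` returns only the crash logs that contain the native allocation failure, which separates this issue from unrelated crashes.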
- UIM 23.4.x
- jvm_monitor probe: any version
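The probe crashed because the JVM ran out of native memory (a native OOM, as opposed to a Java heap `OutOfMemoryError`). The crash can have two side effects: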
- `jvm_monitor.cfg` corruption: a crash during a write (for example, a config save or a discovery update) can leave the file truncated or inconsistent.
- Process stuck after the crash: after a fatal native OOM, the JVM may not shut down cleanly (for example, it can hang while reaching a safepoint or while writing the error report). The process then stays unresponsive until it is killed manually.
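A truncated `jvm_monitor.cfg` can often be spotted by checking that every section is closed. The Python sketch below assumes the usual UIM `.cfg` layout, where a section opens with a line `<name>` and closes with a matching line `</name>`, and `key = value` lines sit in between. The function name `check_cfg_balance` is hypothetical, and a clean result does not prove the file's values are intact.

```python
import re

# Assumption: UIM .cfg layout with "<name>" ... "</name>" section lines.
TAG = re.compile(r"^<(/?)([^<>]+)>$")


def check_cfg_balance(text: str) -> list[str]:
    """Return a list of problems; an empty list means every section is closed."""
    problems: list[str] = []
    stack: list[tuple[str, int]] = []  # (section name, line where it opened)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        m = TAG.match(raw.strip())
        if not m:
            continue  # key = value line, blank line, or anything else
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append((name, lineno))
        elif not stack or stack[-1][0] != name:
            problems.append(f"line {lineno}: unexpected </{name}>")
        else:
            stack.pop()
    # Sections still open at end of file suggest the write was cut off.
    problems.extend(f"line {n}: <{name}> never closed" for name, n in stack)
    return problems
```

If the check reports unclosed sections, restoring the file from a backup is usually safer than editing it by hand.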
General recommendations for better JVM_monitor probe performance/memory management:
1. Reduce the number of monitored JVMs
- Remove or disable unused JVM profiles.
- Avoid configuring thousands of profiles and keep only the necessary ones.
2. Lower Java heap size
- A smaller heap leaves more memory for the JVM's native allocations (thread stacks, metaspace, JIT and GC structures). For example, on a 20 GB host, consider **-Xmx1536m** or **-Xmx2048m** instead of a 4 GB heap.
- Set via probe startup options (e.g. `java_mem_max` / `java_mem_init` in probe config or robot).
3. Reduce per-thread stack size
- Add JVM option: **-Xss256k** (or **-Xss512k** if 256k is too low).
- This reduces the native memory reserved per thread and can delay a native OOM when the thread count is high. It does not fix the underlying leak.
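To see why the stack size matters at high thread counts, multiply the thread count by the per-thread stack size. This sketch assumes 2,000 threads and a default stack of 1 MiB, which is typical for 64-bit HotSpot but should be confirmed for your platform. Stack space is reserved up front and committed lazily, so the result is an upper bound rather than measured usage.

```python
def stack_reservation_mib(threads: int, xss_kib: int) -> float:
    """Upper bound on thread-stack memory in MiB (stacks are committed lazily)."""
    return threads * xss_kib / 1024


# Assumed figures: 2,000 threads; 1 MiB default stack versus smaller -Xss values.
for label, xss in [("default (assumed 1024k)", 1024), ("-Xss512k", 512), ("-Xss256k", 256)]:
    print(f"{label:>24}: {stack_reservation_mib(2000, xss):7.1f} MiB")
```

Under these assumptions, moving from the default to `-Xss256k` cuts the stack reservation from about 2,000 MiB to about 500 MiB.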
4. Increase swap space (Linux) or the paging file (Windows)
- Make sure enough swap or paging file is configured so that the probe does not exhaust RAM plus swap as its memory use grows.
- This only delays the failure; fixing the leak is still preferable.
5. Back up jvm_monitor.cfg regularly
- This lets you restore the configuration if a crash corrupts the file.
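A timestamped copy, taken on a schedule, is enough for this purpose. The Python sketch below copies the file into a backup directory and keeps only the newest copies. The function name `backup_cfg`, the paths, and the retention count are hypothetical; adapt them to your installation.

```python
import shutil
import time
from pathlib import Path


def backup_cfg(cfg: Path, backup_dir: Path, keep: int = 10) -> Path:
    """Copy cfg into backup_dir with a timestamp suffix and prune old copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{cfg.name}.{stamp}"
    shutil.copy2(cfg, target)  # copy2 also preserves file timestamps
    # The timestamp format sorts chronologically, so the oldest copies come first.
    backups = sorted(backup_dir.glob(f"{cfg.name}.*"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

For example, `backup_cfg(Path("jvm_monitor.cfg"), Path("cfg_backups"))` could run from a scheduled task. A backup taken right after a crash may itself be corrupted, so keep several generations.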
6. Monitor thread count
- If the OS or a monitoring tool can report the thread count of the jvm_monitor process, alert on a rising trend (e.g. thousands of threads) to act before OOM.
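On Linux, the kernel exposes a process's current thread count in `/proc/<pid>/status`. On Windows, the same figure is available as the Process object's Thread Count counter in Performance Monitor. The Python sketch below reads the Linux value and compares it with a threshold. The function names and the threshold of 1,000 are hypothetical; set the threshold from the baseline you observe for the probe.

```python
import os
from pathlib import Path

# Hypothetical threshold; tune it to the probe's normal thread count.
THRESHOLD = 1000


def thread_count(pid: int) -> int:
    """Read the 'Threads:' field from /proc/<pid>/status (Linux only)."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError(f"no Threads: field for pid {pid}")


def should_alert(count: int, threshold: int = THRESHOLD) -> bool:
    """True when the thread count is above the alert threshold."""
    return count > threshold
```

For example, `should_alert(thread_count(pid))` can run periodically against the jvm_monitor process ID. Logging the counts over time also shows the rising trend before the threshold is reached.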
Example JVM options (startup)
Use these in the probe’s Java startup (e.g. probe options), **in addition to** reducing the number of monitored JVMs and fixing connection reuse:
```text
-Xmx2048m
-Xms512m
-Xss256k
```
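Optional, to help with diagnostics in future crashes. These flags write a heap dump only when a Java `OutOfMemoryError` is thrown; a fatal native allocation failure such as the one above produces an `hs_err_pid<pid>.log` file instead:
```text
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=<path-to-writable-dir>
```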
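Mitigations for `.cfg` file corruption:
- Back up `jvm_monitor.cfg` regularly.
- Where supported, use a safe-write pattern (write to a temporary file, then rename it over the original) so that a crash during a write cannot leave a truncated file.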
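The safe-write pattern can be applied by any script or tool you use to edit `jvm_monitor.cfg` outside the probe. The probe's own writes are outside your control. The Python sketch below assumes the temporary file can be created in the same directory as the target, which the rename requires. `safe_write` is a hypothetical name. `os.replace` is atomic on POSIX filesystems; on Windows it replaces the target in one call, but with weaker guarantees.

```python
import os
import tempfile


def safe_write(path: str, text: str, encoding: str = "utf-8") -> None:
    """Write text so readers see either the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target for the rename.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".cfg")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before the rename
        os.replace(tmp, path)  # swap the new file in over the old one
    except BaseException:
        # On any failure, leave the original untouched and clean up the temp file.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If the process dies before `os.replace`, the original file is unchanged and at most a stray `.tmp-*.cfg` file remains.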