Justification for collection of the OS Logs for the CARE (Remote Engineer) packages
DX NetOps Performance Management releases
With CARE (Remote Engineer), we collect OS logs, including /var/log/messages.
The product itself does not read the messages logs. The re.sh script gathers them as a troubleshooting resource, and they have proved invaluable over and over.
These logs are where Support first finds evidence of the kernel OOM killer being invoked. They are also where we have found evidence of management utilities like Boks locking things down where they shouldn't, of MTU misalignment, of SELinux blocking application capabilities, and of time skew issues.
To summarize, /var/log/messages contains many error messages that help us diagnose why the application may have been killed by the OS or why the system time shifted, along with other messages that help us analyze environmental issues.
Hence, the collection of OS logs is justified: it allows Support to separate "Environmental" defects from "Application" defects, specifically for time synchronization and resource availability (CPU/Memory), both of which are critical for the time-series databases used in NetOps Performance Management.
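
As an illustration of the kind of evidence described above, the following is a minimal sketch of how collected messages logs could be scanned for a few of these signatures. It is not part of CARE or re.sh. The function names and the pattern list are assumptions made for this example, and the exact log wording varies by kernel, distribution, and time daemon, so the patterns would need adjusting for a given environment.

```python
import re

# Hypothetical signature patterns, chosen for illustration only.
# Real log wording differs between kernel versions, distributions, and daemons.
SIGNATURES = {
    "oom_killer": re.compile(r"invoked oom-killer|Out of memory: Kill", re.IGNORECASE),
    "selinux_denial": re.compile(r"avc:\s+denied"),
    "time_skew": re.compile(r"System clock wrong by|time reset", re.IGNORECASE),
}


def classify_line(line):
    """Return the names of all signature categories that match one log line."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(line)]


def scan_messages(lines):
    """Count matching lines per category across an iterable of log lines."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name in classify_line(line):
            counts[name] += 1
    return counts
```

To try it against a collected log, open the file with open(path, errors="replace") and pass the file object to scan_messages. A non-zero count in any category suggests an environmental cause worth checking before treating the problem as an application defect.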