Log Analytics, logs-opensearch-master, fails to start and shows a CrashLoopBackOff error (exit code 78)

Products

DX Operational Intelligence, DX Application Performance Management

Issue/Introduction

Log Analytics fails to start because the logs-opensearch-master pod keeps crashing and ends up in the CrashLoopBackOff state.

Describing the pod (kubectl describe pod <pod> -n dxi) shows that the container is restarted repeatedly, terminating with exit code 78 each time, until the kubelet backs off:

Containers:
  logs-opensearch-master:
    Container ID:  containerd://[REDACTED]
    Image:         dxiregistry.[REDACTED]:5000/dxi/doi-loganalytics-opensearch:24.4.1.1
    Image ID:      dxiregistry.[REDACTED]:5000/dxi/doi-loganalytics-opensearch@sha[REDACTED]
    Ports:         9200/TCP, 9300/TCP
    Host Ports:    0/TCP, 0/TCP
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Error
      Exit Code:   78
      Started:     Mon, 30 Sep 2024 16:45:07 +0200
      Finished:    Mon, 30 Sep 2024 16:45:20 +0200

...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16m default-scheduler Successfully assigned dxi/logs-opensearch-master-0 to [REDACTED]
Normal Pulled 14m (x5 over 16m) kubelet Container image "dxiregistry.[REDACTED]:5000/dxi/doi-loganalytics-opensearch:24.4.1.1" already present on machine
Normal Created 14m (x5 over 16m) kubelet Created container logs-opensearch-master
Normal Started 14m (x5 over 16m) kubelet Started container logs-opensearch-master
Warning BackOff 69s (x63 over 15m) kubelet Back-off restarting failed container logs-opensearch-master in pod logs-opensearch-master-0_dxi([REDACTED])

Environment

DX Platform 24.1 onPrem

Nodes on RHEL 8.10

Cause

The container logs (kubectl logs <pod> -c <container> -n dxi) show that an OpenSearch bootstrap check fails, so the node process shuts down and the pod cannot start:

[2024-10-01T06:10:00,403][INFO ][o.o.n.Node ] [logs-opensearch-master-0] initialized
[2024-10-01T06:10:00,404][INFO ][o.o.n.Node ] [logs-opensearch-master-0] starting ...
[2024-10-01T06:10:00,522][INFO ][o.o.t.TransportService ] [logs-opensearch-master-0] publish_address {192.168.xxx.xxx:9300}, bound_addresses {0.0.0.0:9300}
[2024-10-01T06:10:00,525][INFO ][o.o.t.TransportService ] [logs-opensearch-master-0] Remote clusters initialized successfully.
[2024-10-01T06:10:01,023][INFO ][o.o.b.BootstrapChecks ] [logs-opensearch-master-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
ERROR: OpenSearch did not exit normally - check the logs at /opt/opensearch/logs/logs-opensearch-master-0/loganalytics1_es.log
[2024-10-01T06:10:01,038][INFO ][o.o.n.Node ] [logs-opensearch-master-0] stopping ...
[2024-10-01T06:10:01,039][INFO ][o.o.s.a.r.AuditMessageRouter] [logs-opensearch-master-0] Closing AuditMessageRouter
[2024-10-01T06:10:01,039][INFO ][o.o.s.a.s.SinkProvider ] [logs-opensearch-master-0] Closing DebugSink
[2024-10-01T06:10:01,052][INFO ][o.o.n.Node ] [logs-opensearch-master-0] stopped
[2024-10-01T06:10:01,053][INFO ][o.o.n.Node ] [logs-opensearch-master-0] closing ...
[2024-10-01T06:10:01,062][INFO ][o.o.s.a.i.AuditLogImpl ] [logs-opensearch-master-0] Closing AuditLogImpl
[2024-10-01T06:10:01,075][INFO ][o.o.n.Node ] [logs-opensearch-master-0] closed
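
To pick the relevant line out of a long container log, a small shell helper along the following lines may be useful. It is only a sketch: the function name parse_map_count_error is hypothetical, and it assumes the error message has exactly the wording shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): reads log text on stdin and
# prints "<current> <required>" for the vm.max_map_count bootstrap check error.
# Prints nothing if the error line is not present.
parse_map_count_error() {
    sed -n 's/.*vm\.max_map_count \[\([0-9]*\)\] is too low, increase to at least \[\([0-9]*\)\].*/\1 \2/p'
}
```

It could be fed from the logs of the previously crashed container, for example: kubectl logs logs-opensearch-master-0 -c logs-opensearch-master -n dxi --previous | parse_map_count_error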

Resolution

The maximum number of virtual memory areas a process may have (vm.max_map_count) must be increased on the node that hosts the pod, by following these steps:

- Check the current value on the node:

cat /proc/sys/vm/max_map_count

- If the value is lower than 262144, edit /etc/sysctl.conf and add the following line (or update it if it already exists) so that the setting persists across reboots:

vm.max_map_count=262144

- Run the following command as root to apply the change immediately, without rebooting the node:

sysctl -q -w vm.max_map_count=262144
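
The steps above can be combined into a short script. The following is a minimal sketch, not an official procedure: the function names are hypothetical, it assumes GNU sed (as shipped with RHEL), and fix_node must be run as root.

```shell
#!/bin/sh
# Minimum value demanded by the OpenSearch bootstrap check.
REQUIRED=262144

# Succeeds (exit status 0) when the value passed in is below the minimum.
map_count_too_low() {
    [ "$1" -lt "$REQUIRED" ]
}

# Persists the setting in the given sysctl configuration file:
# replaces an existing vm.max_map_count line, or appends one if none exists.
persist_map_count() {
    conf_file=$1
    if grep -q '^vm\.max_map_count' "$conf_file" 2>/dev/null; then
        sed -i "s/^vm\.max_map_count.*/vm.max_map_count=$REQUIRED/" "$conf_file"
    else
        echo "vm.max_map_count=$REQUIRED" >> "$conf_file"
    fi
}

# Checks the running kernel value and, only if it is too low,
# persists the new value and applies it without a reboot (requires root).
fix_node() {
    current=$(cat /proc/sys/vm/max_map_count)
    if map_count_too_low "$current"; then
        persist_map_count /etc/sysctl.conf
        sysctl -q -w vm.max_map_count="$REQUIRED"
    fi
}
```

Sourcing the file and calling fix_node on a node leaves the system untouched when the running value is already 262144 or higher.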

Additional Information

It is recommended to repeat the same procedure on every node where the Log Analytics pods may be scheduled.

More details about vm.max_map_count:

https://access.redhat.com/solutions/99913