When the nsx-config container is OOMKilled, a heap dump is not created.
SSP 5.0
Some of the reasons a heap dump is not created are:
The nsx-config pod reaches its memory limit before the java process running inside it reaches the -Xmx limit, so Kubernetes kills the container before a heap dump is created.
Although jvmOptions are defined for heap dump creation in OutOfMemory scenarios, if the container is terminated by Kubernetes too quickly after the OOM event, the heap dump process may not be triggered.
The -Xmx values of the containers in pods nsx-config-0-0 and nsx-config-1-0 are 3750M and 1500M respectively, so we need 5.7GB and 2.3GB respectively to hold the heap dumps, since the heap dump size would be 1-1.5x of the defined -Xmx.
In scenarios where the heap dump is not created automatically, follow the steps below to create it manually so that it is collected in the support bundle. Execute these steps on the SSPI VM.
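The sizing rule above (a heap dump of 1 to 1.5 times the defined -Xmx) can be checked with a short calculation. The following is only a sketch: the function name heap_dump_space_mb is hypothetical, it takes -Xmx in megabytes, and it assumes the upper bound of 1.5x.

```shell
# Hypothetical helper: worst-case heap dump size in MB for a given -Xmx in MB,
# using the upper bound of 1.5x (integer arithmetic).
heap_dump_space_mb() {
  echo $(( $1 * 3 / 2 ))
}

heap_dump_space_mb 3750   # nsx-config-0-0: prints 5625
heap_dump_space_mb 1500   # nsx-config-1-0: prints 2250
```

These values match the 5.7GB and 2.3GB figures once rounded up. The result can be compared with the free space reported by `df -h /var/dump/pace` inside the container, assuming the dump is written to that path.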
nsx-config pods.
k -n nsxi-platform get pods | grep nsx-config
nsx-config-0-0 2/2 Running 0 128m
nsx-config-1-0 2/2 Running 0 127m
nsx-config-create-kafka-topic-6x729 0/1 Completed 0 3h50m
nsx-config-postgres-ids-signature-purge-cronjob-28992960-p972d 0/1 Completed 0 164m
nsx-config pods.
k -n nsxi-platform get pods | grep nsx-config
nsx-config-0-0 1/2 CrashLoopBackOff 12 (14s ago) 6h27m
nsx-config-1-0 2/2 Running 0 6h21m
nsx-config-create-kafka-topic-j6lmg 0/1 Completed 0 6h27m
k -n nsxi-platform describe pod nsx-config-0-0
Ports: 8080/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Thu, 21 Nov 2024 17:28:42 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Thu, 21 Nov 2024 17:26:50 +0000
Finished: Thu, 21 Nov 2024 17:28:25 +0000
Ready: True
Restart Count: 2
Limits:
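Instead of reading the full describe output, the last termination reason can also be queried directly. This is a sketch and not part of the documented procedure: the function name last_termination_reason is hypothetical, and it calls kubectl in full rather than the k alias used above. Exit code 137 corresponds to 128 + 9, meaning the process was ended by SIGKILL, which is consistent with an OOM kill.

```shell
# Hypothetical helper: print why the named container of a pod last terminated.
# Usage: last_termination_reason <pod> <container>
last_termination_reason() {
  kubectl -n nsxi-platform get pod "$1" \
    -o jsonpath="{.status.containerStatuses[?(@.name==\"$2\")].lastState.terminated.reason}"
}

# Example (run on the SSPI VM); prints OOMKilled if the previous restart was an OOM kill:
# last_termination_reason nsx-config-0-0 nsx-config
```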
If the nsx-config-0-0 pod is having a memory issue, execute the steps below to collect a heap dump inside the nsx-config container of the pod.
> Log in to the nsx-config container of the nsx-config-0-0 pod.
root@sspi:~# k exec -it -n nsxi-platform nsx-config-0-0 -c nsx-config -- /bin/bash
groups: cannot find name for group ID 2000
nsx-user@nsx-config-0-0:/
> Get the PID of the java process running inside the container.
nsx-user@nsx-config-0-0:/$ pgrep -f nsx-config
7
> Create the heap dump manually by executing the command below. Ensure the same path "/var/dump/pace" is used so that the heap dump is collected in the support bundle.
jmap -dump:live,file=/var/dump/pace/<heap_dump_file_name> <PID>
heap_dump_file_name is the file name with which the heap dump is created. Below is an example:
nsx-user@nsx-config-0-0:/$ jmap -dump:live,file=/var/dump/pace/nsx-config-0.hprof 7
Dumping heap to /var/dump/pace/nsx-config-0.hprof ...
Heap dump file created [129548512 bytes in 0.868 secs]
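The jmap invocation above can be assembled by a small helper so that the dump always lands under /var/dump/pace. This is a minimal sketch, assuming it runs inside the container. The function name heap_dump_cmd is hypothetical, and it only prints the command rather than running it.

```shell
# Hypothetical helper: print the jmap command for a given file name and PID.
# The directory is fixed to /var/dump/pace, the path the support bundle collects from.
heap_dump_cmd() {
  hd_name=$1
  hd_pid=$2
  # Reject an empty or non-numeric PID before building the command.
  case "$hd_pid" in
    ''|*[!0-9]*) echo "invalid PID: $hd_pid" >&2; return 1 ;;
  esac
  echo "jmap -dump:live,file=/var/dump/pace/${hd_name} ${hd_pid}"
}

heap_dump_cmd nsx-config-0.hprof 7
# prints: jmap -dump:live,file=/var/dump/pace/nsx-config-0.hprof 7
```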
> API to collect the support bundle.
curl -sku admin:<PASSWORD> -X POST 'https://<SSP FQDN>:443/ssp/cluster-api/support-bundle/collection?action=collect' -H "Content-Type: application/json" -d '{"dynamic_content_filters":["DATA_STORAGE","CONFIGURATION_DATABASE","METRICS","ANALYTICS","MESSAGING","PLATFORM_SERVICES"],"log_age_limit":7,"content_filters":["ALL"]}'
Response:
{
  "failed_nodes": null,
  "remaining_nodes": [
    {
      "node_display_name": "ssp",
      "node_id": "ed5af649-2ba2-484b-8d5e-284fa7a79559",
      "status": "running"
    }
  ],
  "remoteTaskID": "937",
  "request_properties": {},
  "status": "running",
  "success_nodes": null
}
> API to check the status of the support bundle. Check the status and wait until the status is success ("status": "success").
curl -sku admin:<PASSWORD> -X GET 'https://<SSP FQDN>:443/ssp/cluster-api/support-bundle/status/<remoteTaskID from collect API response>'
Response:
{
  "details": {
    "failed_nodes": [],
    "remaining_nodes": [
      {
        "node_display_name": "ssp",
        "node_id": "ed5af649-2ba2-484b-8d5e-284fa7a79559",
        "status": "running"
      }
    ],
    "success_nodes": []
  },
  "request_properties": {
    "content_filters": [
      "ALL"
    ],
    "dynamic_content_filters": [
      "DATA_STORAGE",
      "CONFIGURATION_DATABASE",
      "METRICS",
      "ANALYTICS",
      "MESSAGING",
      "PLATFORM_SERVICES"
    ],
    "log_age_limit": 7
  },
  "status": "running"    <-- status field
}
> API to download the support bundle.
curl -sku admin:<PASSWORD> -X GET 'https://<SSP FQDN>:443/ssp/cluster-api/support-bundle/response/<remoteTaskID from collect API response>' --output ssp.tar.gz
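Waiting for the status to become success can be scripted as a polling loop. The following is a sketch under stated assumptions, not a documented tool: the function names fetch_bundle_status and wait_for_bundle and the environment variables SSP_FQDN and SSP_PASSWORD are hypothetical, and jq is assumed to be installed to read the top-level "status" field.

```shell
# Hypothetical helper: print the top-level "status" of a collection task.
# Assumes jq is available and SSP_FQDN / SSP_PASSWORD are set by the caller.
fetch_bundle_status() {
  curl -sku "admin:${SSP_PASSWORD}" -X GET \
    "https://${SSP_FQDN}:443/ssp/cluster-api/support-bundle/status/$1" | jq -r '.status'
}

# Poll until the status is "success".
# Usage: wait_for_bundle <remoteTaskID> [max_tries] [delay_seconds]
# Returns 0 on success, 1 if max_tries is exhausted first.
wait_for_bundle() {
  wb_task=$1
  wb_max=${2:-60}
  wb_delay=${3:-30}
  wb_i=0
  while [ "$wb_i" -lt "$wb_max" ]; do
    wb_status=$(fetch_bundle_status "$wb_task")
    if [ "$wb_status" = "success" ]; then
      return 0
    fi
    wb_i=$((wb_i + 1))
    sleep "$wb_delay"
  done
  return 1
}

# Example: wait_for_bundle 937 && echo "bundle ready for download"
```

The loop stops only on success or when the attempts run out. Other status values are not interpreted here because the source does not list them.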
> Verify that it is downloaded.
[root@lvn-dvm-10-70-180-45 ~]# ls -lrt ssp*
-rw-r--r-- 1 root root 1493202767 Feb 14 18:05 ssp.tar.gz
Note:
kubectl -n nsxi-platform edit sts nsx-config-0
For the nsx-config-0 stateful set:
resources:
  limits:
    memory: 6000Mi
kubectl exec -it -n nsxi-platform nsx-config-0-0 -c nsx-config -- /bin/bash
cd /var/dump/pace