Unexpected OOMKilled event on application pod
TCA 3.2
Note: Collect logs immediately after the issue occurs, before log rotation overwrites them.
For Root Cause Analysis (RCA), open a Broadcom Support case and attach the logs listed below.
Command Outputs from the Cluster:
kubectl get nodes -o wide
kubectl get pods -A -o wide
kubectl get pods -A -o wide | grep <problematic pod name>
kubectl describe pod <problematic pod name> -n <namespace>
kubectl get pod <problematic pod name> -n <namespace> -o yaml
kubectl get pod <problematic pod name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'
kubectl logs <problematic pod name> -n <namespace> --previous
kubectl logs <problematic pod name> -n <namespace>
kubectl get events -A
kubectl get pods --all-namespaces -o json | jq -r '.items[] | {pod: .metadata.name, namespace: .metadata.namespace, uid: .metadata.uid, containers: .status.containerStatuses[]?}'
kubectl top pod -n <problematic pod namespace>
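The cluster-level commands above can be captured to files in one pass so they are ready to attach to the support case. The following is a minimal sketch, not an official collection tool: the function names save_output and collect_cluster_outputs, the argument order, and the output file names are all assumptions made for illustration.

```shell
# Sketch only: function names, arguments and file names are hypothetical.

# Run a command and save stdout and stderr to a file.
# A failing command is noted in the file instead of stopping the collection
# (for example, "kubectl logs --previous" fails when no previous container exists).
save_output() {
  local out_file="$1"; shift
  "$@" > "$out_file" 2>&1 || echo "exit code $? from: $*" >> "$out_file"
}

# Usage: collect_cluster_outputs <problematic pod name> <namespace> <output dir>
collect_cluster_outputs() {
  local pod="$1" ns="$2" out_dir="$3"
  mkdir -p "$out_dir" || return 1

  save_output "$out_dir/nodes.txt"         kubectl get nodes -o wide
  save_output "$out_dir/pods.txt"          kubectl get pods -A -o wide
  save_output "$out_dir/describe.txt"      kubectl describe pod "$pod" -n "$ns"
  save_output "$out_dir/pod.yaml"          kubectl get pod "$pod" -n "$ns" -o yaml
  save_output "$out_dir/resources.txt"     kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.containers[*].resources}'
  save_output "$out_dir/logs-previous.txt" kubectl logs "$pod" -n "$ns" --previous
  save_output "$out_dir/logs-current.txt"  kubectl logs "$pod" -n "$ns"
  save_output "$out_dir/events.txt"        kubectl get events -A
  save_output "$out_dir/top-pod.txt"       kubectl top pod -n "$ns"
}
```

For example, collect_cluster_outputs <problematic pod name> <namespace> /tmp/oom-cluster-outputs writes one file per command into that directory. Each file holds both standard output and standard error, so a command that fails still leaves a record of why.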
Log in to the TKG node VM where the pod was running and collect the following outputs from the guest OS of that node:
kubectl top node
Now switch to root user:
crictl ps -a
crictl pods
crictl ps -a | grep <problematic pod name>
crictl ps -q
crictl ps -q | xargs -n 1 crictl inspect | grep -E "id|pid"
ctr -n k8s.io containers list
ctr -n k8s.io containers list | grep <problematic container or image name>
ps -ef
ps -aux
ps -e -o pid,ppid,user,args
dmesg
dmesg -T
dmesg -T | grep -i oom
dmesg -T | grep -i kill
journalctl
journalctl -u kubelet
journalctl -u containerd
journalctl --since "48 hours ago"
journalctl --no-pager
cat /var/log/messages
free -h
cat /proc/meminfo
ps aux
df -hT
df -i
tar -czvf /tmp/var-log-backup-<node-name>-$(date +%Y%m%d).tar.gz /var/log
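Several of the node-level commands above open a pager or print a large amount of output when run interactively. The following is a minimal sketch of capturing a subset of them to files as root, assuming the hypothetical function names filter_oom_lines, save_output and collect_node_outputs and an output layout chosen for illustration. It combines the two dmesg filters into one pattern and passes --no-pager together with --since to journalctl so the captured journal is bounded.

```shell
# Sketch only: function names and file names are hypothetical.

# Keep only lines that mention "oom" or "kill", case-insensitive (reads stdin).
# This merges the two separate dmesg grep filters into one pass.
filter_oom_lines() {
  grep -iE 'oom|kill'
}

# Run a command and save stdout and stderr to a file.
# A failing command is noted in the file instead of stopping the collection.
save_output() {
  local out_file="$1"; shift
  "$@" > "$out_file" 2>&1 || echo "exit code $? from: $*" >> "$out_file"
}

# Usage (as root on the node): collect_node_outputs <output dir>
collect_node_outputs() {
  local out_dir="$1"
  mkdir -p "$out_dir" || return 1

  save_output "$out_dir/crictl-ps.txt"      crictl ps -a
  save_output "$out_dir/crictl-pods.txt"    crictl pods
  save_output "$out_dir/ctr-containers.txt" ctr -n k8s.io containers list
  save_output "$out_dir/ps.txt"             ps -e -o pid,ppid,user,args
  save_output "$out_dir/dmesg.txt"          dmesg -T
  # grep exits non-zero when nothing matches; an empty file is a valid result.
  filter_oom_lines < "$out_dir/dmesg.txt" > "$out_dir/dmesg-oom.txt" || true
  save_output "$out_dir/journal-kubelet.txt" \
    journalctl -u kubelet --no-pager --since "48 hours ago"
  save_output "$out_dir/journal-containerd.txt" \
    journalctl -u containerd --no-pager --since "48 hours ago"
  save_output "$out_dir/free.txt"           free -h
  save_output "$out_dir/meminfo.txt"        cat /proc/meminfo
  save_output "$out_dir/df.txt"             df -hT
  save_output "$out_dir/df-inodes.txt"      df -i
}
```

For example, collect_node_outputs /tmp/oom-node-outputs writes one file per command. The resulting directory can be archived with tar in the same way as /var/log and attached to the support case.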