Multiple NSX processes are killed due to memory overconsumption

Article ID: 314223

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

Multiple NSX processes are killed by the kernel out-of-memory (OOM) killer, which can cause load balancers to show an UNKNOWN status.
Messages similar to the following appear in the syslog:

2024-02-27T03:59:33.416Z nsx01.example.com kernel - - - [18553739.400089] Out of memory: Killed process 4077 (java) total-vm:9093096kB, anon-rss:1766052kB, file-rss:0kB, shmem-rss:0kB, UID:154 pgtables:4296kB oom_score_adj:400

2024-02-27T03:59:37.822Z nsx01.example.com kernel - - - [18553744.458338] Out of memory: Killed process 1125 (nsx-platform-cl) total-vm:25822740kB, anon-rss:13371484kB, file-rss:0kB, shmem-rss:0kB, UID:131 pgtables:26420kB oom_score_adj:0
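
To check quickly which processes were killed and how much resident memory each held, messages like the two above can be extracted from a syslog file with a short script. The following is a minimal sketch, not a supported tool: it assumes the kernel message format matches the samples shown here, and the function names parse_oom_line and summarize_oom_kills are hypothetical.

```python
import re

# Assumption: OOM-kill lines follow the format of the sample messages above.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_line(line):
    """Return a dict describing one OOM kill, or None if the line does not match."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "total_vm_kb": int(match.group("total_vm_kb")),
        "anon_rss_kb": int(match.group("anon_rss_kb")),
    }


def summarize_oom_kills(lines):
    """Collect all OOM kills found in lines, largest anon-rss first."""
    kills = [kill for kill in map(parse_oom_line, lines) if kill is not None]
    return sorted(kills, key=lambda kill: kill["anon_rss_kb"], reverse=True)
```

Passing the lines of a syslog file to summarize_oom_kills returns the killed processes ordered by anon-rss, which makes it easier to see whether nsx-platform-client (logged under the truncated name nsx-platform-cl) is the largest consumer.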
The top output shows the nsx-platform-client service using an excessive amount of memory:

Tue Feb 27 03:59:02 UTC 2024
top - 03:59:03 up 214 days, 17:59, 0 users, load average: 4.24, 3.11, 2.88
Tasks: 330 total, 3 running, 327 sleeping, 0 stopped, 0 zombie
%Cpu(s): 26.1 us, 12.2 sy, 0.0 ni, 59.6 id, 0.4 wa, 0.0 hi, 1.7 si, 0.0 st
KiB Mem : 49302364 total, 333976 free, 43536576 used, 5431812 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 5065864 avail Mem
<SNIP>
1125 nsxplat+ 20 0 4851220 3.6g 4732 S 35.3 7.7 21:35.13 1125 /opt/vmware/nsx-platform-client/bin/nsx-platform-client
<SNIP>
4077 uphc 20 0 9093096 1.7g 9020 S 5.9 3.6 22586:49 4077 /usr/lib/jvm/openjdk-java8-runtime-amd64/bin/java -Djava.util.logging.config.file=/opt/vmware/phone+

Environment

VMware NSX-T Data Center
VMware NSX

Cause

When a manager node requests a support bundle from the other manager nodes, a large bundle can trigger a memory spike in nsx-platform-client.

Resolution

This issue is resolved in VMware NSX 4.2.0.

Workaround:

Reboot the affected NSX Manager node. Until you upgrade, collect support bundles directly on each NSX Manager node instead of requesting them from another manager node.

Additional Information

Impact/Risks:

Symptoms can vary. In this case, the load balancer status becomes "UNKNOWN" and the affected NSX Manager node malfunctions.
