Multiple NSX processes are killed due to memory overconsumption

Article ID: 314223

Products

VMware NSX

Issue/Introduction

  • Multiple NSX processes are killed by the kernel's out-of-memory (OOM) killer, which can cause load balancers to show an UNKNOWN status.
  • You see messages similar to the following in the syslog:

    2024-02-27T03:59:33.416Z nsx01.example.com kernel - - - [18553739.400089] Out of memory: Killed process 4077 (java) total-vm:9093096kB, anon-rss:1766052kB, file-rss:0kB, shmem-rss:0kB, UID:154 pgtables:4296kB oom_score_adj:400

    2024-02-27T03:59:37.822Z nsx01.example.com kernel - - - [18553744.458338] Out of memory: Killed process 1125 (nsx-platform-cl) total-vm:25822740kB, anon-rss:13371484kB, file-rss:0kB, shmem-rss:0kB, UID:131 pgtables:26420kB oom_score_adj:0

  • You see that the nsx-platform-client service is using an excessive amount of memory:

    Tue Feb 27 03:59:02 UTC 2024
    top - 03:59:03 up 214 days, 17:59, 0 users, load average: 4.24, 3.11, 2.88
    Tasks: 330 total, 3 running, 327 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 26.1 us, 12.2 sy, 0.0 ni, 59.6 id, 0.4 wa, 0.0 hi, 1.7 si, 0.0 st
    KiB Mem : 49302364 total, 333976 free, 43536576 used, 5431812 buff/cache
    KiB Swap: 0 total, 0 free, 0 used. 5065864 avail Mem
    <SNIP>
       1125 nsxplat+ 20 0 4851220 3.6g 4732 S 35.3 7.7 21:35.13 1125 /opt/vmware/nsx-platform-client/bin/nsx-platform-client
    <SNIP>
       4077 uphc 20 0 9093096 1.7g 9020 S 5.9 3.6 22586:49 4077 /usr/lib/jvm/openjdk-java8-runtime-amd64/bin/java -Djava.util.logging.config.file=/opt/vmware/phone+
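To check whether a node has hit this issue, you can scan its syslog for the kernel's OOM kill messages. The following Python script is a minimal sketch, assuming the message format shown in the syslog excerpts above; other kernel versions may format the line differently. The kernel truncates process names to 15 characters, which is why nsx-platform-client is logged as nsx-platform-cl. The script name and the /var/log/syslog path in the usage note below are assumptions, not part of this article.

```python
import re
import sys
from typing import Iterable, List, NamedTuple, Optional

# Matches the kernel's "Out of memory: Killed process" line. The layout is
# assumed from the syslog excerpts in this article.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


class OomKill(NamedTuple):
    pid: int
    name: str  # truncated by the kernel to 15 characters
    total_vm_kb: int
    anon_rss_kb: int

    @property
    def anon_rss_gib(self) -> float:
        # The kernel's "kB" values are KiB; 1 GiB = 1024 * 1024 KiB.
        return self.anon_rss_kb / (1024 * 1024)


def parse_oom_line(line: str) -> Optional[OomKill]:
    """Return the parsed OOM kill, or None if the line is not one."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return OomKill(int(m["pid"]), m["name"], int(m["total_vm"]), int(m["anon_rss"]))


def find_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Collect every OOM kill found in an iterable of log lines."""
    return [k for k in map(parse_oom_line, lines) if k is not None]


if __name__ == "__main__":
    # Only read a file when a path is given on the command line.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for k in find_oom_kills(f):
                print(f"pid={k.pid} name={k.name} anon-rss={k.anon_rss_gib:.2f} GiB")
```

For example, `python3 find_oom_kills.py /var/log/syslog`. Applied to the excerpts above, the nsx-platform-cl entry reports an anon-rss of about 12.75 GiB, far larger than the roughly 1.7 GiB of the java process killed a few seconds earlier.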

Environment

VMware NSX-T Data Center
VMware NSX

Cause

When one NSX Manager node requests support bundles from the other manager nodes, a large bundle can trigger a memory spike in the nsx-platform-client service.


Resolution

This issue is resolved in VMware NSX 4.2.0.
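If you track NSX versions in a script, a check like the following Python sketch can flag whether a release is at or above 4.2.0, the release this article names as containing the fix. It is a hypothetical helper, not a VMware tool. It assumes a dotted numeric version string (with any build number appended as extra dotted components) and compares only major.minor.patch. This article does not list which earlier releases are affected, so a False result only means the release predates the fix.

```python
import re
from typing import Tuple

# Release named in this article as containing the fix.
FIXED_IN: Tuple[int, int, int] = (4, 2, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading dotted numeric part, e.g. "4.1.2.0.0.12345" -> (4, 1, 2, 0, 0, 12345)."""
    m = re.match(r"\d+(?:\.\d+)*", version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(p) for p in m.group(0).split("."))


def includes_fix(version: str) -> bool:
    """Return True if major.minor.patch is at or above FIXED_IN."""
    parts = parse_version(version)
    # Pad short versions such as "4.2" and drop build components.
    major_minor_patch = (parts + (0, 0, 0))[:3]
    return major_minor_patch >= FIXED_IN
```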

Workaround:

Reboot the affected NSX Manager node. Then collect support bundles directly from each NSX Manager node, rather than having one manager node collect bundles from the others.


Additional Information

Impact/Risks:

Symptoms can vary. In this case, the load balancer status becomes "UNKNOWN" and the affected NSX Manager node malfunctions.