NATS VM high memory

Article ID: 298125

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

This Knowledge Base (KB) article describes a known memory leak in the nats version 2 (v2) service that runs on the "nats" instance group in Tanzu Application Service for VMs (TAS).

Starting with nats release v46, the underlying nats-server package was updated from v1 to v2. This change was introduced in the following TAS versions:

  • TAS v2.11.26+
  • TAS v2.13.14+
  • TAS v3.0+
  • TAS v4.0+
  • TAS v5.0+


For more information on how to confirm that your nats service is running v2, see this KB article.

The following patterns can be observed when this nats memory leak is present:

BOSH metrics
One node shows higher memory consumption than its peers and never releases the memory.

$ bosh -d cf-76cf11c200725fcf4b1e vms --vitals --column={Instance,"Memory Usage"} | grep nats
nats/2e0f7951-9990-418a-8520-bed481b3e10f                               22% (444 MB)
nats/565fb9e7-52de-49f4-958e-a8fa3ba6594e                               36% (728 MB)
nats/869453b5-fe28-4c65-b7c6-510925e3569b                               21% (416 MB)
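
On deployments where the outlier is less obvious, the same output can be passed through a small filter. The following is a minimal sketch rather than part of the documented procedure: highest_nats_memory is a hypothetical helper name, and it assumes each line carries the instance name in the first column and a memory percentage such as 36%, as in the sample output above.

```shell
# Hypothetical helper: reads `bosh vms --vitals` lines on stdin and prints
# the nats instance with the highest memory usage percentage.
highest_nats_memory() {
  awk 'BEGIN { max = -1 }
       /^nats\// {
         # Find the first "<digits>%" token on the line, e.g. "36%".
         if (match($0, /[0-9]+%/)) {
           pct = substr($0, RSTART, RLENGTH - 1) + 0
           if (pct > max) { max = pct; inst = $1 }
         }
       }
       END { if (inst != "") print inst, max "%" }'
}

# Usage (deployment name taken from the example above):
# bosh -d cf-76cf11c200725fcf4b1e vms --vitals --column={Instance,"Memory Usage"} | highest_nats_memory
```

A node that stays at the top of this list across repeated runs while its peers remain flat matches the pattern described here.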



OS metrics
OS-level commands such as htop show that the memory is consumed by the nats-wrapper jobs.
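
For a non-interactive check, the resident memory of the nats processes can be totalled from ps output. This is only a sketch under stated assumptions: sum_nats_rss_mib is a hypothetical helper name, the process command names are assumed to contain "nats", and a procps-style ps is assumed to be available on the VM.

```shell
# Hypothetical helper: reads "RSS_KiB COMMAND" lines on stdin and prints the
# total resident memory, in MiB, of commands whose name contains "nats".
sum_nats_rss_mib() {
  awk '$2 ~ /nats/ { kib += $1 } END { printf "%d MiB\n", kib / 1024 }'
}

# Usage on the nats VM (assumption: procps `ps` with the rss and comm columns):
# ps -eo rss=,comm= | sum_nats_rss_mib
```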



Monit summary
Monit reports the nats-wrapper jobs as failing ("Resource limit matched") when their memory exceeds the healthy threshold.

nats/565fb9e7-52de-49f4-958e-a8fa3ba6594e:~$ sudo monit summary
The Monit daemon 5.2.5 uptime: 6d 4h 21m 
 
Process 'nats-wrapper'              Resource limit matched
Process 'nats-tls-wrapper'          Resource limit matched
Process 'nats-tls-healthcheck'      running
Process 'loggregator_agent'         running
Process 'loggr-syslog-agent'        running
Process 'metrics-discovery-registrar' running
Process 'metrics-agent'             running
Process 'loggr-forwarder-agent'     running
Process 'prom_scraper'              running
Process 'bosh-dns'                  running
Process 'bosh-dns-resolvconf'       running
Process 'bosh-dns-healthcheck'      running
Process 'system-metrics-agent'      running



Environment

Product Version: 3.0

Resolution

The memory leak was identified as a clustering problem among the nats nodes. Various transient network issues, such as (but not limited to) a duplicate IP address in the environment that matches one of the nats nodes, can lead to a memory leak on that specific nats node. This can affect nats-wrapper, nats-tls-wrapper, or both.
 
The problem has been reported on the GitHub project. As of November 2023, the OSS development team notes that the root cause is an environmental factor. More than a year later (as of December 2024), no enhancements have been proposed, so a fix in the nats code should not be expected.

To work around this issue, fix the underlying network issues in the environment, then use monit to restart the nats-wrapper service on the impacted nats node.
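
The restart step can be derived from the monit summary shown earlier. The sketch below is illustrative only: print_restart_commands is a hypothetical helper name, and it deliberately prints the monit restart commands instead of running them so they can be reviewed first. It assumes the summary format shown above, with the process name in single quotes.

```shell
# Hypothetical helper: reads `monit summary` output on stdin and prints
# (does not execute) a restart command for each process that monit reports
# as "Resource limit matched".
print_restart_commands() {
  awk -F"'" '/^Process / && /Resource limit matched/ {
    print "sudo monit restart " $2
  }'
}

# Usage on the impacted nats VM, after the network issue has been fixed:
# sudo monit summary | print_restart_commands
# Review the printed commands, run them, then check `sudo monit summary` again.
```

Printing the commands first is a cautious choice for a production messaging tier. If memory climbs again after the restart, the underlying network issue is probably still present.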