How to confirm your nats servers are running NATS v2

Article ID: 298471

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

In TAS v2.11.26, nats-release upgraded its underlying software package from NATS 1.0 (package name gnatsd) to NATS 2.0 (package name nats-server). The release also contains NATS 1.0, which will start as a fallback in case the migration to NATS 2.0 fails.

Link: NATS 2.0 Release Notes

We will be removing NATS 1.0 in a future release. To make sure that your environment is ready for this change, use the following commands to confirm that your nats instances are running NATS v2.


Environment

Product Version: Other
OS: NATS

Resolution

  1. Determine which instance group is running nats by running bosh is -p | grep nats. Look for results that include nats-wrapper and nats-tls-wrapper. In a small footprint environment, the nats jobs are co-located on the database instance group; in other environments, they run in a standalone nats instance group.
$ bosh is -p | grep nats
database/4af27472-4215-40c0-adda-67b1600421c3   nats-tls-healthcheck                    running -               -          -
database/4af27472-4215-40c0-adda-67b1600421c3   nats-tls-wrapper                        running -               -          -
database/4af27472-4215-40c0-adda-67b1600421c3   nats-wrapper                            running -               -          -
 
  2. SSH onto a VM that is running nats-wrapper or nats-tls-wrapper and then sudo to root:
$ bosh ssh database/0
$ sudo su
 
  3. Search running processes for nats-server. You should see results, potentially for both nats (non-tls) and nats-tls jobs:
database/4af27472-4215-40c0-adda-67b1600421c3:/var/vcap/bosh_ssh/bosh_c734c56fc5034e3# ps aux | grep nats-server
vcap       10248  0.0  0.2 1242376 16340 ?       S<l  Feb16   0:22 /var/vcap/packages/nats-server/bin/nats-server -c /var/vcap/jobs/nats/config/nats.conf
vcap       10295  0.1  0.2 1242888 18212 ?       S<l  Feb16   0:44 /var/vcap/packages/nats-server/bin/nats-server -c /var/vcap/jobs/nats-tls/config/nats-tls.conf
root       60151  0.0  0.0   6956  2392 pts/0    S+   00:08   0:00 grep --color=auto nats-server
 
  4. Search running processes for gnatsd. You should not see any running processes other than the grep command itself. This confirms that your environment is running NATS 2.0.
database/4af27472-4215-40c0-adda-67b1600421c3:/var/vcap/bosh_ssh/bosh_c734c56fc5034e3# ps aux | grep gnatsd
root       60161  0.0  0.0   6956  2392 pts/0    S+   00:08   0:00 grep --color=auto gnatsd
 
  5. If gnatsd is running, check for errors in the nats migration logs (nats-wrapper or nats-tls-wrapper):
database/4af27472-4215-40c0-adda-67b1600421c3:/var/vcap/jobs/nats-tls# cd /var/vcap/sys/log/nats-tls/
database/4af27472-4215-40c0-adda-67b1600421c3:/var/vcap/sys/log/nats-tls# ls
bpm.log  healthcheck.stderr.log  healthcheck.stdout.log  nats-tls-wrapper.stderr.log  nats-tls-wrapper.stdout.log  post-start.stderr.log  post-start.stdout.log
database/4af27472-4215-40c0-adda-67b1600421c3:/var/vcap/sys/log/nats-tls# tail nats-tls-wrapper.std*

You should see output related to the nats migration job. Here is example log output from a migration that started NATS as v2:

database/4af27472-4215-40c0-adda-67b1600421c3:/var/vcap/sys/log/nats-tls# tail nats-tls-wrapper.std*
==> nats-tls-wrapper.stderr.log <==
==> nats-tls-wrapper.stdout.log <==
{"timestamp":"2023-02-16T16:24:35.724352363Z","level":"info","source":"nats-migrate-server","message":"nats-migrate-server.single-instance-nats-cluster.starting-as-v2","data":{}}
{"timestamp":"2023-02-16T16:24:35.725485775Z","level":"info","source":"nats-migrate-server","message":"nats-migrate-server.started-nats","data":{}}
{"timestamp":"2023-02-16T16:24:35.732565048Z","level":"info","source":"nats-migrate-server","message":"nats-migrate-server.started","data":{}}
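The wrapper logs shown above contain one JSON object per line, so entries that are not informational can be picked out with standard tools. The following is a minimal sketch, not an official tool: the function name nats_migration_problems is hypothetical, and it assumes that problems are logged at a level other than "info" and that the fields appear in the same order as in the sample (timestamp, level, message). Neither assumption is confirmed by this article.

```shell
# Hypothetical helper: read wrapper log lines from stdin and print
# "timestamp level message" for every entry whose level is not "info".
# Assumption: each entry is a single-line JSON object with the fields in the
# order shown in the sample logs (timestamp, level, ..., message).
# Lines that do not match this layout, such as the "==> file <==" headers
# printed by tail, are skipped.
nats_migration_problems() {
  sed -n -E '
    /"level":"info"/d
    s/.*"timestamp":"([^"]*)".*"level":"([^"]*)".*"message":"([^"]*)".*/\1 \2 \3/p
  '
}

# Usage on the nats VM (log path taken from step 5):
#   cat /var/vcap/sys/log/nats-tls/nats-tls-wrapper.stdout.log | nats_migration_problems
```

An empty result only means that no non-info entries matched this layout; it is still worth reading the raw log if gnatsd is running.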

Once you have identified the error in the logs and corrected its cause, it is safe to run bosh restart on the instance group that runs nats, which re-attempts the migration. Note: Restarting individual nats jobs is not sufficient to migrate the cluster and will keep the instances on v1.
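After the restart, steps 3 and 4 can be repeated to confirm the result. As a convenience, both checks can be combined into one; the sketch below is illustrative only. The function name nats_running_version is hypothetical, and it assumes the binaries are named nats-server and gnatsd (as in the steps above) and that the command path is the 11th column of ps aux output, as in the sample output.

```shell
# Hypothetical helper: classify `ps aux` output read from stdin.
# Prints "v1" if any gnatsd process is found (at least one job fell back),
# "v2" if only nats-server is found, and "none" if neither is running.
# Assumption: the executable path is the 11th column, as in the ps aux
# samples above. Matching on the executable name (not the whole line)
# keeps the grep process itself from being counted.
nats_running_version() {
  awk '
    { n = split($11, parts, "/"); exe = parts[n] }
    exe == "nats-server" { v2 = 1 }
    exe == "gnatsd"      { v1 = 1 }
    END {
      if (v1)      print "v1"
      else if (v2) print "v2"
      else         print "none"
    }
  '
}

# Usage on the nats VM:
#   ps aux | nats_running_version
```

A result of v1 or none should be followed up with the log check in step 5.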