When upgrading Tanzu Platform for Cloud Foundry (TPCF) to a 10.x or later release, the Healthwatch tile fails to start.
Error: 'grafana/### (0)' is not running after update. Review logs for failed jobs: healthwatch_route_registrar
The logs on the grafana VM show:
main.main()
/tmp/build/c40940d1/tile-src/releases/grafana-release/src/route-registrar/main.go:121 +0xdb9
panic: nats: no servers available for connection
Environment: TPCF 10.x, Healthwatch versions below 2.3.2
This is caused by a change to NATS in TPCF 10.x: starting in this release, NATS listens only on the TLS port.
If we compare the listening processes on a NATS VM in TPCF 6.x:
nats/####:~# ps aux | grep nats-server
vcap 6159 0.0 2.4 1241352 23832 ? S<l Apr24 9:09 /var/vcap/packages/nats-server/bin/nats-server -c /var/vcap/jobs/nats/config/nats.conf
vcap 6203 0.2 2.9 1241864 28472 ? S<l Apr24 32:15 /var/vcap/packages/nats-server/bin/nats-server -c /var/vcap/jobs/nats-tls/config/nats-tls.conf
root 19329 0.0 0.2 6612 2388 pts/0 S+ 13:05 0:00 grep --color=auto nats-server
And then on a NATS VM in TPCF 10.x:
nats/###:~# ps aux | grep nats-server
vcap 6173 0.3 0.7 1242120 28352 ? S<l Apr28 19:49 /var/vcap/packages/nats-server/bin/nats-server -c /var/vcap/jobs/nats-tls/config/nats-tls.conf
root 13656 0.0 0.0 6612 2292 pts/0 S+ 13:05 0:00 grep --color=auto nats-server
4224 is the NATS TLS port. In 10.x, only the nats-tls process is running, so NATS listens only on that port.
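The comparison above can be scripted. The following is a minimal sketch, not an official tool: `classify_nats` is a hypothetical helper name, and it assumes the job config paths look like the ones in the `ps` output shown earlier.

```shell
# Hypothetical helper: reads `ps aux` output on stdin and prints one label
# per nats-server process, based on the job config path it was started with.
classify_nats() {
  while IFS= read -r line; do
    case "$line" in
      *"/jobs/nats-tls/config/"*) echo "tls" ;;      # TLS listener
      *"/jobs/nats/config/"*)     echo "non-tls" ;;  # plain-text listener
    esac
  done
}
```

Run on a NATS VM as `ps aux | classify_nats`. If only `tls` is printed, the non-TLS listener is not running on that VM.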
Healthwatch versions below 2.3.2 attempt to connect to NATS over the non-TLS port, which is why they cannot find a NATS server.
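To confirm which ports are reachable from the grafana VM, a small TCP check along these lines may help. This is a sketch under assumptions: `port_open` is a hypothetical helper, it needs `bash` and the coreutils `timeout` command on the VM, and it only tests that a TCP connection can be opened, not that a TLS handshake succeeds.

```shell
# Hypothetical helper: returns 0 if a TCP connection to host ($1) on
# port ($2) can be opened within 3 seconds, and non-zero otherwise.
port_open() {
  # Inside the inner bash, $0 is the host and $1 is the port.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, `port_open <nats-ip> 4224 && echo open` should print `open` against a TPCF 10.x NATS VM. The same check against the non-TLS port is expected to fail there. That port is assumed to be 4222, the NATS default, which this article does not state.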
More information on this can be found here:
https://knowledge.broadcom.com/external/article?articleNumber=298471
To resolve this, upgrade Healthwatch to version 2.3.2 or later, which connects to NATS over the TLS port.
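As a pre-upgrade check, the installed Healthwatch version can be compared against 2.3.2. The sketch below assumes GNU `sort` with `-V` (version sort) is available. `version_lt` is a hypothetical helper, and how you obtain the installed version string is left to your environment.

```shell
# Hypothetical helper: returns 0 if version $1 is strictly lower than $2.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}
```

For example, `version_lt "$installed" 2.3.2 && echo "Upgrade Healthwatch before upgrading TPCF to 10.x"`, where `installed` is a variable you set to the current tile version.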