How to monitor BOSH NATS traffic with the NATS CLI

Article ID: 293856

Updated On:

Products

Operations Manager

Issue/Introduction

This article covers how to monitor BOSH NATS traffic with the NATS CLI. 

The NATS component on the BOSH Director serves as the message bus for communication between the BOSH Director and the BOSH Agents.

With NATS:
  • The BOSH Director can send instructions to BOSH Agents and receive their responses.
  • BOSH Agents can report status back to the Health Monitor on the BOSH Director.

When troubleshooting NATS traffic between the BOSH Director and BOSH Agents, you can capture the traffic, or even send messages, with the NATS CLI.


Environment

Product Version: 2.10

Resolution

To monitor BOSH NATS traffic with the NATS CLI, follow these steps: 

1. Download the NATS CLI. Check the GitHub releases page for the build that matches your platform. This example downloads v0.0.26 for the Linux amd64 platform.

wget https://github.com/nats-io/natscli/releases/download/0.0.26/nats-0.0.26-linux-amd64.zip

Unzip the file and copy the `nats` binary to any jumpbox that can reach the BOSH Director VM on port 4222. Make the binary executable with `chmod +x nats` and add its directory to $PATH.
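The download and install steps above can be sketched as a pair of shell functions. This is a minimal sketch, not a definitive installer: the function names and the default install directory ($HOME/bin) are assumptions made for illustration, and the URL pattern is simply the one from the example above, so check the releases page before using it with another version.

```shell
# Build the release URL, following the pattern of the example URL above.
# Arguments: version, OS, architecture.
nats_cli_url() {
  printf 'https://github.com/nats-io/natscli/releases/download/%s/nats-%s-%s-%s.zip\n' \
    "$1" "$1" "$2" "$3"
}

# Download, extract, and install the nats binary for Linux amd64.
# Arguments: version, install directory (hypothetical default: $HOME/bin).
install_nats_cli() {
  nats_version=$1
  install_dir=${2:-$HOME/bin}
  workdir=$(mktemp -d) || return 1
  wget -q -O "$workdir/nats.zip" "$(nats_cli_url "$nats_version" linux amd64)" || return 1
  unzip -q "$workdir/nats.zip" -d "$workdir" || return 1
  # The location of the binary inside the archive is not assumed; search for it.
  nats_bin=$(find "$workdir" -type f -name nats | head -n 1)
  [ -n "$nats_bin" ] || return 1
  mkdir -p "$install_dir" || return 1
  # Copy the binary into place and mark it executable in one step.
  install -m 0755 "$nats_bin" "$install_dir/nats" || return 1
  rm -rf "$workdir"
}
```

For example, `install_nats_cli 0.0.26 "$HOME/bin"` followed by `export PATH="$HOME/bin:$PATH"` would make `nats` available in the current shell.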

2. NATS on the BOSH Director uses client certificates for authorization and permissions. Check the details in /var/vcap/jobs/nats/config/nats.cfg on the BOSH Director VM.

The following output is the authorization section of nats.cfg. Each client certificate is permitted to publish and subscribe only to specific subjects.
authorization {
  DIRECTOR_PERMISSIONS: {
    publish: [
      "agent.*",
      "hm.director.alert"
    ]
    subscribe: ["director.>"]
  }

  AGENT_PERMISSIONS: {
    publish: [
      "hm.agent.heartbeat._CLIENT_ID",
      "hm.agent.alert._CLIENT_ID",
      "hm.agent.shutdown._CLIENT_ID",
      "director.*._CLIENT_ID.*"
    ]
    subscribe: ["agent._CLIENT_ID"]
  }

  HM_PERMISSIONS: {
    publish: []
    subscribe: [
      "hm.agent.heartbeat.*",
      "hm.agent.alert.*",
      "hm.agent.shutdown.*",
      "hm.director.alert"
    ]
  }

  certificate_clients: [
    {client_name: director.bosh-internal, permissions: $DIRECTOR_PERMISSIONS},
    {client_name: agent.bosh-internal, permissions: $AGENT_PERMISSIONS},
    {client_name: hm.bosh-internal, permissions: $HM_PERMISSIONS},
  ]

  timeout: 30
}
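The permission lists above rely on NATS subject wildcards: `*` matches exactly one dot-separated token, and `>` matches one or more trailing tokens. The sketch below re-implements that matching rule in plain shell so you can check whether a subject is covered by a given permission entry. The function name `subject_matches` is hypothetical; this is an illustration of the rule, not code from NATS or BOSH.

```shell
# Return 0 if a subject matches a NATS subscription pattern, 1 otherwise.
# '*' matches exactly one token; '>' matches one or more trailing tokens.
subject_matches() {
  p=$1
  s=$2
  while [ -n "$p" ]; do
    # Split the first token off the pattern.
    ptok=${p%%.*}
    case $p in *.*) p=${p#*.} ;; *) p= ;; esac
    if [ "$ptok" = ">" ]; then
      # '>' must be the last pattern token and needs at least one subject token.
      [ -z "$p" ] && [ -n "$s" ]
      return
    fi
    # The pattern still has tokens, so the subject must too.
    [ -n "$s" ] || return 1
    # Split the first token off the subject.
    stok=${s%%.*}
    case $s in *.*) s=${s#*.} ;; *) s= ;; esac
    if [ "$ptok" != "*" ] && [ "$ptok" != "$stok" ]; then
      return 1
    fi
  done
  # Both sides must be exhausted at the same time.
  [ -z "$s" ]
}
```

For instance, `subject_matches 'director.>' 'director.a.b.c'` succeeds, while `subject_matches 'agent.*' 'agent.x.y'` fails because `*` covers only a single token.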


Capture the messages that the BOSH Director can receive by subscribing to 'director.>'. For example, when you run `bosh -d <DEPLOYMENT> is`, the BOSH Director receives details from each agent.

 

Prepare the files for --tlscert, --tlskey, and --tlsca by copying them from the BOSH Director VM. The path of each file is shown after 'FILE from' in the command below.

$ nats --server <director IP>:4222 \
  --tlscert=<FILE from /var/vcap/jobs/director/config/nats_client_certificate.pem> \
  --tlskey=<FILE from /var/vcap/jobs/director/config/nats_client_private_key> \
  --tlsca=<FILE from /var/vcap/jobs/director/config/nats_client_ca_certificate.pem> \
  sub 'director.>'

[#1] Received on "director.b084703a-3685-4096-9aee-5078d2da2dcd.d98ef019-9534-48e2-895b-cc35816bdac1.3a5036df-2509-480c-9fcf-4fba634a00e5"
{"value":{"properties":{"logging":{"max_log_file_size":""}},"job":{"name":"ha_proxy","release":"","template":"bpm","version":"891ed932b8b52a7306b176655967a64b92d30635","templates":[{"name":"bpm","version":"891ed932b8b52a7306b176655967a64b92d30635"},{"name":"haproxy"
...
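Agents are permitted to publish on "director.*._CLIENT_ID.*", so the third token of a received subject should identify the agent that sent the message. The following sketch pulls the subject out of each "Received on" line and extracts that token. The function names are hypothetical, and the line format is assumed to match the output shown above.

```shell
# Print the subject from each '[#N] Received on "<subject>"' line read on stdin.
received_subjects() {
  sed -n 's/^\[#[0-9]*] Received on "\(.*\)"$/\1/p'
}

# Print the third dot-separated token of a director.* subject,
# which under the agent permission pattern is the agent's client ID.
agent_id_from_subject() {
  printf '%s\n' "$1" | cut -d. -f3
}
```

Piping saved `nats sub` output through `received_subjects` gives one subject per line, which can then be passed to `agent_id_from_subject` to see which agents replied.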


3. Capture the messages that the BOSH Health Monitor can receive. According to the authorization section of the NATS configuration, the Health Monitor client certificate can subscribe to the following subjects:

# "hm.agent.heartbeat.*",
# "hm.agent.alert.*",
# "hm.agent.shutdown.*",
# "hm.director.alert"


This command subscribes to all heartbeat messages from BOSH Agents.

$ nats --server <director IP>:4222 \
  --tlscert=<FILE from /var/vcap/jobs/health_monitor/config/nats_client_certificate.pem> \
  --tlskey=<FILE from /var/vcap/jobs/health_monitor/config/nats_client_private_key> \
  --tlsca=<FILE from /var/vcap/jobs/health_monitor/config/nats_server_ca.pem> \
  sub 'hm.agent.heartbeat.*'
12:56:36 Subscribing on hm.agent.heartbeat.*

[#1] Received on "hm.agent.heartbeat.367422e1-7204-4145-ba70-7c8480b54f50"
{"deployment":"cf-cd7cc3cd4db8a9288f57","job":"nfs_server","index":0,"job_state":"running","vitals":{"cpu":{"sys":"0.4","user":"0.9","wait":"0.0"},"disk":{"ephemeral":{"inode_percent":"0","percent":"11"},"persistent":{"inode_percent":"10","percent":"75"},"system":{"inode_percent":"33","percent":"47"}},"load":["0.00","0.00","0.00"],"mem":{"kb":"513568","percent":"13"},"swap":{"kb":"153344","percent":"4"},"uptime":{"secs":11391077}},"node_id":"e47828dc-b910-4b77-a01c-9ee5f1ee963c"}
...
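Heartbeat payloads are long, so it can help to reduce each one to a few fields. The sketch below assumes `jq` is installed on the jumpbox and that message bodies arrive as single lines beginning with `{`, as in the output above. The function name `heartbeat_summary` and the choice of fields are illustrative only.

```shell
# Read `nats sub` output on stdin and print one tab-separated line per heartbeat:
# deployment, job, index, job_state, persistent disk usage percent.
heartbeat_summary() {
  # Keep only the JSON payload lines; drop the "Received on" and status lines.
  grep --line-buffered '^{' |
    jq --unbuffered -r '[
      .deployment,
      .job,
      (.index | tostring),
      .job_state,
      (.vitals.disk.persistent.percent // "n/a")
    ] | @tsv'
}
```

Appending `| heartbeat_summary` to the subscribe command above would print a compact, continuously updating view of agent health.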


4. Capture messages that BOSH Agent receives with the subscription 'agent._CLIENT_ID'.

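For example, to monitor the messages that bosh-agent on a diego_cell VM can receive, first SSH into that VM and extract the agent ID and the NATS client credentials from /var/vcap/bosh/settings.json. Be aware that the `jq` utility is not available on all VMs.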

$ bosh -d cf-cd7cc3cd4db8a9288f57 ssh diego_cell/0240acb4-95f2-4291-9003-2616bb88b4ac
Using environment '<director IP>' as user 'director'
Using deployment 'cf-cd7cc3cd4db8a9288f57'
...
$ sudo -i
# jq -r .agent_id /var/vcap/bosh/settings.json
fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8
# jq -r .env.bosh.mbus.cert.ca /var/vcap/bosh/settings.json > /tmp/nats_ca.pem
# jq -r .env.bosh.mbus.cert.certificate /var/vcap/bosh/settings.json > /tmp/nats_client.pem
# jq -r .env.bosh.mbus.cert.private_key /var/vcap/bosh/settings.json > /tmp/nats_client.key


Copy the client certificate, key, and CA files to your jumpbox, then subscribe to 'agent.<AGENT_ID>'.
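One way to copy the three files is `bosh scp`. The sketch below is a hedged example: the function name is hypothetical, it assumes the files were written to /tmp on the VM under the names used above, and it assumes they are readable by the user that `bosh scp` connects as.

```shell
# Copy the agent's NATS CA, client certificate, and key from a VM to the jumpbox.
# Arguments: deployment name, instance (group/id), destination directory (default: .).
fetch_agent_nats_files() {
  deployment=$1
  instance=$2
  dest=${3:-.}
  for f in nats_ca.pem nats_client.pem nats_client.key; do
    bosh -d "$deployment" scp "$instance:/tmp/$f" "$dest/" || return 1
  done
}
```

For example: `fetch_agent_nats_files cf-cd7cc3cd4db8a9288f57 diego_cell/0240acb4-95f2-4291-9003-2616bb88b4ac .`. Remember to delete the copies in /tmp on the VM afterwards, since they include a private key.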

$ nats --server <director IP>:4222 \
  --tlscert=<FILE from diego_cell /tmp/nats_client.pem> \
  --tlskey=<FILE from diego_cell /tmp/nats_client.key> \
  --tlsca=<FILE from diego_cell /tmp/nats_ca.pem> \
  sub 'agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8'

22:47:15 Subscribing on agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8

(# triggered by bosh vms)
[#1] Received on "agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8"
{"protocol":3,"method":"get_state","arguments":["full"],"reply_to":"director.603ed0d3-9052-4d67-b17a-e6e83b1c5c82.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8.03f9b923-9b6b-4e7b-846e-955ed130c43f"}
...
(# triggered by bosh ssh)
[#5] Received on "agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8"
{"protocol":3,"method":"ssh","arguments":["setup",{"public_key":"ssh-rsa *****\n","user":"bosh_dbcc937caee0496"}],"reply_to":"director.468b2537-52b3-4410-8f09-f63255f3dede.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8.34ed2e53-88b6-43e3-9445-1712c5e0d245"}
[#6] Received on "agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8"
{"protocol":3,"method":"ssh","arguments":["cleanup",{"user_regex":"^bosh_dbcc937caee0496"}],"reply_to":"director.d061c07b-f092-494a-b0b6-8e7b5007264c.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8.09c93322-f072-49b5-a237-e5a190a8fbb7"}
...
(# triggered by bosh logs)
{"protocol":3,"method":"fetch_logs_with_signed_url","arguments":[{"signed_url":"https://<IP address>:25250/signed/48/e228159d-c185-43e4-b67c-363d5f50b075?e=86400&st=Q39rHu4YXJRBogtE3o9LV1fh3d9HAWm26cC_YBUJPEk&ts=1636498618","log_type":"job","filters":[]}],"reply_to":"director.d061c07b-f092-494a-b0b6-8e7b5007264c.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8.eab6f8e6-8cd4-4e2a-9600-825ef6a17351"}
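To see at a glance which agent methods the Director invoked, the captured requests can be reduced to their "method" and "reply_to" fields. This is a minimal sketch that assumes `jq` is available on the jumpbox and that each payload is a single line starting with `{`, as in the output above. The function name `agent_requests` is hypothetical.

```shell
# Read `nats sub` output on stdin and print "<method><TAB><reply_to>" per request.
agent_requests() {
  # Keep only the JSON payload lines, then pick the two fields of interest.
  grep --line-buffered '^{' |
    jq --unbuffered -r '[.method, .reply_to] | @tsv'
}
```

Running `bosh vms`, `bosh ssh`, or `bosh logs` while the subscription is piped into `agent_requests` should then list methods such as get_state, ssh, and fetch_logs_with_signed_url as they arrive.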