When troubleshooting NATS traffic between the BOSH Director and BOSH Agents, we can capture the traffic, or even send messages, with the NATS CLI.
To monitor BOSH NATS traffic with the NATS CLI, follow these steps:
1. Download the NATS CLI. Pick the release and platform that match your environment from the GitHub releases page. In this example, download v0.0.26 for the Linux amd64 platform.

    $ wget https://github.com/nats-io/natscli/releases/download/0.0.26/nats-0.0.26-linux-amd64.zip
The authorization section of the NATS configuration controls which subjects each client certificate may publish and subscribe to:

    authorization {
      DIRECTOR_PERMISSIONS: {
        publish: [
          "agent.*",
          "hm.director.alert"
        ]
        subscribe: ["director.>"]
      }

      AGENT_PERMISSIONS: {
        publish: [
          "hm.agent.heartbeat._CLIENT_ID",
          "hm.agent.alert._CLIENT_ID",
          "hm.agent.shutdown._CLIENT_ID",
          "director.*._CLIENT_ID.*"
        ]
        subscribe: ["agent._CLIENT_ID"]
      }

      HM_PERMISSIONS: {
        publish: []
        subscribe: [
          "hm.agent.heartbeat.*",
          "hm.agent.alert.*",
          "hm.agent.shutdown.*",
          "hm.director.alert"
        ]
      }

      certificate_clients: [
        {client_name: director.bosh-internal, permissions: $DIRECTOR_PERMISSIONS},
        {client_name: agent.bosh-internal, permissions: $AGENT_PERMISSIONS},
        {client_name: hm.bosh-internal, permissions: $HM_PERMISSIONS},
      ]

      timeout: 30
    }
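The permissions above rely on NATS subject wildcards: `*` matches exactly one dot-separated token, and `>` matches one or more trailing tokens. The sketch below illustrates those two rules so you can check which subjects a given subscription would see. `subject_matches` is a hypothetical helper written for this article, not part of the NATS CLI, and it does not validate malformed patterns.

```shell
#!/bin/sh
# subject_matches PATTERN SUBJECT
# Returns 0 when SUBJECT would be delivered to a subscription on PATTERN.
subject_matches() {
  local pattern="$1" subject="$2" p s
  while [ -n "$pattern" ]; do
    p=${pattern%%.*}   # first remaining pattern token
    s=${subject%%.*}   # first remaining subject token
    case $p in
      '>') [ -n "$subject" ]; return ;;    # '>' needs at least one more token
      '*') [ -n "$s" ] || return 1 ;;      # '*' consumes exactly one token
      *)   [ "$p" = "$s" ] || return 1 ;;  # literal tokens must be equal
    esac
    # Drop the consumed token (and its dot) from both strings.
    case $pattern in *.*) pattern=${pattern#*.} ;; *) pattern= ;; esac
    case $subject in *.*) subject=${subject#*.} ;; *) subject= ;; esac
  done
  # Every pattern token is consumed; the subject must be exhausted too.
  [ -z "$subject" ]
}
```

For example, `subject_matches 'director.>' 'director.a.b.c'` succeeds, while `subject_matches 'agent.*' 'agent.a.b'` fails because `*` covers only a single token.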
2. Capture messages that the BOSH Director can receive with the subscription 'director.>'. For example, when executing `bosh -d <DEPLOYMENT> is`, the BOSH Director receives details from each agent.
Prepare the files for --tlscert, --tlskey, and --tlsca from the paths shown after 'FILE from' in the command.
    $ nats --server <director IP>:4222 \
        --tlscert=<FILE from /var/vcap/jobs/director/config/nats_client_certificate.pem> \
        --tlskey=<FILE from /var/vcap/jobs/director/config/nats_client_private_key> \
        --tlsca=<FILE from /var/vcap/jobs/director/config/nats_client_ca_certificate.pem> \
        sub 'director.>'

    [#1] Received on "director.b084703a-3685-4096-9aee-5078d2da2dcd.d98ef019-9534-48e2-895b-cc35816bdac1.3a5036df-2509-480c-9fcf-4fba634a00e5"
    {"value":{"properties":{"logging":{"max_log_file_size":""}},"job":{"name":"ha_proxy","release":"","template":"bpm","version":"891ed932b8b52a7306b176655967a64b92d30635","templates":[{"name":"bpm","version":"891ed932b8b52a7306b176655967a64b92d30635"},{"name":"haproxy" ...
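Agents are only allowed to publish on `director.*._CLIENT_ID.*`, so the third dot-separated token of a subject received on 'director.>' identifies the agent that sent the message. A minimal sketch for pulling that token out of a subject follows; `agent_id_from_subject` is a hypothetical helper name, not a NATS or BOSH command.

```shell
#!/bin/sh
# agent_id_from_subject SUBJECT
# Prints the third token of a "director.<x>.<agent id>.<y>" subject.
# Returns 1 when the subject does not have that rough shape.
agent_id_from_subject() {
  local subject="$1"
  case $subject in
    director.*.*.*) printf '%s\n' "$subject" | cut -d. -f3 ;;
    *) return 1 ;;
  esac
}
```

Applied to the subject in the capture above, this prints `d98ef019-9534-48e2-895b-cc35816bdac1`.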
3. Capture messages that the BOSH Health Monitor can receive. According to the authorization section of the NATS configuration, the Health Monitor client certificate can subscribe to the following subjects:
    # "hm.agent.heartbeat.*",
    # "hm.agent.alert.*",
    # "hm.agent.shutdown.*",
    # "hm.director.alert"
This command subscribes to all heartbeat messages from BOSH Agents.
    $ nats --server <director IP>:4222 \
        --tlscert=<FILE from /var/vcap/jobs/health_monitor/config/nats_client_certificate.pem> \
        --tlskey=<FILE from /var/vcap/jobs/health_monitor/config/nats_client_private_key> \
        --tlsca=<FILE from /var/vcap/jobs/health_monitor/config/nats_server_ca.pem> \
        sub 'hm.agent.heartbeat.*'

    12:56:36 Subscribing on hm.agent.heartbeat.*
    [#1] Received on "hm.agent.heartbeat.367422e1-7204-4145-ba70-7c8480b54f50"
    {"deployment":"cf-cd7cc3cd4db8a9288f57","job":"nfs_server","index":0,"job_state":"running","vitals":{"cpu":{"sys":"0.4","user":"0.9","wait":"0.0"},"disk":{"ephemeral":{"inode_percent":"0","percent":"11"},"persistent":{"inode_percent":"10","percent":"75"},"system":{"inode_percent":"33","percent":"47"}},"load":["0.00","0.00","0.00"],"mem":{"kb":"513568","percent":"13"},"swap":{"kb":"153344","percent":"4"},"uptime":{"secs":11391077}},"node_id":"e47828dc-b910-4b77-a01c-9ee5f1ee963c"}
    ...
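Heartbeat payloads are verbose, so it can help to reduce each one to a single status line. The sketch below is a rough filter for hosts without `jq`: it assumes each payload arrives on one line starting with `{`, as in the capture above, and that the `job`, `index`, and `job_state` keys appear as shown there. `heartbeat_summary` is a hypothetical helper name; where `jq` is available, it is the more robust way to parse the JSON.

```shell
#!/bin/sh
# heartbeat_summary
# Reads `nats sub` output on stdin and prints "<job>/<index> <job_state>"
# for every JSON payload line.
heartbeat_summary() {
  local line job index state
  while IFS= read -r line; do
    case $line in
      '{'*) ;;        # payload line, handled below
      *) continue ;;  # skip "Subscribing on ..." and "[#N] Received on ..." lines
    esac
    job=$(printf '%s\n' "$line" | sed -n 's/.*"job":"\([^"]*\)".*/\1/p')
    index=$(printf '%s\n' "$line" | sed -n 's/.*"index":\([0-9]*\).*/\1/p')
    state=$(printf '%s\n' "$line" | sed -n 's/.*"job_state":"\([^"]*\)".*/\1/p')
    printf '%s/%s %s\n' "$job" "$index" "$state"
  done
}
```

To use it, pipe the output of the `nats ... sub 'hm.agent.heartbeat.*'` command into `heartbeat_summary`.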
4. Capture messages that BOSH Agent receives with the subscription 'agent._CLIENT_ID'.
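For example, to monitor the messages that the bosh-agent on a diego_cell VM can receive, first SSH to the VM and extract the agent ID and the NATS client credentials from /var/vcap/bosh/settings.json. Be aware that the `jq` utility is not available on all VMs.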
    $ bosh -d cf-cd7cc3cd4db8a9288f57 ssh diego_cell/0240acb4-95f2-4291-9003-2616bb88b4ac
    Using environment '<director IP>' as user 'director'
    Using deployment 'cf-cd7cc3cd4db8a9288f57'
    ...
    $ sudo -i
    # cat /var/vcap/bosh/settings.json | jq -r .agent_id
    fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8
    # cat /var/vcap/bosh/settings.json | jq .env.bosh.mbus.cert.ca | xargs printf > /tmp/nats_ca.pem
    # cat /var/vcap/bosh/settings.json | jq .env.bosh.mbus.cert.certificate | xargs printf > /tmp/nats_client.pem
    # cat /var/vcap/bosh/settings.json | jq .env.bosh.mbus.cert.private_key | xargs printf > /tmp/nats_client.key
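The `xargs printf` step converts the escaped `\n` sequences in the JSON strings into real line breaks. Before copying the files anywhere, a quick sanity check that each one came out as PEM can save a confusing TLS error later. This is a minimal sketch that only looks at the first and last lines; `looks_like_pem` is a hypothetical helper name, and it does not verify the certificate or key contents.

```shell
#!/bin/sh
# looks_like_pem FILE
# Returns 0 when FILE is non-empty, starts with a "-----BEGIN " line,
# and ends with a "-----END " line.
looks_like_pem() {
  local file="$1"
  [ -s "$file" ] &&
    head -n 1 "$file" | grep -q -e '^-----BEGIN ' &&
    tail -n 1 "$file" | grep -q -e '^-----END '
}
```

For example, `for f in /tmp/nats_ca.pem /tmp/nats_client.pem /tmp/nats_client.key; do looks_like_pem "$f" || echo "check $f"; done` flags any file that still contains a quoted, single-line string.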
Copy the client certificate, key, and CA files into your jumpbox, and subscribe to 'agent.<AGENT_ID>'.
    $ nats --server <director IP>:4222 \
        --tlscert=<FILE from diego_cell /tmp/nats_client.pem> \
        --tlskey=<FILE from diego_cell /tmp/nats_client.key> \
        --tlsca=<FILE from diego_cell /tmp/nats_ca.pem> \
        sub 'agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8'

    22:47:15 Subscribing on agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8

    (# triggered by bosh vms)
    [#1] Received on "agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8"
    {"protocol":3,"method":"get_state","arguments":["full"],"reply_to":"director.603ed0d3-9052-4d67-b17a-e6e83b1c5c82.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8.03f9b923-9b6b-4e7b-846e-955ed130c43f"}
    ...

    (# triggered by bosh ssh)
    [#5] Received on "agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8"
    {"protocol":3,"method":"ssh","arguments":["setup",{"public_key":"ssh-rsa *****\n","user":"bosh_dbcc937caee0496"}],"reply_to":"director.468b2537-52b3-4410-8f09-f63255f3dede.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8.34ed2e53-88b6-43e3-9445-1712c5e0d245"}
    [#6] Received on "agent.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8"
    {"protocol":3,"method":"ssh","arguments":["cleanup",{"user_regex":"^bosh_dbcc937caee0496"}],"reply_to":"director.d061c07b-f092-494a-b0b6-8e7b5007264c.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8.09c93322-f072-49b5-a237-e5a190a8fbb7"}
    ...

    (# triggered by bosh logs)
    {"protocol":3,"method":"fetch_logs_with_signed_url","arguments":[{"signed_url":"https://<IP address>:25250/signed/48/e228159d-c185-43e4-b67c-363d5f50b075?e=86400&st=Q39rHu4YXJRBogtE3o9LV1fh3d9HAWm26cC_YBUJPEk&ts=1636498618","log_type":"job","filters":[]}],"reply_to":"director.d061c07b-f092-494a-b0b6-8e7b5007264c.fe0dd165-a3f2-4cf0-a925-ffbc9e27dfa8.eab6f8e6-8cd4-4e2a-9600-825ef6a17351"}
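When a capture runs for a while, listing only the `method` of each request gives a quick timeline of what the Director asked the agent to do. The sketch below assumes each payload is printed on a single line starting with `{`, as in the capture above; `agent_methods` is a hypothetical helper name, not part of the NATS CLI.

```shell
#!/bin/sh
# agent_methods
# Reads `nats sub` output on stdin and prints the "method" value of
# every JSON payload line, one per line, in arrival order.
agent_methods() {
  grep -e '^{' | sed -n 's/.*"method":"\([^"]*\)".*/\1/p'
}
```

Piping the subscription output through `agent_methods` would print `get_state`, `ssh`, `ssh`, and `fetch_logs_with_signed_url` for the messages shown above.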