Title: Alarm for nsx-node-agent health status
Event ID: node_agents_health.node_agents_down
Added in release: 3.0.0
Alarm Description
The nsx-node-agent container and hyperbus are down.

For a Kubernetes cluster managed with kubectl:

1. Find the nsx-node-agent Pod name and namespace:
   kubectl get pods --all-namespaces
2. Use the following kubectl command to check the connection status:
   kubectl exec -it <nsx-node-agent-Pod-Name> -n <nsx-node-agent-Pod-NameSpace> -c nsx-node-agent bash
   nsxcli
   get node-agent-hyperbus status
3. If the problem is in the nsx-node-agent container, use the kubectl logs command to check the issue and fix the error:
   kubectl logs <nsx-node-agent-Pod-Name> -n <nsx-node-agent-Pod-NameSpace> -c nsx-node-agent
4. Restart the nsx-node-agent Pod to fix the issue.

For a pks-<UUID of cluster> cluster:

1. The cluster is named pks-<UUID of cluster>, and its BOSH deployment is service-instance_<UUID of cluster>.
2. List the VMs in the deployment and find the worker VM, shown as worker/<vm-id>:
   bosh vms -d service-instance_<UUID>
3. Log in to the worker VM:
   bosh ssh -d service-instance_<UUID> worker/<vm id>
4. Check the nsx-node-agent process status:
   sudo monit status or sudo monit summary
   or
   bosh instances -d service-instance_<UUID> -p
5. If nsx-node-agent is not running, go to the nsx-node-agent log folder and check the logs:
   cd /var/vcap/sys/logs/nsx-node-agent
6. Restart nsx-node-agent to fix the issue:
   sudo monit restart nsx-node-agent

For a cf-<deployment id> deployment:

1. Run bosh vms to find the deployment, named cf-<deployment id>.
2. Find the diego_cell/ VM on which nsx-node-agent is running as a process.
3. Log in to the diego_cell VM:
   bosh ssh -d cf-<deployment id> diego_cell/<instance id>
4. Check the nsx-node-agent process status:
   sudo monit status or sudo monit summary
   or
   bosh instances -d cf-<deployment id> -p
5. If nsx-node-agent is not running, go to the nsx-node-agent log folder and check the logs:
   cd /var/vcap/sys/logs/nsx-node-agent
6. Restart nsx-node-agent to fix the issue:
   sudo monit restart nsx-node-agent
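
The commands in this section take the same few placeholders each time, so it can help to generate them once the Pod name, namespace, or BOSH deployment is known. The following is a minimal sketch, not part of any NSX tooling: the function names print_k8s_checks and print_bosh_checks are hypothetical, and the functions only print the commands quoted in this section rather than running them.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Each function prints the
# diagnostic commands from this section with the placeholders filled in.

# $1 = nsx-node-agent Pod name, $2 = its namespace
# (both taken from `kubectl get pods --all-namespaces`).
print_k8s_checks() {
  # Opens a shell in the container; run `nsxcli` and then
  # `get node-agent-hyperbus status` inside it.
  printf 'kubectl exec -it %s -n %s -c nsx-node-agent bash\n' "$1" "$2"
  # Container logs for the same Pod.
  printf 'kubectl logs %s -n %s -c nsx-node-agent\n' "$1" "$2"
}

# $1 = BOSH deployment (service-instance_<UUID> or cf-<deployment id>),
# $2 = instance (worker/<vm-id> or diego_cell/<instance id>).
print_bosh_checks() {
  # Process listing for the whole deployment.
  printf 'bosh instances -d %s -p\n' "$1"
  # Log in to the VM that runs nsx-node-agent.
  printf 'bosh ssh -d %s %s\n' "$1" "$2"
  # Run on the VM after logging in.
  printf 'sudo monit summary\n'
}
```

For example, print_k8s_checks <nsx-node-agent-Pod-Name> <nsx-node-agent-Pod-NameSpace> prints the exec and logs commands for that Pod, ready to review and paste. Printing instead of executing keeps the helper safe to run on a workstation that has no cluster access.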