Title: Alarm for Edge agent liveliness down
Event ID: edge_health.edge_agent_down
Alarm Description
Steps to resolve
For release 4.2.0 and higher
Recommended Action:
get service local-controller stateSample output:
Edge1> get service local-controller state Thu Dec 21 2023 UTC 06:59:34.273 Uptime: 320587.483 seconds (since 2023-12-17T13:56:26.81) Full Sync State : Completed at {num: 2, time: 2023-12-18T08:06:39.60} IPC Channel State Datapath Config : Up since 2023-12-17T13:57:50.39 Datapath State : Up since 2023-12-17T13:57:50.35 Routing Service : Up since 2023-12-18T06:15:08.73 BFD Config : NoneBFD State : None
If the CLI succeeds, it might be a transient problem where the CPU load might be high.
If CLI fails (no output), continue to next check.
root ps auxww | grep edge-agentSample output:
root@Edge1:~# ps aux. | grep edge-agent nsxa 2797 0.0 0.4 133039232968 ? Ssl Dec17 3:24 /opt/vmware/nsx-edge/bin/edge-agent --no-chdir --unixctl=/var/run/vmware/edge/nsxa.ctl --pidfile=/var/run/vmware/edge/nsxa.pid -vconsole:err -vsyslog:info --syslog-method=udp:127.0.0.1root 2586883 0.0 0.0 6776 2216 pts/0 S+ 06:58 0:00 grep --color=auto edge-agent.
adminstart service local-controllerSample Output:Edge1> start service local-controllerEdge1>.
roottail -f /var/log/syslog | grep subcomp=‘nsxa’Sample Output:
root@Edge1:~# tail -f /var/log/syslog | grep nsxa2023-12-19T12:14:10.040Z Edge1 NSX 1 FABRIC [nsx@6876 comp=‘nsx-edge’ subcomp=‘nsxa’ s2comp=‘ha-cluster’ level=‘INFO’] HA port b1e57b81-ac62-4ad0-91db-ffafe1a09457 IP 169.254.0.2/24 type 2 2023-12-19T12:14:10.040Z Edge1 NSX 1 FABRIC [nsx@6876 comp=‘nsx-edge’ subcomp=‘nsxa’ s2comp=‘ha-cluster’ level=‘INFO’] HA port b1e57b81-ac62-4ad0-91db-ffafe1a09457 IP 169.254.0.3/24 type 2
restart service local-controllerSample output:
Edge1>Edge1> restart service local-controllerEdge1> .
Maintenance window required for remediation? No