VMware NSX
VMware NSX-T Data Center
There are three agents whose status is tracked in the NSX UI:
nsx-cfgagent ← Interacts with dataplane modules like VDL2, KCP, and DFW
When stopped, Host appears Down in UI
nsx-opsagent ← Includes nsx-da (inventory discovery agent which communicates with nestdb) and nsxa (deals with host switch related operations)
When stopped, Host appears Down in UI
nsx-nestdb ← Stores desired state from control plane and runtime state info from dataplane
When stopped, controller connectivity will be Down and 'get controllers' returns 'Failed to get controller list'.
Other agents, which are not tracked in the UI "Agent Status" pane, and what happens when they are stopped:
nsx-proxy ← Interacts with both Policy and CCP on the NSX manager, and with nsx-opsagent and nsx-nestdb on the host
When stopped, NSX Configuration is listed as "Host Disconnected". Manager connectivity is Down, and Agent Status is not reported.
nsx-sfhc ← This is the installation agent for NSX deployment that communicates with MP
When stopped, UI shows host with status "Install Failed" and View Details is not available.
2. Agent Status in NSX UI
In the 3.2.2+ NSX UI, Host Agent Status for the three tracked agents is viewed at: System > Fabric > Hosts > View Details on the Host > Monitor > Agent Status
If any one of these agents (nsx-opsagent, nsx-cfgagent, nsx-nestdb) is stopped, the Host's overall Status will show as Down
3. Commands
To manage an agent service from the ESXi command line: /etc/init.d/<service name> status/stop/start
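To check all five agents in one pass, the status command can be wrapped in a loop. This is a minimal sketch, assuming each init script is named after its agent as listed in this note. The function name `nsx_agent_status` and the `INIT_DIR` variable are hypothetical helpers, not part of NSX, and the text printed by each `status` call is passed through unchanged because its exact format is not covered here.

```shell
# Hypothetical helper: print the status of each NSX host agent.
# INIT_DIR defaults to /etc/init.d; overriding it is only useful for testing.
nsx_agent_status() {
    init_dir="${INIT_DIR:-/etc/init.d}"
    for svc in nsx-cfgagent nsx-opsagent nsx-nestdb nsx-proxy nsx-sfhc; do
        if [ -x "$init_dir/$svc" ]; then
            # A stopped service may return non-zero; keep going either way.
            status=$("$init_dir/$svc" status 2>&1) || true
            echo "$svc: $status"
        else
            echo "$svc: init script not found in $init_dir"
        fi
    done
}
```

After defining the function in the host shell, run `nsx_agent_status` to get one line per agent.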
'esxcli network ip connection list | grep 1234' and 'esxcli network ip connection list | grep 1235' show the connections to the Managers (port 1234) and the Controller (port 1235), with a World Name of 'nsx-proxy' ← If nsx-proxy is stopped, these connections will not be listed
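A plain grep for the port number also matches unrelated values such as port 12345 or a World ID containing the same digits. The sketch below narrows the match to lines owned by nsx-proxy whose address ends in the exact port. It assumes the esxcli output shows addresses as `IP:port` followed by whitespace and includes the World Name on the same line. The function name `nsx_proxy_port_check` is hypothetical.

```shell
# Hypothetical helper: read 'esxcli network ip connection list' output on
# stdin and report whether nsx-proxy holds a connection on the given port.
nsx_proxy_port_check() {
    port="$1"
    # Match ":<port> " so that 1234 does not also match 12345.
    if awk -v p=":$port" '/nsx-proxy/ && index($0, p " ") { found = 1 }
                          END { exit !found }'; then
        echo "port $port: nsx-proxy connection present"
    else
        echo "port $port: no nsx-proxy connection listed"
        return 1
    fi
}
```

Usage on the host: `esxcli network ip connection list | nsx_proxy_port_check 1234`, and the same with 1235.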
4. Logs
nsx-proxy logs are in nsx-syslog.log. They show logging for the nsx-proxy agent's connection to the Manager components.
ag -i "nsx-proxy" var/run/log/nsx-syslog.log
nsx-proxy heartbeats from MP appear like this:
2022-08-11T04:25:56Z nsx-proxy: NSX 2101571 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="mpa-proxy-lib" tid="2101571" level="INFO"] MessagingClientService: Heartbeat message received in FrameworkUnifiedMsg from endpoint: ssl://10.105.8.11:1234 client_id: a5887a11-c352-426c-951b-b2def1ea4806
2022-08-11T04:26:56Z nsx-proxy: NSX 2101571 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="mpa-proxy-lib" tid="2101571" level="INFO"] MessagingClientService: Heartbeat message received in FrameworkUnifiedMsg from endpoint: ssl://10.105.8.11:1234 client_id: a5887a11-c352-426c-951b-b2def1ea4806
To check for gaps in heartbeats from the management plane, run:
ag "Heartbeat message" nsx-syslog* | awk '{print $1}' | cut -d ":" -f3- | sort -V | cut -d":" -f1 | uniq -c
There are normally 60 heartbeats per hour (one per minute, as in the log lines above). A high-level look at heartbeat loss looks like this:
60 2022-08-20T02
60 2022-08-20T03
7 2022-08-20T04 ← only 7 heartbeats from MP this hour
32 2022-08-25T00 ← only 32 heartbeats from MP this hour
60 2022-08-25T01
60 2022-08-25T02
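When the per-hour listing is long, it helps to print only the hours that fall short. The sketch below reads the "count hour" lines produced by the heartbeat pipeline above and keeps those below an expected count, 60 by default. The function name `flag_low_heartbeat_hours` is hypothetical.

```shell
# Hypothetical helper: read "<count> <hour>" lines on stdin (the output of
# the uniq -c pipeline) and print only hours below the expected count.
flag_low_heartbeat_hours() {
    expected="${1:-60}"
    awk -v want="$expected" '
        $1 + 0 < want + 0 {
            printf "%s: %d of %d heartbeats\n", $2, $1, want
        }'
}
```

Usage: append `| flag_low_heartbeat_hours` to the pipeline. Two caveats: the first and last hour covered by a log file are often partial and may be flagged for that reason alone, and an hour with no heartbeats at all produces no line in the pipeline output, so it is not flagged here and has to be spotted as a jump in the timestamps.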
nsx-nestdb logs are in nsx-syslog.log:
ag -i "nestdb" var/run/log/nsx-syslog.log
nsx-cfgagent logs are in nsx-syslog.log:
ag -i "cfgagent" var/run/log/nsx-syslog.log
nsx-opsagent logs are in nsx-syslog.log:
ag -i "opsagent" var/run/log/nsx-syslog.log
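Because all four agents log to the same file, a quick count by log level per agent can show where to look first. The sketch below is a minimal example using standard grep, sed, sort, and uniq rather than ag. It assumes every entry carries a `level="..."` field in upper case, as in the samples in this note. The function name `nsx_log_levels` is hypothetical.

```shell
# Hypothetical helper: read nsx-syslog.log content on stdin and count the
# lines per log level for one agent keyword (matched case-insensitively).
nsx_log_levels() {
    grep -i "$1" \
        | sed -n 's/.*level="\([A-Z]*\)".*/\1/p' \
        | sort | uniq -c \
        | awk '{ print $1, $2 }'   # normalize the padding added by uniq -c
}
```

Usage: `nsx_log_levels opsagent < var/run/log/nsx-syslog.log`. As with the ag commands above, the keyword is matched anywhere in the line, so "nestdb" also counts entries from other agents that mention NestDB.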
If nsx-da (logged as s2comp="nsxda") cannot connect to nestdb, logging in nsx-syslog.log appears like this:
2022-12-01T18:39:50.058Z nsx-opsagent[2103501]: NSX 2103501 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxda" tid="2105253" level="WARNING"] Waiting for NestDB to connect.
2022-12-01T18:39:53.931Z nsx-opsagent[2103501]: NSX 2103501 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxda" tid="2105252" level="WARNING"] Waiting for NestDB to connect.