Troubleshooting NSX Host Agents

Article ID: 381955

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

  • NSX host agents run at the user world layer on the ESXi host. They facilitate the realization of logical switch ports and their associated configurations and properties by interacting with components on the NSX unified appliance and with other components on the ESXi host, such as kernel modules, vSphere libraries, and non-NSX agents.
  • The NSX agents include nsx-proxy, nsx-nestdb, nsx-cfgagent, nsx-opsagent, and nsx-sfhc.
  • If one or more agents are in the "stopped" state, host transport nodes may show as "Host Disconnected" in the NSX UI, or you may observe issues with vMotion or with connecting VM vNICs.

Environment

VMware NSX
VMware NSX-T Data Center

Resolution

1. Host Agent Overview

The NSX UI tracks the status of three agents:

nsx-cfgagent     ← Interacts with dataplane modules such as VDL2, KCP, and DFW

When stopped, the host appears as Down in the UI.

nsx-opsagent     ← Includes nsx-da (the inventory discovery agent, which communicates with nestdb) and nsxa (which handles host switch operations)

When stopped, the host appears as Down in the UI.

nsx-nestdb       ← Stores the desired state from the control plane and runtime state information from the dataplane

When stopped, controller connectivity is Down and 'get controllers' returns 'Failed to get controller list'.

Other agents are not tracked in the UI "Agent Status" pane. Their roles, and what happens when they are stopped:

nsx-proxy        ← Interacts with both Policy and CCP on the NSX Manager, and with nsx-opsagent and nsx-nestdb on the host

When stopped, NSX Configuration is listed as "Host Disconnected", Manager connectivity is Down, and Agent Status is not reported.

nsx-sfhc         ← The installation agent for NSX deployment; it communicates with the management plane (MP)

When stopped, the UI shows the host with status "Install Failed" and View Details is not available.
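
As a quick reference, the symptoms above can be collected into a small lookup. The sketch below is illustrative only: the function name agent_symptom is hypothetical, and the strings simply restate this article. It does not query the host or the NSX Manager.

```shell
# Print the UI symptom this article associates with a stopped agent.
# agent_symptom is a hypothetical helper name, not an NSX command.
agent_symptom() {
    case "$1" in
        nsx-cfgagent|nsx-opsagent)
            echo "Host appears Down in UI" ;;
        nsx-nestdb)
            echo "Controller connectivity Down; 'get controllers' fails" ;;
        nsx-proxy)
            echo "Host Disconnected; Manager connectivity Down; Agent Status not reported" ;;
        nsx-sfhc)
            echo "Install Failed; View Details not available" ;;
        *)
            echo "unknown agent: $1" >&2
            return 1 ;;
    esac
}
```

For example, agent_symptom nsx-sfhc prints the "Install Failed" symptom, which can help map what is seen in the UI back to the agent to check first.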

2. Agent Status

In the NSX UI (version 3.2.2 and later), the status of the three tracked agents is shown at: System > Fabric > Hosts > View Details on the Host > Monitor > Agent Status

If any one of these agents (opsagent, cfgagent, nestdb) is stopped, the host's overall status shows as Down.

The agent status can also be checked through the REST API, which returns the status of the Transport Node:

GET /api/v1/transport-nodes/{TRANSPORT_NODE_ID}/status
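
A minimal sketch of calling this API with curl, assuming a reachable NSX Manager and an account allowed to read transport node status. The function names, the Manager address, and the user name are placeholders introduced here for illustration; they are not part of NSX.

```shell
# Build the status URL for a transport node.
# $1 = NSX Manager FQDN or IP (placeholder), $2 = transport node ID
tn_status_url() {
    printf 'https://%s/api/v1/transport-nodes/%s/status\n' "$1" "$2"
}

# Fetch the status document. $3 = user name; curl prompts for the password.
# -k skips certificate verification (an assumption for lab use); drop it
# when the Manager certificate is trusted by the client.
get_tn_status() {
    curl -k -s -u "$3" "$(tn_status_url "$1" "$2")"
}
```

A call would look like get_tn_status nsx-mgr.example.com TRANSPORT_NODE_ID admin, where all three arguments are replaced with values from your environment.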

3. Commands

To manage an agent service from the ESXi command line, run its init script with status, stop, or start:

/etc/init.d/<service name> <status|stop|start>
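
To check all agents in one pass, the status command can be wrapped in a loop. This is a sketch, assuming each agent's init script is named after the agent as listed in this article. INIT_DIR, NSX_AGENTS, and check_nsx_agents are names introduced here for illustration. The loop only reports what each script prints and does not stop or start anything.

```shell
# Directory holding the service scripts; overridable, e.g. for testing.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# Agent names as listed in this article.
NSX_AGENTS="nsx-proxy nsx-nestdb nsx-cfgagent nsx-opsagent nsx-sfhc"

# Run "<service> status" for each agent and prefix the output with its name.
check_nsx_agents() {
    for agent in $NSX_AGENTS; do
        script="$INIT_DIR/$agent"
        if [ -x "$script" ]; then
            printf '%s: %s\n' "$agent" "$("$script" status 2>&1)"
        else
            printf '%s: no service script at %s\n' "$agent" "$script"
        fi
    done
}
```

Running check_nsx_agents on the host prints one entry per agent, which makes a stopped or missing service easy to spot.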

The following commands show the connections to the Managers (port 1234) and Controllers (port 1235), listed with a World Name of 'nsx-proxy'. If nsx-proxy is stopped, these connections are not listed.

esxcli network ip connection list | grep 1234
esxcli network ip connection list | grep 1235
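
A plain grep for 1234 can also match unrelated numbers, such as a world ID or an ephemeral port that contains those digits. The sketch below is one way to narrow the match, assuming addresses in the connection list are printed as address:port in whitespace-separated columns. The function names are hypothetical.

```shell
# Count lines on stdin that mention nsx-proxy and have an address
# field ending in ":<port>". $1 = port number.
count_proxy_conns() {
    awk -v port="$1" '
        /nsx-proxy/ {
            for (i = 1; i <= NF; i++) {
                n_parts = split($i, parts, ":")
                if (n_parts > 1 && parts[n_parts] == port) { count++; break }
            }
        }
        END { print count + 0 }'
}

# Read connection-list text on stdin and report both ports used above.
report_proxy_conns() {
    conns="$(cat)"
    for port in 1234 1235; do
        n=$(printf '%s\n' "$conns" | count_proxy_conns "$port")
        echo "port $port: $n nsx-proxy connection(s)"
    done
}
```

On the host this would be used as: esxcli network ip connection list | report_proxy_conns. A count of 0 on both ports is consistent with nsx-proxy being stopped.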

4. Logs

  • nsx-proxy logs are in nsx-syslog.log. These entries show the nsx-proxy agent's connection to the Managers:

grep -i "nsx-proxy" /var/run/log/nsx-syslog.log
  • nsx-proxy heartbeats from MP appear like this:
2022-08-11T04:25:56Z nsx-proxy: NSX 2101571 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="mpa-proxy-lib" tid="2101571" level="INFO"] MessagingClientService: Heartbeat message received in FrameworkUnifiedMsg from endpoint: ssl://#.#.#.#:1234 client_id: ####-####-####-####-####
2022-08-11T04:26:56Z nsx-proxy: NSX 2101571 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="mpa-proxy-lib" tid="2101571" level="INFO"] MessagingClientService: Heartbeat message received in FrameworkUnifiedMsg from endpoint: ssl://#.#.#.#:1234 client_id: ####-####-####-####-####
  • To check for gaps in heartbeats from the management plane, run:
grep -h "Heartbeat message" /var/run/log/nsx-syslog* | awk '{print $1}' | cut -d":" -f1 | sort | uniq -c
  • Normally there are 60 heartbeats per hour (one per minute). A high-level look at heartbeat loss looks like this:
   60 2022-08-20T02
   60 2022-08-20T03
    7 2022-08-20T04    ← only 7 heartbeats from MP this hour
   32 2022-08-25T00    ← only 32 heartbeats from MP this hour
   60 2022-08-25T01
   60 2022-08-25T02
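
On a long log it can help to print only the hours that fall short. The sketch below filters "count hour" pairs such as those shown above. The function name flag_heartbeat_gaps and the default threshold of 60 are assumptions made for illustration.

```shell
# Read "count hour" pairs on stdin (uniq -c style output) and print
# only the hours with fewer heartbeats than expected.
# $1 = expected heartbeats per hour (defaults to 60).
flag_heartbeat_gaps() {
    awk -v want="${1:-60}" '
        $1 + 0 < want + 0 {
            printf "%s: %d of %d heartbeats\n", $2, $1, want
        }'
}
```

Pipe the output of the heartbeat count command above into flag_heartbeat_gaps. Two caveats: an hour with no heartbeats at all produces no line in the count output, so it cannot be flagged this way, and the first and last hours covered by the log are usually partial.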
  • nsx-nestdb logs are in nsx-syslog.log:
grep -i "nestdb" /var/run/log/nsx-syslog.log
  • nsx-cfgagent logs are in nsx-syslog.log:
grep -i "cfgagent" /var/run/log/nsx-syslog.log
  • nsx-opsagent logs are in nsx-syslog.log:
grep -i "opsagent" /var/run/log/nsx-syslog.log
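
Because all of these agents log to the same file, a per-level count can show quickly whether one agent is producing warnings. This is a sketch, assuming each entry carries a level="..." field as in the samples in this article. The function name levels_for_agent is hypothetical.

```shell
# Count log lines per level for lines on stdin that mention the agent.
# $1 = agent string to match, case-insensitively (e.g. opsagent).
levels_for_agent() {
    awk -v agent="$1" '
        index(tolower($0), tolower(agent)) && match($0, /level="[A-Z]+"/) {
            # Strip the leading level=" (7 chars) and the closing quote.
            print substr($0, RSTART + 7, RLENGTH - 8)
        }' | sort | uniq -c
}
```

Example use: levels_for_agent opsagent < /var/run/log/nsx-syslog.log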
  • If nsx-da (logged as nsxda) cannot connect to nestdb, entries like these appear in nsx-syslog.log:
2022-12-01T18:39:50.058Z nsx-opsagent[2103501]: NSX 2103501 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxda" tid="2105253" level="WARNING"] Waiting for NestDB to connect.
2022-12-01T18:39:53.931Z nsx-opsagent[2103501]: NSX 2103501 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxda" tid="2105252" level="WARNING"] Waiting for NestDB to connect.
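
To see how long this condition lasted, the matching entries can be summarized by count and by first and last timestamp. The sketch below assumes the timestamp is the first whitespace-separated field, as in the entries above. The function name nestdb_wait_summary is hypothetical.

```shell
# Summarize "Waiting for NestDB to connect" entries read from stdin.
nestdb_wait_summary() {
    awk '
        /Waiting for NestDB to connect/ {
            if (n == 0) { first = $1 }
            last = $1
            n++
        }
        END {
            if (n > 0) {
                printf "%d messages, first %s, last %s\n", n, first, last
            } else {
                print "no NestDB wait messages found"
            }
        }'
}
```

Example use: nestdb_wait_summary < /var/run/log/nsx-syslog.log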