NSX Manager reports Node Agent is down

Article ID: 380524

Products

VMware NSX

Issue/Introduction

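  • The Node Agent alarm is triggered in NSX Manager.
  • On the ESXi host where the Node Agent is reported down, the hyperbus connection for the affected VIF shows a status other than HEALTHY (for example, COMMUNICATION_ERROR).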

nsxcli -c get hyperbus connection info
VIFID          Connection                       Status              HostSwitchID                  
[UUID]         [IP]:[PORT]                     HEALTHY              [ID] 
[UUID]         [IP]:[PORT]              COMMUNICATION_ERROR         [ID] 
[UUID]         [IP]:[PORT]                     HEALTHY              [ID]

  • The Node Agent pod on the affected worker node may still be shown in a Running state.

kubectl get pods -n nsx-system -o wide | grep <worker node>
nsx-node-agent-[ID]      3/3     Running   [RESTARTS]   [IP]    worker-[ID]
...

  • The events of the Node Agent pod show the following liveness probe error.

kubectl describe pod nsx-node-agent-[ID] -n nsx-system
...
Events:
Type     Reason     Age                        From     Message
Warning  Unhealthy  [AGE]                     kubelet  (combined from similar events): Liveness probe errored: rpc error: code = Unknown desc = command error: time=[TIMESTAMP] level=error msg="exec failed: unable to start container process: error starting setns process: fork/exec /proc/self/exe: no such file or directory"
, stdout: , stderr: , exit code -1
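
The checks above can be sketched as two small shell filters. This is only an illustration, not part of the product: the function names unhealthy_vifs and has_setns_probe_error are hypothetical, and the first filter assumes the four-column layout (VIFID, Connection, Status, HostSwitchID) shown in the hyperbus output above.

```shell
# Hypothetical helper: print "VIFID STATUS" for every hyperbus row whose
# Status column is not HEALTHY. Assumes the four-column layout shown in
# the sample output; the header row (first field "VIFID") is skipped.
unhealthy_vifs() {
    awk '$1 != "VIFID" && NF >= 4 && $3 != "HEALTHY" { print $1, $3 }'
}

# Hypothetical helper: succeed (exit 0) if the text on stdin contains the
# liveness probe error signature quoted in this article.
has_setns_probe_error() {
    grep -qF 'fork/exec /proc/self/exe: no such file or directory'
}

# Example usage (commented out; requires access to the ESXi host and cluster):
#   nsxcli -c get hyperbus connection info | unhealthy_vifs
#   kubectl describe pod nsx-node-agent-[ID] -n nsx-system | has_setns_probe_error \
#       && echo "pod shows the liveness probe error"
```

Both helpers only read text from standard input, so they can be tried against saved command output before being used on a live system.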

Environment

VMware NSX

VMware NSX Container Plugin

Resolution

There is currently no resolution for this issue.

Workaround:

Redeploy the affected Node Agent pod by deleting it with the following command.

kubectl delete pod nsx-node-agent-[ID] -n nsx-system
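
When only the worker node name is known, the pod lookup and the delete can be combined. The following is a minimal sketch, not an official procedure: the function name redeploy_node_agent is hypothetical, and it assumes the pod name begins with nsx-node-agent- in the nsx-system namespace, as in the output shown earlier.

```shell
# Hypothetical helper: delete the nsx-node-agent pod scheduled on the
# given worker node. Assumes the pod name starts with "nsx-node-agent-"
# and that it lives in the nsx-system namespace.
redeploy_node_agent() {
    node="$1"

    # List the pods scheduled on this node as "pod/<name>" and keep the
    # first one that looks like a Node Agent pod.
    pod=$(kubectl get pods -n nsx-system \
            --field-selector "spec.nodeName=${node}" \
            -o name | grep '/nsx-node-agent-' | head -n 1)

    if [ -z "$pod" ]; then
        echo "no nsx-node-agent pod found on ${node}" >&2
        return 1
    fi

    # Same action as the workaround command in this article.
    kubectl delete -n nsx-system "$pod"
}

# Example usage (commented out; requires cluster access):
#   redeploy_node_agent worker-[ID]
```

After the delete, the first command in this article (kubectl get pods -n nsx-system -o wide) can be rerun to confirm that a new Node Agent pod is running on the worker node.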