ESXi host intermittently goes into "UNKNOWN" status in the UI and VM ports are blocked after vMotion to the "UNKNOWN" host

Article ID: 317784

Products

VMware NSX

Issue/Introduction

1. The ESXi host reports "UNKNOWN" status in the NSX UI
   (the NSX controller status is shown as not available).
2. VMs lose networking after migrating to this host.
3. VM ports may be in a blocked state.

When opsagent loses its connection to nsx-proxy, messages similar to the following are logged:

2022-03-16T10:02:19Z nsx-proxy: NSX xxxxxxx - [nsx@6xxx comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="3xxxxxxx" level="ERROR" errorCode="RPC31"] RpcConnection[30 Connected on tcp://127.0.0.1:4096 0] Keepalive failed - haven't received response in time (last request was sent 60 seconds ago, response received - never)
2022-03-16T10:02:19Z nsx-proxy: NSX xxxxxxx - [nsx@6xxx comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="3xxxxxxx" level="INFO"] RpcConnection[30 Connected on tcp://127.0.0.1:4096 0] Closing (keepalive expired)
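The search for these messages can be scripted. The POSIX sh function below is a minimal sketch that prints the timestamp of each nsx-proxy "Keepalive failed" line. The function name is hypothetical. It also assumes the nsx-proxy messages are in /var/run/log/nsx-syslog.log, the file this article uses for nsx-proxy output under Additional Information. It matches any nsx-proxy keepalive failure, not only the opsagent connection, so review the matched lines before drawing conclusions.

```shell
# Hypothetical helper: print the timestamp of every nsx-proxy keepalive failure.
# Assumption: nsx-proxy logs to /var/run/log/nsx-syslog.log unless a file is given.
keepalive_failure_times() {
    log_file="${1:-/var/run/log/nsx-syslog.log}"
    # Keep only nsx-proxy lines reporting a failed keepalive, then print
    # the first space-separated field (the ISO 8601 timestamp).
    grep 'subcomp="nsx-proxy".*Keepalive failed' "$log_file" | cut -d' ' -f1
}
```

On the host, `keepalive_failure_times` with no argument reads the default file. Pass a path to check a copied log bundle instead.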



Environment

VMware NSX-T Data Center

Cause

Incorrect handling of very short timeouts in the kernel causes threads in nsx-exporter (NSX-T 3.1.x) or opsagent (NSX-T 3.2.x) to hang.

Resolution

Upgrade the ESXi host to version 7.0 U3 or later.
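To check whether a host already meets this requirement, the version string can be compared against 7.0.3, which is how 7.0 Update 3 appears in the numeric version. The function below is a hypothetical sketch. It assumes its input looks like the output of `vmware -v` ("VMware ESXi 7.0.3 build-NNNNNNNN") and compares only the first major.minor.update number it finds.

```shell
# Hypothetical helper: succeed (exit 0) if an ESXi version string is 7.0.3 or later.
# Returns 1 for older versions and 2 if no x.y.z version can be found.
esxi_has_fix() {
    # Extract the first dotted x.y.z number, e.g. "7.0.3".
    ver=$(printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    [ -n "$ver" ] || return 2
    major=${ver%%.*}      # "7"
    rest=${ver#*.}        # "0.3"
    minor=${rest%%.*}     # "0"
    update=${rest#*.}     # "3"
    if [ "$major" -gt 7 ]; then return 0; fi
    if [ "$major" -lt 7 ]; then return 1; fi
    if [ "$minor" -gt 0 ]; then return 0; fi
    [ "$update" -ge 3 ]
}
```

On the host it could be called as `esxi_has_fix "$(vmware -v)" && echo "7.0 U3 or later"`.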


Workaround:

Run "services.sh restart" on the ESXi host, or restart only the affected service:

In NSX-T 3.1.x: /etc/init.d/nsx-exporter restart
In NSX-T 3.2.x: /etc/init.d/nsx-opsagent restart

Restarting nsx-exporter or nsx-opsagent has no data plane impact.
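The choice between the two services can be captured in a small lookup. `nsx_hung_service` is a hypothetical name. The NSX-T version is supplied by the operator (for example, as displayed in the NSX UI), because this sketch does not query it from the host.

```shell
# Hypothetical helper: print the init script to restart for a given NSX-T version,
# following the workaround (3.1.x -> nsx-exporter, 3.2.x -> nsx-opsagent).
nsx_hung_service() {
    case "$1" in
        3.1.*) echo /etc/init.d/nsx-exporter ;;
        3.2.*) echo /etc/init.d/nsx-opsagent ;;
        *) echo "no workaround service listed for NSX-T $1" >&2; return 1 ;;
    esac
}
```

For example, `svc=$(nsx_hung_service 3.2.0) && "$svc" restart` restarts nsx-opsagent. The command does nothing if the version is not one this article covers.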


Additional Information

To check when an ESXi host connected to the controllers, look at nsx-syslog.log in /var/run/log on the ESXi host.

Search it for "state: CONNECTED master:", for example: grep "state: CONNECTED master:" /var/run/log/nsx-syslog.log. The following is an example log snippet:

2021-09-09T06:17:49Z nsx-proxy: NSX 55939433 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" tid="########" level="INFO"] Write ccp session message to nestdb ccp_id { xxxxxx-xxxx-xxxx-xxxx-xxxxxxxx } ip { ipv4: x.x.x.x } server_port: 1235 fqdn: "" state: CONNECTED master: true
2021-09-09T06:17:49Z cfgAgent[36414678]: NSX 36414678 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="A9FCF1C0" level="info"] Decoder: Received CCP_SESSION msg (Operation SET): ccp_id { left: xxxxxxxxxxx right: xxxxxxxxxxxxxx } ip { ipv4: x.x.x.x } server_port: 1235 fqdn: state: CONNECTED master: 1
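The connection history can be condensed into one line per event. The sketch below is hypothetical and relies on the nsx-proxy line format shown above. It matches only the nsx-proxy line, because cfgAgent logs a second matching line for the same session. For each event it prints the timestamp and the `master` value.

```shell
# Hypothetical helper: list controller CONNECTED events recorded by nsx-proxy.
# Assumption: the log file is /var/run/log/nsx-syslog.log unless a path is given.
ccp_connect_times() {
    log_file="${1:-/var/run/log/nsx-syslog.log}"
    # Capture the leading timestamp and the word after "master:" on
    # nsx-proxy CONNECTED lines; print only lines where the substitution matched.
    sed -n 's/^\([^ ]*\) nsx-proxy:.*state: CONNECTED master: \([a-z]*\).*/\1 master=\2/p' "$log_file"
}
```

For the snippet above, this prints a single line: `2021-09-09T06:17:49Z master=true`.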


Impact/Risks:
VMs lose networking, and their ports are blocked, after vMotion to a host in "UNKNOWN" status.