ESXi host intermittently displays "UNKNOWN" status in UI and VM ports are blocked after vMotion to the "UNKNOWN" host

Article ID: 317784

Products

VMware NSX

Issue/Introduction

  • The ESXi host reports "UNKNOWN" status in the NSX UI (the NSX controller status is displayed as "Not Available").
  • VMs lose networking after migrating to this host.
  • VM ports may be in a blocked state.
  • A message similar to the following appears in nsx-syslog (opsagent loses its connection to nsx-proxy):
    nsx-proxy: NSX ######## - [nsx@6xxx comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="3########" level="ERROR" errorCode="RPC31"] RpcConnection[30 Connected on tcp://127.0.0.1:4096 0] Keepalive failed - haven't received response in time (last request was sent 60 seconds ago, response received - never)
    nsx-proxy: NSX ######## - [nsx@6xxx comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="3########" level="INFO"] RpcConnection[30 Connected on tcp://127.#.#.#:4096 0] Closing (keepalive expired)
  • A CLOSED TCP session to 127.#.#.#:4096 appears in the output of esxcli network ip connection list:
    tcp      8890       0  127.#.#.#:11047     127.#.#.#:4096        CLOSED        2101765  newreno  nsx-exporter
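The two host-side symptoms above can be checked with a short script. This is a minimal sketch, not an official diagnostic: the function names are hypothetical, both helpers read text on standard input, and the column positions assume the connection list is laid out as in the sample row above (foreign address in the fifth column, state in the sixth).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; both helpers read text on stdin.

# Print connection-list rows that are TCP, whose foreign address ends in
# :4096, and whose state is CLOSED (column layout assumed from the sample row).
find_closed_rpc_sessions() {
    awk '$1 == "tcp" && $5 ~ /:4096$/ && $6 == "CLOSED"'
}

# Count nsx-proxy keepalive failures (errorCode RPC31) in syslog text.
count_rpc31_keepalive_failures() {
    grep -c 'errorCode="RPC31".*Keepalive failed'
}

# Usage on an ESXi host:
#   esxcli network ip connection list | find_closed_rpc_sessions
#   count_rpc31_keepalive_failures < /var/run/log/nsx-syslog.log
```

Any row printed by the first helper, together with a non-zero count from the second, matches the symptoms described in this article.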

Environment

VMware NSX-T Data Center

Cause

Incorrect handling of very short timeouts in the kernel causes threads in nsx-exporter (in NSX-T 3.1.x) or opsagent (in NSX-T 3.2.x) to hang.

Resolution

This is a known issue impacting VMware NSX.

To resolve this issue, upgrade the ESXi host to version 7.0 U3 or later.
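Whether a host already meets this version can be checked with a small comparison. The sketch below is illustrative only: the function name is hypothetical, and it assumes that ESXi 7.0 U3 reports its version as 7.0.3 and that `vmware -v` prints the version as the third word of its output.

```shell
#!/bin/sh
# Sketch only: esxi_version_at_least_703 is a hypothetical helper.
# Takes a dotted version such as "7.0.3"; succeeds if it is 7.0.3 or newer.
esxi_version_at_least_703() {
    echo "$1" | awk -F. '{
        major = $1 + 0; minor = $2 + 0; patch = $3 + 0
        if (major > 7) exit 0
        if (major == 7 && minor > 0) exit 0
        if (major == 7 && minor == 0 && patch >= 3) exit 0
        exit 1
    }'
}

# Usage on a host, assuming the output looks like "VMware ESXi 7.0.3 build-NNNN":
#   esxi_version_at_least_703 "$(vmware -v | awk '{print $3}')" && echo "7.0 U3 or later"
```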


Workaround

On the ESXi host, run services.sh restart, or restart only the service that applies to your NSX-T version:

NSX-T 3.1.x version: /etc/init.d/nsx-exporter restart
NSX-T 3.2.x version: /etc/init.d/nsx-opsagent restart

NB: Restarting nsx-exporter or nsx-opsagent has no dataplane impact.
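The choice between the two services can be written as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, the NSX-T version is supplied by the operator as an argument, and the helper only prints the command from this article so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: restart_cmd_for_nsx_version is a hypothetical helper.
# Prints the restart command listed in this article for the given NSX-T version.
restart_cmd_for_nsx_version() {
    case "$1" in
        3.1.*) echo "/etc/init.d/nsx-exporter restart" ;;
        3.2.*) echo "/etc/init.d/nsx-opsagent restart" ;;
        *)     echo "no workaround listed for NSX-T version: $1" >&2; return 1 ;;
    esac
}

# Usage: print the command, review it, then run it manually on the host.
#   restart_cmd_for_nsx_version 3.2.1
```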

Additional Information

To check, from the host, when the ESXi host connected to the controllers:
grep -i 'state: CONNECTED master' /var/run/log/nsx-syslog.log
nsx-proxy: NSX 55939433 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" tid="########" level="INFO"] Write ccp session message to nestdb ccp_id { ######-####-####-####-######## } ip { ipv4: #.#.#.# } server_port: 1235 fqdn: "" state: CONNECTED master: true
2021-09-09T06:17:49Z cfgAgent[36414678]: NSX 36414678 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" tid="A9FCF1C0" level="info"] Decoder: Received CCP_SESSION msg (Operation SET): ccp_id { left: ########### right: ############## } ip { ipv4: #.#.#.# } server_port: 1235 fqdn: state: CONNECTED master: 1
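When the log contains many such entries, only the most recent one is usually of interest. A minimal sketch, with a hypothetical function name, that reuses the grep pattern above and reads log text on standard input:

```shell
#!/bin/sh
# Sketch only: last_ccp_connected is a hypothetical helper.
# Prints the last log line that records a CONNECTED controller session.
last_ccp_connected() {
    grep -i 'state: CONNECTED master' | tail -n 1
}

# Usage on an ESXi host:
#   last_ccp_connected < /var/run/log/nsx-syslog.log
```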
 
Impact/Risks

VMs lose networking and their ports are blocked after vMotion to the "UNKNOWN" host.