One or more ESXi Transport Nodes show "Unknown" Node status in the NSX-T Manager UI

Article ID: 318308

Products

VMware NSX-T Data Center

Issue/Introduction

  • NSX-T version 3.1.0 or 3.1.1.
  • ESXi version 7.0 Update 2 or above.
  • One or more ESXi transport nodes show a Node status of "Unknown" in the NSX-T Manager UI.
  • Controller and manager connectivity between the host transport node and the Management Plane (MP) is healthy.
  • The ESXi host log /var/run/log/nsx-syslog.log displays messages similar to:

2021-05-23T02:40:03Z nsx-sha: NSX 2104585 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="WARNING" invalid="true"] Exit SHA process as continuously encountering OSError - [Errno28] No space left on device, trace:Traceback (most recent call last):   File "/usr/lib/vmware/netopa/lib/python/sha/contrib/metric/utils/_command.py", line 33, in run_command     output = ForkServer.check_output(   File "/usr/lib/vmware/netopa/lib/python/sha/forkserver/_fork_server.py", line 871, in check_output     raise e OSError: [Errno 28] No space left on device ^@

  • No dataplane impact is observed, but the "Unknown" Node status may prevent upgrades because the health checks fail.
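
To confirm the log symptom on a host, the following is a minimal sketch that counts SHA exit messages in a log file. The function name count_sha_exits is hypothetical and is not part of NSX-T; the match strings are taken from the sample message above, and the exact wording is assumed to be the same on the affected host.

```shell
# Hypothetical helper (not an NSX-T tool): count SHA exit messages in a log file.
# Prints the number of matching lines; prints 0 if the file is missing or has no match.
count_sha_exits() {
    log_file="$1"
    # Keep only lines logged by the nsx-sha subcomponent, then count
    # those that carry the exit message shown in the sample above.
    grep 'subcomp="nsx-sha"' "$log_file" 2>/dev/null \
        | grep -c 'Exit SHA process as continuously encountering OSError'
}
```

Example usage on the ESXi host: count_sha_exits /var/run/log/nsx-syslog.log. A non-zero count suggests the host has hit this issue at least once since the log was last rotated.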



Environment

VMware NSX-T Data Center

Cause

This issue is caused by a memory leak in the SHA (System Health Agent) process on the ESXi host. SHA reports information such as NSX services status, hyperbus status, and uplink status to the NSX Manager. When the SHA service stops running because of the memory leak, the NSX Manager UI shows the ESXi host status as "Unknown" and other status reports to the NSX Manager fail. The issue affects only status reporting from the ESXi host to the Manager; it does not impact the dataplane.

Resolution

This issue is resolved in NSX-T 3.1.2, available at Support Documents and Downloads.
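
When checking several environments, the affected range can be expressed as a simple string test. This is a sketch only: the function name is_affected_nsx_version is hypothetical, and it assumes the version string starts with the three-part release number, optionally followed by a dot-separated build suffix.

```shell
# Hypothetical helper (not an NSX-T tool): return 0 if the version string
# is in the affected range (3.1.0 or 3.1.1), and 1 otherwise.
is_affected_nsx_version() {
    case "$1" in
        # Exact release numbers, or the same release followed by a
        # dot-separated suffix (assumed format, for example a build number).
        3.1.0|3.1.0.*|3.1.1|3.1.1.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Example usage: is_affected_nsx_version "3.1.1" && echo "Upgrade to NSX-T 3.1.2 or later".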

Workaround:
To work around the issue, restart the netopa service (netopad) on the ESXi host using the following command. Note that this is only a temporary workaround and the issue will occur again:
#/etc/init.d/netopad restart
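
The workaround can be wrapped in a check so that the restart runs only when the log shows the SHA exit message. This is a minimal sketch under stated assumptions: the function name restart_netopad_if_needed and its two optional parameters are hypothetical, and the defaults are the log path and restart command given in this article. Because the message remains in the log after a restart, the sketch is intended for a one-off manual run as root, not for a scheduled job.

```shell
# Hypothetical wrapper (not an NSX-T tool) around the workaround command.
# $1: log file to inspect (default: the nsx-syslog path from this article)
# $2: restart command to run (default: the workaround command from this article)
restart_netopad_if_needed() {
    log_file="${1:-/var/run/log/nsx-syslog.log}"
    restart_cmd="${2:-/etc/init.d/netopad restart}"
    if grep -q 'Exit SHA process as continuously encountering OSError' "$log_file" 2>/dev/null; then
        echo "SHA exit message found in $log_file; restarting netopad"
        # Intentionally unquoted so the command and its argument are split into words.
        $restart_cmd
    else
        echo "No SHA exit message found in $log_file; nothing to do"
    fi
}
```

Example usage on the ESXi host: restart_netopad_if_needed (with no arguments, the defaults above apply).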