NSX UI displays 'Unknown' status for NSX host transport nodes

Article ID: 318308

Updated On:

Products

VMware NSX-T Data Center

Issue/Introduction

  • NSX-T version is 3.1.0 or 3.1.1.
  • ESXi version is 7.0 Update 2 or above.
  • One or more ESXi transport nodes show "Unknown" node status in the NSX Manager UI.
  • No issues are observed in controller (get controllers) or manager (get managers) connectivity between the host transport node and the management plane.
  • ESXi host log entries similar to the following are observed in /var/run/log/nsx-syslog.log:
    <TIMESTAMP> nsx-sha: NSX 2104585 - [nsx@6876 comp="nsx-esx" subcomp="nsx-sha" username="root" level="WARNING" invalid="true"] Exit SHA process as continuously encountering OSError - [Errno28] No space left on device, trace:Traceback (most recent call last):   File "/usr/lib/vmware/netopa/lib/python/sha/contrib/metric/utils/_command.py", line 33, in run_command     output = ForkServer.check_output(   File "/usr/lib/vmware/netopa/lib/python/sha/forkserver/_fork_server.py", line 871, in check_output     raise e OSError: [Errno 28] No space left on device ^@
  • The "Unknown" node status may prevent upgrades because health checks fail.
  • No data plane impact is observed.
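
The log symptom above can be checked with a short script. This is a minimal sketch, not a VMware-provided tool: the function name count_sha_exits is hypothetical, and the match strings are taken from the sample log entry in this article, so entries worded differently on a given host would not be counted.

```shell
#!/bin/sh
# Sketch only: count_sha_exits is a hypothetical helper, not part of NSX.
# It counts nsx-sha log lines that report the out-of-space OSError.
# The default path is the log file cited in this article.
count_sha_exits() {
    log_file="${1:-/var/run/log/nsx-syslog.log}"
    # First filter to nsx-sha entries, then count those with the error text.
    # A missing or unreadable file yields a count of 0.
    grep 'subcomp="nsx-sha"' "$log_file" 2>/dev/null \
        | grep -c 'No space left on device'
}

# Report only when the log exists on this machine (that is, on an ESXi host).
if [ -r /var/run/log/nsx-syslog.log ]; then
    echo "SHA exit entries found: $(count_sha_exits)"
fi
```

A non-zero count, together with the "Unknown" node status and healthy controller and manager connectivity, would be consistent with the issue described here.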

Environment

VMware NSX-T Data Center

Cause

This issue is caused by a memory leak in the SHA (System Health Agent) process on the ESXi host. SHA reports information such as NSX services status, hyperbus status, and uplink status to the NSX Manager.

When the SHA service stops running because of the memory leak, the ESXi host status is shown as Unknown in the NSX Manager UI, and other status reports to the NSX Manager fail. This issue affects ESXi to NSX Manager reporting only.

Resolution

This issue is resolved in VMware NSX-T Data Center 3.1.2, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Workaround

To work around the issue, restart the netopa service on the ESXi host using the following command:
/etc/init.d/netopad restart

NB: This is a temporary workaround, and the issue may recur.
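
The restart can also be wrapped in a small script that confirms the init script is present before calling it. This is a sketch under stated assumptions: restart_netopa is a hypothetical function name, and the optional path argument exists only so the function can be pointed at a different script for testing. It uses only the restart action documented in this article.

```shell
#!/bin/sh
# Sketch only: restart_netopa is a hypothetical wrapper, not a VMware tool.
# The default init script path comes from the workaround in this article.
restart_netopa() {
    init_script="${1:-/etc/init.d/netopad}"
    # Stop early if the init script is missing or not executable.
    if [ ! -x "$init_script" ]; then
        echo "init script not found: $init_script" >&2
        return 1
    fi
    # 'restart' is the only action this article documents.
    if "$init_script" restart; then
        echo "netopa restart requested; recheck node status in the NSX Manager UI"
    else
        echo "netopa restart failed" >&2
        return 1
    fi
}

# On the affected ESXi host, uncomment to run:
# restart_netopa
```

Because the workaround is temporary, the node status may return to "Unknown" later, and the restart may need to be repeated until the host is on a fixed release.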