ESXi host status Unknown: NSX UI opsAgent crash

Article ID: 411733

Products

VMware NSX

Issue/Introduction

  • In the NSX Manager UI, under System > Fabric > Hosts, clicking the Unknown status of an ESXi host opens the Status details, which report Controller Connectivity, Tunnel Status, and PNIC/Bond Status all as Unknown:

    "Controller Connectivity: Unknown", "Tunnel Status: Unknown", "PNIC/Bond Status: Unknown"


  • Affected ESXi host(s) create opsAgent core dump file(s): /var/core/opsAgent-zdump.###
  • During heavy load, multiple ESXi hosts may be affected in batches.
  • On the ESXi host, log messages similar to the following example appear in /var/run/log/nsx-syslog.log:

    Log sample from /var/run/log/nsx-syslog.log
    Wa(###) nsx-opsagent[#######]: NSX ######## - [nsx@4413 comp="nsx-esx" subcomp="opsagent" s2comp="nsx-core" tid="########" level="WARNING"] AsioNsxProvider: (0x##########-2) user dispatcher blocked since 20 seconds

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
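To check a host for both symptoms in one pass, the following is a minimal sketch, assuming it is run in the ESXi shell. The function names are illustrative only; the log path, the core dump path, and the "user dispatcher blocked since" text are taken from the symptoms above.

```shell
#!/bin/sh
# Sketch: look for the two host-side symptoms described in this article.
# Function names are hypothetical helpers, not NSX or ESXi tooling.

# Print how many "user dispatcher blocked" warnings a log file contains.
# Defaults to the NSX syslog path named in this article.
count_dispatcher_blocked() {
    log_file="${1:-/var/run/log/nsx-syslog.log}"
    if [ -r "$log_file" ]; then
        # grep -c prints 0 and exits non-zero when nothing matches
        grep -c 'user dispatcher blocked since' "$log_file" || true
    else
        echo 0
    fi
}

# Print every opsAgent core dump file found in a directory (default /var/core).
list_opsagent_cores() {
    core_dir="${1:-/var/core}"
    for core_file in "$core_dir"/opsAgent-zdump.*; do
        if [ -e "$core_file" ]; then
            echo "$core_file"
        fi
    done
    return 0
}

echo "dispatcher-blocked warnings: $(count_dispatcher_blocked)"
echo "opsAgent core dumps:"
list_opsagent_cores
```

A non-zero warning count together with one or more opsAgent-zdump files is consistent with the behavior described in this article, but neither check alone confirms the cause.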

Environment

VMware VCF NSX 9.x
VMware NSX 4.x
VMware NSX-T Data Center 3.x

Cause

This problem occurs because NSX frequently requests high-quality entropy. The default hardware random number generator (HWRNG) entropy source becomes exhausted, and operations time out while waiting for more entropy. For reference, see the ESXi documentation: Controlling ESXi Entropy.

Resolution

This is a known issue impacting VMware NSX.

Workaround:

  • To temporarily recover the connection on the ESXi host, connect to the host over SSH as root and run the following commands in this order:
    /etc/init.d/nsx-opsagent stop
    /etc/init.d/nsx-cfgagent stop
    /etc/init.d/nsx-proxy stop
    /etc/init.d/nsx-nestdb stop
    /etc/init.d/nsx-pre-nestdb stop

    /etc/init.d/nsx-pre-nestdb start
    /etc/init.d/nsx-nestdb start
    /etc/init.d/nsx-proxy start
    /etc/init.d/nsx-cfgagent start
    /etc/init.d/nsx-opsagent start
  • Longer term, to reduce the probability of the issue recurring:

    • Inspect the sources configured to provide entropy:
      esxcli system settings kernel list -o entropySources
    • Enable additional sources of entropy:
      esxcli system settings kernel set -s entropySources -v 0xF
      Note: A reboot is required for the ESXi host to use the adjusted configuration.
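The temporary recovery sequence in the first workaround bullet stops the five NSX agents in one order and starts them in the reverse order. A minimal sketch of that sequence as a loop follows. The INIT_DIR and DRY_RUN variables and the function names are illustrative assumptions, not part of NSX; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: stop the NSX agents in the documented order, then start them in reverse.
# INIT_DIR and DRY_RUN are hypothetical knobs added for this example.
INIT_DIR="${INIT_DIR:-/etc/init.d}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Stop order exactly as listed in the workaround.
STOP_ORDER="nsx-opsagent nsx-cfgagent nsx-proxy nsx-nestdb nsx-pre-nestdb"

# Print the given words in reverse order on one line.
reverse_words() {
    reversed=""
    for word in "$@"; do
        reversed="$word${reversed:+ $reversed}"
    done
    echo "$reversed"
}

# Run (or, in dry-run mode, print) one init script action.
# $1 = service name, $2 = action (stop or start)
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$INIT_DIR/$1 $2"
    else
        "$INIT_DIR/$1" "$2"
    fi
}

restart_nsx_agents() {
    for svc in $STOP_ORDER; do
        run_step "$svc" stop
    done
    for svc in $(reverse_words $STOP_ORDER); do
        run_step "$svc" start
    done
}

restart_nsx_agents
```

Setting DRY_RUN=0 executes the init scripts instead of printing them. Reviewing the dry-run output first makes it easy to confirm that every start command carries its start argument.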


Workaround Reversal:

  • After a fixed version is released and the environment has been upgraded to it, revert the workaround as follows:

    esxcli system settings kernel set -s entropySources -v 0

    Note: A reboot is required for the ESXi host to use the adjusted configuration.
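Because a change to entropySources only takes effect after a reboot, it can help to confirm whether the configured value is already active. The sketch below parses the output of the list command shown earlier. It assumes the output is a table with the columns Name, Type, Configured, Runtime, Default, Description, and that both values are printed in the same notation; verify this against the actual output on your host. The function names are illustrative.

```shell
#!/bin/sh
# Sketch: report whether the configured entropySources value is already in use.
# Assumed column layout: Name Type Configured Runtime Default Description.

# Read the list output on stdin and print "<configured> <runtime>"
# for the entropySources row.
entropy_values() {
    awk '$1 == "entropySources" { print $3, $4 }'
}

# Print "yes" if the configured and runtime values differ (reboot still needed),
# "no" if they match, or "unknown" if the row could not be parsed.
entropy_reboot_pending() {
    set -- $(entropy_values)
    if [ "$#" -ne 2 ]; then
        echo "unknown"
    elif [ "$1" = "$2" ]; then
        echo "no"
    else
        echo "yes"
    fi
}

# Only query the host when esxcli is actually available.
if command -v esxcli >/dev/null 2>&1; then
    echo "reboot pending: $(esxcli system settings kernel list -o entropySources | entropy_reboot_pending)"
fi
```

The same check applies after enabling the additional entropy sources and after reverting them.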

Additional Information

  • Restarting the services alone will likely not remediate the issue.
  • Applying the entropySources change with the esxcli system settings kernel commands in the Workaround, followed by the required host reboot, corrects the issue.