NSX prepared host shows "Unknown" status after host reboot

Article ID: 381319

Updated On:

Products

VMware NSX, VMware NSX Networking

Issue/Introduction

  • NSX was recently upgraded to 4.2.x.
  • ESXi host is rebooted (e.g., following vSphere ESXi host version upgrade or standard maintenance).
  • After the reboot, the NSX nestdb agent remains in a Stopped or Down state.
  • Controller connectivity shows Unknown state in the NSX Manager UI.
  • VMs residing on the rebooted host lose network connectivity.
  • Running 'nsxcli -c get controllers' as root on the affected transport node CLI returns: % Failed to get controller list
  • In System > Fabric > Hosts, the host status shows as Unknown.
  • In the NSX Manager UI, 'System > Fabric > Hosts > [Hostname] > View Details > Monitor > Agent Status' shows NSX_NESTDB as Down.
  • The following sequence is observed in /var/run/log/syslog during the boot process:
    In(14) jumpstart[2099923]: executing start plugin: nsx-pre-nestdb
    In(14) jumpstart[2099923]: executing start plugin: nsx-nestdb
    In(14) jumpstart[2099923]: nsx-nestdb started.
    In(30) NSX[2101624]: nsx-pre-nestdb started
    The log lines above correspond to this sequence:
    1. nsx-pre-nestdb begins starting.
    2. nsx-nestdb begins starting while nsx-pre-nestdb has not yet fully started.
    3. nsx-nestdb completes its startup sequence before nsx-pre-nestdb has fully started.
    4. nsx-pre-nestdb completes its startup sequence. By this point, nsx-nestdb has already failed to start correctly.
  • nsx-nestdb shows as stopped on the host:
    [root@esx:~] /etc/init.d/nsx-nestdb status
    stopped
  • There are no nsx-nestdb core dumps or other logging that indicates that nsx-nestdb has crashed.
  • The issue is intermittent. On some reboots, nsx-nestdb starts without issue.

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
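The out-of-order startup described above can be checked with a short script. The following is a minimal sketch, not a supported tool: the function name check_nestdb_order is hypothetical, and it assumes the two "started" messages appear in the log exactly as in the excerpt. It compares the line position of the last occurrence of each message, so on a log that spans several boots it only reflects the most recent entries.

```shell
#!/bin/sh
# Sketch: report whether "nsx-nestdb started" was logged before
# "nsx-pre-nestdb started". check_nestdb_order is a hypothetical helper name.
# Assumption: the message text matches the excerpt in this article.
check_nestdb_order() {
    log="${1:-/var/run/log/syslog}"

    # Line number of the last occurrence of each message (empty if not found).
    nestdb_line=$(grep -nF 'nsx-nestdb started' "$log" 2>/dev/null | tail -n 1 | cut -d: -f1)
    pre_line=$(grep -nF 'nsx-pre-nestdb started' "$log" 2>/dev/null | tail -n 1 | cut -d: -f1)

    if [ -z "$nestdb_line" ] || [ -z "$pre_line" ]; then
        # One or both messages are missing, so no conclusion can be drawn.
        echo "unknown"
        return 2
    fi

    if [ "$nestdb_line" -lt "$pre_line" ]; then
        # nsx-nestdb reported "started" before nsx-pre-nestdb did.
        echo "out-of-order"
        return 1
    fi

    echo "ok"
    return 0
}
```

Calling check_nestdb_order with no argument reads /var/run/log/syslog. A result of out-of-order is consistent with the sequence in this article, but it should be read together with the other symptoms listed above.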

Environment

VMware NSX 4.2.x

Cause

The issue is triggered by a host reboot. During the boot sequence, the esxcfg-info command within the nsx-pre-nestdb script takes longer than expected to complete. This delay allows the nsx-nestdb service to initialize and finish its startup sequence before its prerequisites are fully established, resulting in the service failing to start correctly.

Resolution

This issue is resolved in VMware NSX 4.2.1.1 and higher. See Download Broadcom products and software for steps to download this release. 

Workaround:

If an upgrade is not immediately possible, manually start the service after the host has finished booting:

  1. Log in to the affected ESXi host via SSH as root.
  2. Start the service: /etc/init.d/nsx-nestdb start
  3. Confirm the service status: /etc/init.d/nsx-nestdb status
     The expected status is: NSX-NESTDB is running.
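
The workaround steps can also be wrapped in a small script. This is a minimal sketch under stated assumptions, not part of the product: the function name ensure_nestdb_running and the NESTDB_INIT override variable are hypothetical, and the script assumes the status output contains "is running" when the service is up, as in the expected status quoted above.

```shell
#!/bin/sh
# Sketch: start nsx-nestdb only if it is not already running, then re-check.
# NESTDB_INIT is a hypothetical override; the default is the init script
# path used in the workaround steps.
NESTDB_INIT="${NESTDB_INIT:-/etc/init.d/nsx-nestdb}"

ensure_nestdb_running() {
    # Assumption: a running service reports text containing "is running".
    status=$("$NESTDB_INIT" status 2>&1 || true)
    case "$status" in
        *"is running"*)
            echo "already running"
            return 0
            ;;
    esac

    # Send any start-up messages to stderr so stdout carries only the result.
    "$NESTDB_INIT" start >&2 || true

    status=$("$NESTDB_INIT" status 2>&1 || true)
    case "$status" in
        *"is running"*)
            echo "started"
            return 0
            ;;
        *)
            echo "still not running: $status"
            return 1
            ;;
    esac
}
```

Run ensure_nestdb_running as root after the host has finished booting. A non-zero return means the service still did not report a running status after the start attempt.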