After an ESXi host reboot, the Transport Node status stays at Partial Success and does not recover

Article ID: 330395

Products

VMware NSX

Issue/Introduction

  • In rare cases, after an ESXi host reboot, the host shows "Partial Success" in the NSX UI under Fabric -> Host -> Node.
  • You see messages similar to the following in /var/log/nsx-syslog on the ESXi host:

    2024-03-13T07:13:55.427Z nsx-opsagent[2102926]: NSX 2102926 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="2103675" level="WARNING"] VDRVMK_GetVMAC(<host switch UUID>) failed with code -1

  • Example output from the desired_state_manager.json file in an NSX Manager log bundle, under the /nsxapi/api/v1/transport-nodes/state section:

    {
      "details": [
        {
          "failure_code": 8804,
          "failure_message": " Host configuration: VDRVMK_GetVMAC(<host switch UUID>) failed with code -1; LogicalSwitch full-sync: LogicalSwitch full-sync realization query skipped.",
          "state": "partial_success",
          "sub_system_id": "<UUID>",
          "sub_system_type": "Host"
        }
      ],
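You can check for both symptoms from a shell. The following is a minimal POSIX sh sketch. The function names check_vmac_failure and check_tn_state are hypothetical. The log path defaults to /var/log/nsx-syslog, as given above. check_tn_state expects a copy of the transport-node state JSON that you have already saved to a file. Both helpers are plain grep text matches, not real log or JSON parsers.

```shell
# Hypothetical helpers for the two symptoms listed above.
# Both are plain text matches (grep/tr), not log or JSON parsers.

# 1) Does the NSX syslog contain the VDRVMK_GetVMAC failure?
#    Returns 0 if found, 1 if not, 2 if the file cannot be read.
check_vmac_failure() {
    log="${1:-/var/log/nsx-syslog}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    # -E: extended regex (\( and \) are literal parentheses); -c: count lines.
    # grep exits 1 when the count is 0, so "|| true" keeps the "0" output.
    count=$(grep -Ec 'VDRVMK_GetVMAC\(.*\) failed with code -1' "$log" || true)
    if [ "$count" -gt 0 ]; then
        echo "found $count VDRVMK_GetVMAC failure line(s) in $log"
        return 0
    fi
    echo "no VDRVMK_GetVMAC failures in $log"
    return 1
}

# 2) Does a saved transport-node state JSON show partial_success and
#    failure_code 8804? The two are matched independently, so this does
#    not prove they belong to the same "details" entry.
#    Returns 0 if both are found, 1 if not, 2 if the file cannot be read.
check_tn_state() {
    json="$1"
    [ -r "$json" ] || { echo "cannot read $json" >&2; return 2; }
    # Flatten to one line so the match works with or without pretty-printing.
    flat=$(tr -d '\n' < "$json")
    if printf '%s' "$flat" | grep -Eq '"state"[[:space:]]*:[[:space:]]*"partial_success"' &&
       printf '%s' "$flat" | grep -Eq '"failure_code"[[:space:]]*:[[:space:]]*8804([^0-9]|$)'; then
        echo "partial_success with failure_code 8804 found in $json"
        return 0
    fi
    echo "no partial_success/8804 entry in $json"
    return 1
}
```

Run check_vmac_failure on the ESXi host with no arguments. Run check_tn_state against a saved file, for example `check_tn_state tn-state.json`.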

Cause

After an ESXi host reboot, the nsxa app restarts and reconnects to the VDR (Virtual Distributed Router). Part of its appInit process checks whether the VDR MAC addresses have changed. In rare cases, VDRVMK_GetVMAC returns errno ENOENT. This is treated as a failure and causes the host to enter the Partial Success state.

Resolution

This issue is resolved in VMware NSX 3.2.4
This issue is resolved in VMware NSX 4.2.0

Workaround:

  1. Restart the NSX agent on the affected ESXi host:

    On NSX 3.2 or later releases:

    /etc/init.d/nsx-opsagent stop
    /etc/init.d/nsx-opsagent start

    To verify that nsx-opsagent is running, use the following command:

    /etc/init.d/nsx-opsagent status

    On NSX 3.1 and earlier releases:

    /etc/init.d/nsxa stop
    /etc/init.d/nsxa start

    To verify that nsxa is running, use the following command:

    /etc/init.d/nsxa status

  2. Alternatively, reboot the affected host.
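Step 1 can be wrapped in a small helper. This is a minimal sketch. It assumes that the name of the installed init script is enough to tell the releases apart: nsx-opsagent on NSX 3.2 and later, nsxa on 3.1 and earlier. The function restart_nsx_agent and the INIT_DIR override are hypothetical additions, included for illustration and testing only.

```shell
# Hypothetical helper: restart the NSX agent using whichever init script
# exists. Assumption: nsx-opsagent (NSX 3.2+) is preferred; nsxa
# (NSX 3.1 and earlier) is the fallback. INIT_DIR overrides /etc/init.d.
restart_nsx_agent() {
    init_dir="${INIT_DIR:-/etc/init.d}"
    if [ -x "$init_dir/nsx-opsagent" ]; then
        svc="nsx-opsagent"
    elif [ -x "$init_dir/nsxa" ]; then
        svc="nsxa"
    else
        echo "no NSX agent init script found in $init_dir" >&2
        return 1
    fi
    echo "restarting $svc"
    # A failed stop (for example, agent already stopped) is reported but not fatal.
    "$init_dir/$svc" stop || echo "$svc stop returned non-zero; continuing" >&2
    "$init_dir/$svc" start || return 1
    # Print the status so you can confirm the agent is running.
    "$init_dir/$svc" status
}
```

Running `restart_nsx_agent` on the host performs the same stop, start, and status sequence shown in step 1.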

The probability of recurrence is extremely low.

Additional Information

Impact/Risks:
If the ESXi host is taken out of Maintenance Mode while it is in the Partial Success state, and VMs are then migrated to it with vMotion, those VMs lose connectivity with machines on other networks.
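To reduce this risk, you could check the transport node state before taking a host out of Maintenance Mode. The following is a hedged sketch. safe_to_exit_mm is a hypothetical helper that reads a transport-node state JSON you have already saved, for example from the NSX Manager API or a log bundle. It fails if any "state" value is partial_success or failed. Treating only those two values as blocking is an assumption.

```shell
# Hypothetical pre-check: succeed only if a saved transport-node state JSON
# contains no "state" value of partial_success or failed.
# Assumption: only those two values block; other values are not evaluated.
# This is a text match, not a JSON parser.
safe_to_exit_mm() {
    json="$1"
    [ -r "$json" ] || { echo "cannot read $json" >&2; return 2; }
    # Flatten to one line, then list every blocking "state" entry.
    bad=$(tr -d '\n' < "$json" |
        grep -Eo '"state"[[:space:]]*:[[:space:]]*"(partial_success|failed)"' || true)
    if [ -n "$bad" ]; then
        echo "not safe to exit Maintenance Mode:"
        printf '%s\n' "$bad"
        return 1
    fi
    echo "no partial_success or failed state found in $json"
    return 0
}
```

For example, `safe_to_exit_mm tn-state.json && esxcli system maintenanceMode set --enable false` exits Maintenance Mode only when the check passes.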