vMotion Fails Due to Expired NSX Certificates on Host Transport Nodes


Article ID: 409475


Products

VMware NSX

Issue/Introduction

  • Virtual machine (VM) vMotion operations fail for VMs residing on NSX segments. This issue typically manifests when the underlying ESXi hosts, which are configured as NSX Transport Nodes, transition into a "Failed" state within the NSX-T Manager UI.

  • One of the following errors may be seen:

    "Network interface" 'Network adapter 1' uses network 'DVSwitch[50 19 ## ## ## ## ## ## ## ## 58 97] NSX port group [dvportgroup-0##6](nsxa down)'

    or

    "Network interface" 'Network adapter X' uses network '## ## ## ## ## ## ## ##', which is not accessible.

  • vMotion operations for VMs connected to NSX segments consistently fail.
  • ESXi hosts registered as NSX Transport Nodes appear in a "Failed" state within the NSX-T Manager UI.
  • Error messages in vCenter or during vMotion attempts may indicate communication issues with NSX or the host's inability to participate in NSX operations.
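When triaging many failed migrations, it can help to scan exported vCenter task or event text for the two error signatures quoted above. The following Python sketch does that with regular expressions. The function name and labels are hypothetical, and the patterns cover only the message formats shown in this article. The `##` masking stands for site-specific IDs, so those positions match any text.

```python
import re
from typing import Optional

# Patterns built only from the two error messages quoted in this article.
# Masked IDs ("##") are matched loosely because they differ per site.
NSXA_DOWN = re.compile(r"NSX port group \[dvportgroup-[^\]]*\]\(nsxa down\)")
NOT_ACCESSIBLE = re.compile(
    r"Network adapter \S+' uses network '.*', which is not accessible"
)


def classify_vmotion_error(message: str) -> Optional[str]:
    """Return a hypothetical label if the message matches a known signature."""
    if NSXA_DOWN.search(message):
        return "nsxa_down"
    if NOT_ACCESSIBLE.search(message):
        return "network_not_accessible"
    return None  # not one of the NSX-related errors described here
```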

Environment

VMware NSX
VMware NSX-T Data Center

Cause

  • The primary cause is expired internal NSX certificates on the host transport nodes. Once these certificates expire, communication between the hosts and the NSX management plane breaks down.

  • From the host nsx-syslog.log:

  • The issue stems from a design aspect of NSX versions 4.1.x and 4.2.0: the internal certificates used when instantiating Edge and Host Transport Nodes are issued with a validity period of 825 days, rather than the 10 years typical of other versions and components. Because these certificates are treated as "permanent", NSX upgrades do not automatically replace or renew them.
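The 825-day window is simple arithmetic, but the sketch below makes it explicit and also checks an end date in the `notAfter=` form printed by `openssl x509 -noout -enddate`. It assumes you have already obtained the certificate's issue date or end date by other means. The helper names are illustrative and are not part of NSX or the CARR script.

```python
import ssl
from datetime import date, datetime, timedelta, timezone

# Validity period described in this article for internal Edge/Host
# Transport Node certificates created on NSX 4.1.x and 4.2.0.
SHORT_VALIDITY = timedelta(days=825)


def expiry_from_issue(issued: date, validity: timedelta = SHORT_VALIDITY) -> date:
    """Expiry date for a certificate issued on `issued` with the given validity."""
    return issued + validity


def parse_not_after(line: str) -> datetime:
    """Parse an OpenSSL end date such as 'notAfter=Apr  5 00:00:00 2025 GMT'."""
    value = line.strip()
    if value.startswith("notAfter="):
        value = value[len("notAfter="):]
    # ssl.cert_time_to_seconds expects the '%b %d %H:%M:%S %Y GMT' format
    # and returns seconds since the epoch, interpreted as UTC.
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(value), tz=timezone.utc)


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days left before expiry; a negative value means already expired."""
    return (not_after - now).days
```

For example, a certificate issued on 2023-01-01 with the 825-day validity expires on 2025-04-05, whereas a 10-year certificate issued the same day would remain valid until 2033.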

Resolution

Run the CARR script by following the article "Using Certificate Analyzer, Results and Recovery (CARR) Script to fix certificate related issues in NSX".

Note: Ensure you have appropriate administrative access to the NSX Managers and the hosts, as required by the script's instructions.

Verify Certificate Replacement and Host State:
- After the script completes, allow some time for NSX to resynchronize with the hosts.
- Monitor the NSX-T Manager UI to confirm that all previously "Failed" host transport nodes return to a "Success" or "Up" state.
- Verify communication between hosts and NSX Manager.
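If you prefer to confirm host state from a script rather than the UI, the sketch below flags every node whose reported state is not healthy. It assumes you have already fetched each host's state from the NSX Manager API (for example via the transport node `/state` endpoint; check the API reference for your NSX version for the exact path) and that each response body carries a top-level `state` field such as `"success"` or `"failed"`. The function name and the set of healthy values are assumptions.

```python
from typing import List, Mapping

# Assumed healthy value of the "state" field; adjust for your NSX version.
HEALTHY_STATES = {"success"}


def unhealthy_nodes(states: Mapping[str, Mapping[str, object]]) -> List[str]:
    """Return the names of transport nodes whose reported state is not healthy.

    `states` maps a node name to the already-fetched JSON body for that
    node's state. A missing "state" field is treated as "unknown".
    """
    bad = []
    for name, body in states.items():
        state = str(body.get("state", "unknown")).lower()
        if state not in HEALTHY_STATES:
            bad.append(name)
    return sorted(bad)
```

An empty result suggests every host has returned to a healthy state and vMotion testing can begin.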

Test vMotion Operations:
- Once all hosts are in a healthy state, attempt vMotion operations for VMs on NSX segments.
- Confirm that vMotion now completes successfully without errors.


The CARR script specifically targets and replaces the expired internal certificates that are causing the communication breakdown between the NSX Managers and the transport nodes. By renewing these certificates, the secure communication channel is re-established, allowing hosts to properly integrate with NSX, clearing their "Failed" status, and enabling dependent operations like vMotion to function correctly.

Additional Information