Host Transport Nodes are unexpectedly unprepared / re-prepared for NSX

Products

VMware NSX

Issue/Introduction

Host Transport Nodes (ESXi hosts) prepared for NSX are unexpectedly unprepared for NSX.
The impacted hosts were recently moved out of the vSphere cluster and then moved back into it.
The impacted ESXi hosts spontaneously begin the un-preparation process, without any user intervention.
The impacted hosts may remain in the "Install Failed" state with the sub-status "Removing host status" until the host is put into maintenance mode, at which point the re-installation begins.
In the NSX cm-inventory logs, a notification that the host has left the cluster is seen:
var/log/cm-inventory/cm-inventory.log:
2025-08-04T08:30:34.123Z INFO InventoryFetcher-########-####-####-####-########b5a4 InventoryFetcher 5098 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="cm-inventory"] Processing HostSystem update for id <host-MoRef> of kind leave.
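To find these host leave updates in a cm-inventory.log from a support bundle, a minimal Python sketch such as the following may help. It assumes the message format shown in the sample above; the regular expression and the function name find_host_leave_events are illustrative only and are not part of any NSX tooling.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed format, based on the cm-inventory sample above:
# "<ISO timestamp>Z INFO ... Processing HostSystem update for id <host-MoRef> of kind leave."
LEAVE_RE = re.compile(
    r"^(?P<ts>\S+Z) .*Processing HostSystem update for id (?P<moref>\S+) of kind leave\."
)


def find_host_leave_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (UTC timestamp, host MoRef) for each host 'leave' update found."""
    for line in lines:
        match = LEAVE_RE.search(line)
        if match:
            yield match.group("ts"), match.group("moref")
```

Passing an open file handle for var/log/cm-inventory/cm-inventory.log to find_host_leave_events lists each leave update together with the MoRef of the affected host, which can then be matched against the hosts that were moved.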
An unexpected host deletion can be observed in the NSX Manager syslog:
var/log/syslog.1:
2025-08-04T08:31:34.429Z <nsx-manager> NSX 5227 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Updating the DeploymentProgress from: DeploymentProgress [ id=########-####-####-####-########4149, deploymentType=HOST_TN, operationType=DELETE, progress=40, stateDescription=deployment.progress.fn.start_delete, removeNsxFlag=false] to DeploymentProgress [ id=########-####-####-####-########4149, deploymentType=HOST_TN, operationType=DELETE, progress=40, stateDescription=deployment.progress.fn.start_delete, removeNsxFlag=false]
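The NSX Manager syslog can be filtered for transport node deletions in a similar way. The sketch below assumes the DeploymentProgress format shown in the sample and keeps only updates whose new state (the part after "to") has deploymentType=HOST_TN and operationType=DELETE; host_tn_deletions is a hypothetical helper name.

```python
import re
from typing import Dict, Iterable, List

# Assumed format, based on the syslog sample above: captures the key=value body
# of the *new* DeploymentProgress, i.e. the bracketed part after " to ".
TO_PROGRESS_RE = re.compile(r"\bto DeploymentProgress \[ (?P<body>[^\]]*)\]")


def host_tn_deletions(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the new DeploymentProgress fields for HOST_TN DELETE updates."""
    results = []
    for line in lines:
        match = TO_PROGRESS_RE.search(line)
        if not match:
            continue
        fields = {}
        for pair in match.group("body").split(","):
            key, sep, value = pair.strip().partition("=")
            if sep:
                fields[key] = value
        if fields.get("deploymentType") == "HOST_TN" and fields.get("operationType") == "DELETE":
            # The first whitespace-separated token of the syslog line is the UTC timestamp.
            fields["timestamp"] = line.split(" ", 1)[0]
            results.append(fields)
    return results
```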
vCenter's logs (journalctl_-b--0.txt) show the host being moved out of the cluster:
Aug 11 02:17:37 <vCenter_name> vpxd[5922]: Event [617790346] [1-1] [2025-08-11T00:17:37.072269Z] [vim.event.ExtendedEvent] [warning] [com.vmware.vcIntegrity] [####] [617790346] [Membership of the host <ESXi_host_name> has changed on the target cluster.]
journalctl_-b--0.txt.FRAG-00642:
Aug 11 02:36:01 <vCenter_name> vpxd[5922]: Event [617792827] [1-1] [2025-08-11T00:36:01.721141Z] [vim.event.ExtendedEvent] [warning] [com.vmware.vcIntegrity] [####] [617792827] [Membership of the host <ESXi_host_name> has changed on the target cluster.]
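The vCenter membership-change events can be extracted as well. The sketch below assumes the vpxd event format shown above and uses the bracketed ISO timestamp ending in Z (UTC); in the sample, the journal prefix (02:17:37) appears to be local time, two hours ahead of the event's UTC time (00:17:37). membership_changes is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Assumed format, based on the vpxd sample above:
# "... Event [<id>] [1-1] [<ISO timestamp>Z] ... [Membership of the host <name> has changed on the target cluster.]"
MEMBERSHIP_RE = re.compile(
    r"Event \[(?P<event_id>\d+)\] \[[^\]]*\] \[(?P<ts>[^\]]+Z)\]"
    r".*\[Membership of the host (?P<host>.+?) has changed on the target cluster\.\]"
)


def membership_changes(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (event id, UTC timestamp, host name) for each membership-change event."""
    results = []
    for line in lines:
        match = MEMBERSHIP_RE.search(line)
        if match:
            results.append((match.group("event_id"), match.group("ts"), match.group("host")))
    return results
```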
Compute Manager connection to NSX is healthy.
This issue is more likely to manifest in environments used to run containerized workloads.

Environment

vSphere CSI driver version 3.3.0
VMware NSX
NSX Container Plugin (NCP)
TKGI 1.20.0

Cause

This issue is caused by an abnormally high number of pending notifications in the vCenter internal messaging service (NotifyQueueSize).

When the vCenter notification queue becomes extremely large (e.g., several million pending notifications), inventory updates, such as a host momentarily moving out of a cluster, are processed by vCenter views with a significant delay. Because of this backlog, NSX Manager may receive a "host leave" notification hours after the actual event occurred, triggering the un-preparation workflow even if the host has already been moved back or its state has changed.
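One way to check whether an environment matches this cause is to compare the UTC timestamp of the vCenter membership-change event with the time NSX processed the corresponding "leave" update for the same host; a lag of hours rather than seconds is consistent with a notification backlog. A minimal sketch, assuming both timestamps are ISO 8601 UTC strings as in the logs above (the function names and the example timestamps are hypothetical):

```python
from datetime import datetime, timedelta


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in 'Z' (UTC) into a timezone-aware datetime."""
    # datetime.fromisoformat() accepts a trailing 'Z' only from Python 3.11, so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def notification_lag(vcenter_event_ts: str, nsx_leave_ts: str) -> timedelta:
    """Time between the vCenter event and NSX processing the matching 'leave' update."""
    return parse_utc(nsx_leave_ts) - parse_utc(vcenter_event_ts)


# Hypothetical timestamps for one host; not taken from the log excerpts above.
lag = notification_lag("2025-08-11T00:17:37.072269Z", "2025-08-11T03:17:37.072269Z")
print(lag)  # prints 3:00:00
```

The sample excerpts in this article come from different dates, so they are not a matched pair; compare events recorded for the same host.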

Resolution

This condition may occur in a VMware NSX environment due to the behaviour of the vSphere Container Storage Plug-in (vSphere CSI driver) component.

To work around this issue, flush the pending vCenter notifications so that inventory updates are processed in real time again:

SSH to the vCenter Server and switch to the Bash shell by entering the "shell" command.
Restart the vpxd service:
service-control --restart vmware-vpxd
Ref. KB article: Stopping, Starting or Restarting VMware vCenter Server Appliance services.
Alternatively, you can restart the vCenter Server appliance.