When attempting to replace an Edge Node in NSX, the first step is to verify whether the Edge Node is part of an Edge Cluster. If it is, and the cluster has Logical Routers (LRs) running on it, dependencies must be cleared before the node can be replaced at the cluster level.
In some cases, Edge Node replacement may fail with the following error if the node is part of a failure domain:
Found errors in the request. Please refer to the related errors for details. (Error code: 15000) [Fabric] Edge cluster should have transport nodes from at least two failure domains, if failure-domain-based allocation enabled. (Error code: 15021)
In this scenario, Edge-Node-K-02 had to be replaced with a new node, Edge-Node-K-03, in NSX.
The Edge Cluster ("Edge-Cluster") consisted of four Edge Nodes:
Edge-Node-K-01
Edge-Node-N-01
Edge-Node-K-02
Edge-Node-N-02
This Edge Cluster was assigned to a Tier-0 Gateway in Active/Standby mode:
Active: Edge-Node-K-01
Standby: Edge-Node-N-01
Two failure domains were configured for this Edge Cluster, following the Broadcom documentation:
Failure Domain 1: Edge-Node-K-01, Edge-Node-N-01
Failure Domain 2: Edge-Node-K-02, Edge-Node-N-02
The attempt to add Edge-Node-K-03 in place of Edge-Node-K-02 failed with the following error:
Found errors in the request. Please refer to the related errors for details. (Error code: 15000) [Fabric] Edge cluster should have transport nodes from at least two failure domains, if failure-domain-based allocation enabled. (Error code: 15021)
VMware NSX
This occurred because, with failure-domain-based allocation enabled, at least one Edge Node must remain in each failure domain during replacement.
To replace the Edge Node, the failure-domain-based allocation rule must first be removed from the Edge Cluster.
Step 1: Remove the allocation rule from the Edge Cluster
Step 2: Remove the old Edge Node from the Edge Cluster
Step 3: Add the new Edge Node to the Edge Cluster
Step 4: Re-enable failure-domain-based allocation (optional)
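Steps 1 through 3 amount to editing the Edge Cluster object and writing it back. The sketch below shows only those edits on an in-memory dictionary and makes no API calls. The field names (allocation_rules, action_type, members, transport_node_id) and the AllocationBasedOnFailureDomain value are assumptions based on the NSX Manager API edge cluster schema, so check them against the API reference for your NSX version. The node names stand in for transport node UUIDs, and the helper functions are hypothetical.

```python
import copy

# Assumed action_type value for failure-domain-based allocation.
FD_RULE = "AllocationBasedOnFailureDomain"


def remove_allocation_rule(cluster):
    """Step 1: return a copy of the payload without the failure-domain rule."""
    updated = copy.deepcopy(cluster)
    updated["allocation_rules"] = [
        rule for rule in updated.get("allocation_rules", [])
        if rule.get("action", {}).get("action_type") != FD_RULE
    ]
    return updated


def remove_member(cluster, node_id):
    """Step 2: return a copy of the payload without the given member."""
    updated = copy.deepcopy(cluster)
    remaining = [m for m in updated["members"]
                 if m["transport_node_id"] != node_id]
    if len(remaining) == len(updated["members"]):
        raise ValueError(f"{node_id} is not a member of this cluster")
    updated["members"] = remaining
    return updated


def add_member(cluster, node_id):
    """Step 3: return a copy of the payload with the new member appended."""
    updated = copy.deepcopy(cluster)
    updated["members"].append({"transport_node_id": node_id})
    return updated


# Placeholder payload modeled on the scenario; other fields are omitted.
cluster = {
    "display_name": "Edge-Cluster",
    "allocation_rules": [{"action": {"enabled": True, "action_type": FD_RULE}}],
    "members": [
        {"transport_node_id": "Edge-Node-K-01"},
        {"transport_node_id": "Edge-Node-N-01"},
        {"transport_node_id": "Edge-Node-K-02"},
        {"transport_node_id": "Edge-Node-N-02"},
    ],
}

step1 = remove_allocation_rule(cluster)
step2 = remove_member(step1, "Edge-Node-K-02")
step3 = add_member(step2, "Edge-Node-K-03")

print(step3["allocation_rules"])
print([m["transport_node_id"] for m in step3["members"]])
```

In practice each modified payload would be written back to NSX Manager, through the UI or the API, before moving on to the next step. That part is deliberately left out of the sketch.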
Once Edge-Node-K-03 is added, failure-domain-based allocation can only be re-enabled if the Edge Cluster contains nodes across at least two different failure domains.
Otherwise, the feature remains disabled until failure-domain diversity is restored.
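The re-enable condition can be expressed as a simple pre-check. The following is a simplified model of the rule as stated above, not NSX's own validation logic. The function names and the node-to-domain dictionary are hypothetical, and leaving Edge-Node-K-03 without a failure domain is an assumption made only to show that unassigned nodes do not count toward diversity.

```python
def failure_domains_in_cluster(member_ids, domain_of):
    """Return the set of failure domains covered by the cluster members.

    Nodes missing from domain_of, or mapped to None, are treated as
    unassigned and ignored.
    """
    return {domain_of[m] for m in member_ids if domain_of.get(m)}


def can_reenable(member_ids, domain_of):
    """True if the members span at least two different failure domains."""
    return len(failure_domains_in_cluster(member_ids, domain_of)) >= 2


# Node-to-domain mapping from the scenario; the K-03 entry is an assumption.
domain_of = {
    "Edge-Node-K-01": "Failure Domain 1",
    "Edge-Node-N-01": "Failure Domain 1",
    "Edge-Node-N-02": "Failure Domain 2",
    "Edge-Node-K-03": None,
}

after_replacement = [
    "Edge-Node-K-01", "Edge-Node-N-01", "Edge-Node-N-02", "Edge-Node-K-03",
]
single_domain_only = ["Edge-Node-K-01", "Edge-Node-N-01"]

print(can_reenable(after_replacement, domain_of))   # two domains covered
print(can_reenable(single_domain_only, domain_of))  # one domain covered
```

Running a check like this before Step 4 makes it clear up front whether the new node still needs to be assigned to a failure domain.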