Edge Node Replacement Blocked by Failure Domain Dependency (Error 15021)



Article ID: 416365



Products

VMware NSX

Issue/Introduction

When attempting to replace an Edge Node in NSX, the first step is to verify whether the Edge Node is part of an Edge Cluster. If it is, and the cluster has Logical Routers (LRs) running on it, dependencies must be cleared before the node can be replaced at the cluster level.

In some cases, Edge Node replacement may fail with the following error if the node is part of a failure domain:

Found errors in the request. Please refer to the related errors for details. (Error code: 15000) [Fabric] Edge cluster should have transport nodes from at least two failure domains, if failure-domain-based allocation enabled. (Error code: 15021)

Example Scenario

You needed to replace Edge-Node-K-02 with a new node, Edge-Node-K-03, in NSX.

The Edge Cluster ("Edge-Cluster") consisted of four Edge Nodes:

  • Edge-Node-K-01

  • Edge-Node-N-01

  • Edge-Node-K-02

  • Edge-Node-N-02

This Edge Cluster was assigned to a Tier-0 Gateway in Active/Standby mode:

  • Active: Edge-Node-K-01

  • Standby: Edge-Node-N-01

Two failure domains were configured for this Edge Cluster, as described in the Broadcom documentation:

  • Failure Domain 1: Edge-Node-K-01, Edge-Node-N-01

  • Failure Domain 2: Edge-Node-K-02, Edge-Node-N-02

You attempted to add Edge-Node-K-03 in place of Edge-Node-K-02. However, the following error appeared:

Found errors in the request. Please refer to the related errors for details. (Error code: 15000) [Fabric] Edge cluster should have transport nodes from at least two failure domains, if failure-domain-based allocation enabled. (Error code: 15021)



Environment

VMware NSX

Cause

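This error occurs because, with failure-domain-based allocation enabled, NSX validates that the Edge Cluster has transport nodes from at least two failure domains. At least one Edge Node must therefore remain in each failure domain during the replacement.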

Resolution

To replace the Edge Node, the failure-domain-based allocation rule must first be removed from the Edge Cluster.

Step 1: Remove allocation rule from the Edge Cluster

  1. Perform a GET on the Edge Cluster: GET https://<NSXMGRIP>/api/v1/edge-clusters/<edgeClusterId>

  2. In the response payload, remove the allocation rule section: "allocation_rules" : [ { "action" : { "enabled" : true, "action_type" : "AllocationBasedOnFailureDomain" } } ],

  3. Perform a PUT with the updated payload (without the allocation rule): PUT https://<NSXMGRIP>/api/v1/edge-clusters/<edgeClusterId>
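The GET, edit, and PUT sequence above can be sketched in Python using only the standard library. This is a minimal sketch, not a supported tool: the helper names (strip_failure_domain_rule, call_nsx, remove_rule) are hypothetical, and it assumes HTTP basic authentication and that the PUT body is the full GET response, including its _revision field, with only the allocation rule removed.

```python
import base64
import copy
import json
import urllib.request

RULE_TYPE = "AllocationBasedOnFailureDomain"


def strip_failure_domain_rule(cluster: dict) -> dict:
    """Return a copy of the Edge Cluster payload without the failure-domain rule."""
    updated = copy.deepcopy(cluster)
    rules = [
        rule for rule in updated.get("allocation_rules", [])
        if rule.get("action", {}).get("action_type") != RULE_TYPE
    ]
    if rules:
        updated["allocation_rules"] = rules  # keep any unrelated rules
    else:
        updated.pop("allocation_rules", None)  # drop the now-empty section
    return updated


def call_nsx(method, url, user, password, body=None, context=None):
    """Send one request with basic auth and return the decoded JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/json")
    # context is an optional ssl.SSLContext, e.g. one that trusts the manager's CA.
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


def remove_rule(manager, cluster_id, user, password, context=None):
    """GET the Edge Cluster, strip the rule, and PUT the result back."""
    url = f"https://{manager}/api/v1/edge-clusters/{cluster_id}"
    current = call_nsx("GET", url, user, password, context=context)
    # Everything else from the GET response is sent back unchanged.
    updated = strip_failure_domain_rule(current)
    return call_nsx("PUT", url, user, password, body=updated, context=context)
```

Keeping the payload edit in its own function lets you inspect or save the modified payload before sending it; saving the original GET response also gives you a reference copy for re-enabling the rule later.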

Step 2: Remove the old Edge Node from the Edge Cluster

  • This can be done via the NSX Manager UI.

Step 3: Add the new Edge Node to the Edge Cluster

  • Use the NSX Manager UI to add Edge-Node-K-03 to the cluster.

Step 4: Re-enable failure-domain-based allocation (optional)

  • Re-enable the allocation rule only if the new node is assigned to a failure domain such that the Edge Cluster spans at least two distinct failure domains.
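Re-enabling is the reverse of Step 1: GET the Edge Cluster, put the allocation rule back into the payload, and PUT the result. The sketch below covers only the payload edit. The function name add_failure_domain_rule is hypothetical, and the rule body is copied from the section removed in Step 1.

```python
import copy

RULE_TYPE = "AllocationBasedOnFailureDomain"

# Same rule body that Step 1 removed from the Edge Cluster payload.
FAILURE_DOMAIN_RULE = {"action": {"enabled": True, "action_type": RULE_TYPE}}


def add_failure_domain_rule(cluster: dict) -> dict:
    """Return a copy of the Edge Cluster payload with the rule present once."""
    updated = copy.deepcopy(cluster)
    rules = updated.setdefault("allocation_rules", [])
    already_present = any(
        rule.get("action", {}).get("action_type") == RULE_TYPE for rule in rules
    )
    if not already_present:
        rules.append(copy.deepcopy(FAILURE_DOMAIN_RULE))
    return updated
```

The returned payload would then be sent with a PUT to the same /api/v1/edge-clusters/<edgeClusterId> URL used in Step 1.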

Additional Information

Once Edge-Node-K-03 is added, failure-domain-based allocation can only be re-enabled if the Edge Cluster contains nodes across at least two different failure domains.

Otherwise, the feature remains disabled until failure domain diversity is restored.
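The two-domain condition can be checked before attempting to re-enable the rule. The sketch below is illustrative only: it takes a hypothetical mapping of Edge Node name to failure domain name that you build yourself (it does not query NSX), and the example assumes Edge-Node-K-03 was placed in Failure Domain 2, which the scenario does not state.

```python
def distinct_failure_domains(node_domains: dict) -> set:
    """Collect the failure domains in use, ignoring nodes with none assigned."""
    return {domain for domain in node_domains.values() if domain}


def can_enable_failure_domain_allocation(node_domains: dict) -> bool:
    """True when the cluster members span at least two failure domains."""
    return len(distinct_failure_domains(node_domains)) >= 2


# Example cluster after the replacement (K-03 placement is an assumption).
after_replacement = {
    "Edge-Node-K-01": "Failure Domain 1",
    "Edge-Node-N-01": "Failure Domain 1",
    "Edge-Node-K-03": "Failure Domain 2",
    "Edge-Node-N-02": "Failure Domain 2",
}
print(can_enable_failure_domain_allocation(after_replacement))
```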