Edge Node Removal Blocked by Failure Domain Dependency (Error 15021)

Article ID: 412645

Updated On:

Products

VMware NSX

Issue/Introduction

When attempting to delete an Edge Node in NSX, the first step is to verify whether the Edge Node is part of an Edge Cluster. If it is, and the cluster has Logical Routers (LRs) running on it, dependencies must be cleared before the node can be removed at the cluster level.

Edge Node removal may fail with the following error if failure-domain-based allocation is enabled on the Edge Cluster and the node is the last remaining member of its failure domain:

Found errors in the request. Please refer to the related errors for details. (Error code: 15000)
[Fabric] Edge cluster should have transport nodes from at least two failure domains, if failure-domain-based allocation enabled. (Error code: 15021)

Example Customer Scenario

The customer requirement was to remove Edge-Node-K-02 and Edge-Node-N-02 from NSX.

The Edge Cluster ("Edge-Cluster") consisted of four Edge Nodes:

  • Edge-Node-K-01
  • Edge-Node-N-01
  • Edge-Node-K-02
  • Edge-Node-N-02

This Edge Cluster was assigned to a Tier-0 Gateway in Active/Standby mode:

  • Active: Edge-Node-K-01
  • Standby: Edge-Node-N-01

Two failure domains were configured for this Edge Cluster as per Broadcom Documentation:

  • Failure Domain 1: Edge-Node-K-01, Edge-Node-N-01
  • Failure Domain 2: Edge-Node-K-02, Edge-Node-N-02

The customer successfully removed Edge-Node-K-02. However, when attempting to remove Edge-Node-N-02, the following error appeared:

Found errors in the request. Please refer to the related errors for details. (Error code: 15000)
[Fabric] Edge cluster should have transport nodes from at least two failure domains, if failure-domain-based allocation enabled. (Error code: 15021)

This occurred because, with failure-domain-based allocation enabled, at least one Edge Node must remain in each failure domain.

Environment

VMware NSX

Cause

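The error is expected behavior. Failure Domain 2 contained only two nodes:

  • Edge-Node-K-02
  • Edge-Node-N-02

After Edge-Node-K-02 was removed, Edge-Node-N-02 was the last remaining node in Failure Domain 2.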

As per NSX requirements, when failure-domain-based allocation is enabled, at least one Edge Node must remain in each configured failure domain. Removing the last node from a failure domain is blocked by design.
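
The rule can be illustrated with a small Python sketch applied to the example scenario. This is a simplified model of the condition stated in error 15021, not the validation logic NSX actually runs; the function name `removal_blocked` and the failure domain labels `FD1` and `FD2` are hypothetical.

```python
def removal_blocked(node_domains, node_to_remove):
    """Return True if removing the node would leave the cluster with
    members from fewer than two failure domains (simplified model)."""
    remaining_domains = {
        domain
        for node, domain in node_domains.items()
        if node != node_to_remove
    }
    return len(remaining_domains) < 2


# Edge Cluster membership from the example scenario (labels are illustrative).
cluster = {
    "Edge-Node-K-01": "FD1",
    "Edge-Node-N-01": "FD1",
    "Edge-Node-K-02": "FD2",
    "Edge-Node-N-02": "FD2",
}

# Removing Edge-Node-K-02 still leaves Edge-Node-N-02 in the second domain.
first_removal_blocked = removal_blocked(cluster, "Edge-Node-K-02")

# After that removal, Edge-Node-N-02 is the only node left in its domain.
after_first = {n: d for n, d in cluster.items() if n != "Edge-Node-K-02"}
second_removal_blocked = removal_blocked(after_first, "Edge-Node-N-02")
```

In this model the first removal is allowed and the second is blocked, which matches the behavior the customer observed.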

Resolution

To remove the Edge Node, the failure-domain-based allocation rule must first be removed from the Edge Cluster.

Step 1: Remove allocation rule from the Edge Cluster

  1. Perform a GET on the Edge Cluster:

    GET
    https://<NSXMGRIP>/api/v1/edge-clusters/<edgeClusterId>

  2. In the response payload, remove the allocation rule section:

    "allocation_rules" : [ {
        "action" : {
          "enabled" : true,
          "action_type" : "AllocationBasedOnFailureDomain"
        }
      } ],

  3. Perform a PUT with the updated payload (without the allocation rule):

    PUT
    https://<NSXMGRIP>/api/v1/edge-clusters/<edgeClusterId>
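
The GET, edit, and PUT sequence above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library, under several assumptions: the function names are hypothetical, HTTP basic authentication is used, the NSX Manager certificate is trusted by the client, and every other field returned by the GET (including `_revision`) is sent back unchanged in the PUT. Test against a non-production environment first.

```python
import base64
import json
import urllib.request


def strip_failure_domain_rule(cluster):
    """Return a copy of the Edge Cluster payload without the
    AllocationBasedOnFailureDomain rule; other fields are untouched."""
    updated = dict(cluster)
    kept = [
        rule
        for rule in cluster.get("allocation_rules", [])
        if rule.get("action", {}).get("action_type")
        != "AllocationBasedOnFailureDomain"
    ]
    if kept:
        updated["allocation_rules"] = kept
    else:
        # No rules left: drop the whole section, as described in step 2.
        updated.pop("allocation_rules", None)
    return updated


def remove_allocation_rule(manager, cluster_id, user, password):
    """GET the Edge Cluster, remove the rule, and PUT the payload back.
    Assumes basic authentication and a trusted manager certificate."""
    url = f"https://{manager}/api/v1/edge-clusters/{cluster_id}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }

    # Step 1: read the current Edge Cluster configuration.
    get_req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(get_req) as resp:
        cluster = json.load(resp)

    # Steps 2 and 3: remove the rule and write the payload back.
    body = json.dumps(strip_failure_domain_rule(cluster)).encode()
    put_req = urllib.request.Request(
        url, data=body, headers=headers, method="PUT"
    )
    with urllib.request.urlopen(put_req) as resp:
        return json.load(resp)
```

Separating `strip_failure_domain_rule` from the network calls lets the payload change be reviewed or tested offline before anything is sent to NSX Manager.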

Step 2: Remove the Edge Node from the Edge Cluster

  • This can be done via the NSX Manager UI.

Step 3: Delete the Edge Transport Node

  • This can also be done via the NSX Manager UI.

Step 4 (Optional): Delete the user-created failure domain

  • If the failure domain is no longer used by any transport nodes, it can be deleted:

    DELETE
    https://<NSXMGRIP>/api/v1/failure-domains/<failureDomainId>

Additional Information

Once Edge-Node-N-02 is removed, it will not be possible to re-enable the failure-domain-based allocation on the Edge Cluster, because the remaining Edge Nodes will belong to the same failure domain. Enabling this feature requires nodes across at least two different failure domains.