Load balance/rebalance existing T1 gateways in the edge cluster in an NSX environment

Products

VMware NSX

Issue/Introduction

In an NSX environment, an edge cluster contains N edge nodes, with T1 gateways hosted on those nodes.
If additional edge nodes are later added to the cluster, any new T1 gateways will be placed on the new nodes. However, existing T1 gateways are not automatically rebalanced onto the newly added nodes.
To manually trigger a failover of a specific T1 gateway to another edge node, use the NSX API.

Environment

VMware NSX

Resolution

Step 1: Run the state API below to retrieve the edge nodes currently allocated to a T1 gateway. The response lists the two allocated nodes with their current HA status. Check the edge_path and high_availability_status attributes of each node.

GET https://<nsx-mgr>/policy/api/v1/infra/tier-1s/<id>/state

output ... ...

"tier1_status": {
"logical_router_id": "########-caa9-404e-aa75-2864f4######",
"last_update_timestamp": 1743995398096,
"per_node_status": [
{
"transport_node_id": "########-9901-4d62-a30f-514601######",
"edge_path": "/infra/sites/default/enforcement-points/default/edge-clusters/edge-cluster-uuid/edge-nodes/edge-uuid",
"service_router_id": "########-6ff0-4d2a-a91f-1f0dd8######",
"high_availability_status": "STANDBY",
"is_default_sub_cluster": false
},
{
"transport_node_id": "########-f154-42ab-8878-ea71f1######",
"edge_path": "/infra/sites/default/enforcement-points/default/edge-clusters/edge-cluster-uuid/edge-nodes/edge-uuid",
"service_router_id": "########-6ff0-4d2a-a91f-1f0dd8######",
"high_availability_status": "ACTIVE",
"is_default_sub_cluster": false
}
]
},

... ...
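As a minimal sketch of how the two edge paths could be pulled out of the state response in Python: the function names and the sample edge node IDs (edge-a, edge-b) are hypothetical, and the code assumes "tier1_status" is a top-level key of the response, as the excerpt above suggests.

```python
from typing import Dict, List

def edge_paths_by_ha_status(state: dict) -> Dict[str, str]:
    """Map each high_availability_status value to the edge_path of that node."""
    nodes = state.get("tier1_status", {}).get("per_node_status", [])
    return {node["high_availability_status"]: node["edge_path"] for node in nodes}

def preferred_edge_paths(state: dict) -> List[str]:
    """Return [ACTIVE path, STANDBY path], the order Step 2 expects."""
    by_status = edge_paths_by_ha_status(state)
    missing = [s for s in ("ACTIVE", "STANDBY") if s not in by_status]
    if missing:
        raise ValueError(f"state response has no node with status: {missing}")
    return [by_status["ACTIVE"], by_status["STANDBY"]]

# Hypothetical sample shaped like the output above; edge-a and edge-b are placeholders.
PREFIX = ("/infra/sites/default/enforcement-points/default/"
          "edge-clusters/edge-cluster-uuid/edge-nodes/")
sample_state = {
    "tier1_status": {
        "per_node_status": [
            {"edge_path": PREFIX + "edge-a", "high_availability_status": "STANDBY"},
            {"edge_path": PREFIX + "edge-b", "high_availability_status": "ACTIVE"},
        ]
    }
}

if __name__ == "__main__":
    for path in preferred_edge_paths(sample_state):
        print(path)
```

The ACTIVE path is returned first because Step 2 passes the active node first and the standby node second.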

Step 2: Get the locale-services of the T1 gateway, then manually pass the two preferred edge paths

A. GET https://<nsx-mgr>/policy/api/v1/infra/tier-1s/<id>/locale-services/

{
"results": [
{
"edge_cluster_path": "/infra/sites/default/enforcement-points/default/edge-clusters/edge-cluster-uuid",
"resource_type": "LocaleServices",
"id": "############84f4d-a48b-4d3b-9349-c41795c1fd########",
"display_name": "############a48b-4d3b-9349-c41795c1fd########",
"path": "/infra/tier-1s/T1-uuid/locale-services/#######4f4d-a48b-4d3b-9349-c41795c#####", <<<<<<<<<<<< Note this path; it contains the locale-services ID of the T1 gateway, which is needed in step B
"relative_path": "############a48b-4d3b-9349-c41795c1fd########",
"parent_path": "/infra/tier-1s/############a48b-4d3b-9349-c41795c1########",

..........

}

Now execute a PUT on the T1 gateway locale-services and manually pass the two preferred edge paths obtained from the Step 1 API. Pass the path of the ACTIVE node first and the path of the STANDBY node second. This operation does not cause any disruption, because the same two edge nodes are assigned in the same ACTIVE and STANDBY order. After this call, the T1 gateway is updated to manual allocation. (Note: If the request fails with a principal identity (user) error, pass the header X-Allow-Overwrite: true when executing the PUT APIs.)

B. PUT https://<nsx-mgr>/policy/api/v1/infra/tier-1s/T1-uuid/locale-services/#######4f4d-a48b-4d3b-9349-c41795c#####

body
- { "preferred_edge_paths": ["/infra/sites/default/enforcement-points/default/edge-clusters/edge-cluster-uuid/edge-nodes/edge-uuid","/infra/sites/default/enforcement-points/default/edge-clusters/edge-cluster-uuid/edge-nodes/edge-uuid"] }
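The PUT call described above could be assembled with the Python standard library roughly as follows. This is a sketch only: build_put_request and its parameters are hypothetical names, the host nsx.example.com is a placeholder, and authentication and TLS settings are deliberately left out because they depend on the environment.

```python
import json
import urllib.request

def build_put_request(nsx_mgr: str, locale_services_path: str, body: dict,
                      allow_overwrite: bool = False) -> urllib.request.Request:
    """Build (but do not send) a PUT request for a T1 locale-services object.

    locale_services_path is the "path" value noted in step A, for example
    /infra/tier-1s/T1-uuid/locale-services/<locale-services-id>.
    """
    url = f"https://{nsx_mgr}/policy/api/v1{locale_services_path}"
    headers = {"Content-Type": "application/json"}
    if allow_overwrite:
        # Only needed when the PUT is rejected with a principal identity error.
        headers["X-Allow-Overwrite"] = "true"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="PUT"
    )

if __name__ == "__main__":
    request = build_put_request(
        "nsx.example.com",  # placeholder manager address
        "/infra/tier-1s/T1-uuid/locale-services/ls-id",  # placeholder path
        {"preferred_edge_paths": []},
        allow_overwrite=True,
    )
    print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) additionally requires credentials and certificate handling appropriate to the NSX Manager in use.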

The sample below shows "preferred_edge_paths" added to the output obtained from the GET API above.

{
"results": [
{
"edge_cluster_path": "/infra/sites/default/enforcement-points/default/edge-clusters/########-1fdb-4fe7-8294-18dd########",
"preferred_edge_paths" : ["/infra/sites/default/enforcement-points/default/edge-clusters/########-1fdb-4fe7-8294-18dd17######/edge-nodes/3","/infra/sites/default/enforcement-points/default/edge-clusters/########-1fdb-4fe7-8294-18dd17######/edge-nodes/2"], <<<<< Add this line and adjust the edge-clusters ID and edge-nodes IDs accordingly (the edge-nodes IDs can be taken from the Step 1 API)
"resource_type": "LocaleServices",
"id": "################-4d3b-9349-c41795c1fd########",
"display_name": "#################4d3b-9349-c41795c1fd########",
"path": "/infra/tier-1s/############a48b-4d3b-9349-c41795c1########/locale-services/#######4f4d-a48b-4d3b-9349-c41795c#####",
"relative_path": "############a48b-4d3b-9349-c41795c1fd########",
"parent_path": "/infra/tier-1s/############a48b-4d3b-9349-c41795##########",

..........
}
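Adding the attribute to the GET output can be sketched in Python as below. The helper name with_preferred_edge_paths is hypothetical, and the sketch assumes that the T1 gateway has a single locale-services entry in "results" and that the PUT body is that single object (as in the Step 3 sample) rather than the whole "results" wrapper.

```python
import copy

def with_preferred_edge_paths(list_response: dict, active_path: str,
                              standby_path: str) -> dict:
    """Return a copy of the first locale-services entry with preferred_edge_paths set.

    The ACTIVE node path goes first and the STANDBY node path second, so the
    current placement is kept and only the allocation mode changes.
    """
    locale_services = copy.deepcopy(list_response["results"][0])
    locale_services["preferred_edge_paths"] = [active_path, standby_path]
    return locale_services

if __name__ == "__main__":
    # Placeholder values shaped like the GET output above.
    cluster = "/infra/sites/default/enforcement-points/default/edge-clusters/edge-cluster-uuid"
    get_output = {
        "results": [
            {"edge_cluster_path": cluster, "resource_type": "LocaleServices"}
        ]
    }
    body = with_preferred_edge_paths(
        get_output, cluster + "/edge-nodes/3", cluster + "/edge-nodes/2"
    )
    print(body["preferred_edge_paths"])
```

Working on a deep copy keeps the original GET output unchanged, which is convenient for comparing the payload before and after the change.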

C. Verify the realization status and check whether it is SUCCESS. Proceed to the next step only once realization has finished.

GET https://<nsx-mgr>/policy/api/v1/infra/realized-state/status?intent_path=/infra/tier-1s/<id>
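Waiting for realization could be scripted along these lines. This is a hedged sketch: the function names are hypothetical, the caller supplies its own fetch function that performs the GET above and returns the parsed JSON, and the nested "consolidated_status" field is an assumption about the response shape that should be verified against the actual output in your environment.

```python
import time
from typing import Callable

def realization_succeeded(status_response: dict) -> bool:
    """True when the response reports SUCCESS.

    Assumption: the status sits at consolidated_status.consolidated_status.
    """
    consolidated = status_response.get("consolidated_status", {})
    return consolidated.get("consolidated_status") == "SUCCESS"

def wait_for_success(fetch: Callable[[], dict], attempts: int = 10,
                     delay_seconds: float = 5.0) -> bool:
    """Call fetch() up to `attempts` times until realization reports SUCCESS."""
    for attempt in range(attempts):
        if realization_succeeded(fetch()):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # pause before asking again
    return False

if __name__ == "__main__":
    # Stand-in for real GET calls: first IN_PROGRESS, then SUCCESS.
    fake_responses = iter([
        {"consolidated_status": {"consolidated_status": "IN_PROGRESS"}},
        {"consolidated_status": {"consolidated_status": "SUCCESS"}},
    ])
    print(wait_for_success(lambda: next(fake_responses), delay_seconds=0))
```

Passing the fetch function in keeps the polling logic independent of how the request is authenticated and sent.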

Step 3: Execute the same APIs again, this time clearing preferred_edge_paths from the payload. This triggers the allocation algorithm again, which selects the least-allocated nodes; in this case, those will be the new edge nodes that were added to the cluster.

Note: This is a disruptive operation and it will impact the datapath.

GET https://<nsx-mgr>/policy/api/v1/infra/tier-1s/<id>/locale-services/<id>
PUT https://<nsx-mgr>/policy/api/v1/infra/tier-1s/<id>/locale-services/<id>

{
"preferred_edge_paths": [] >>>>>>>>> This time, remove the preferred_edge_paths attribute from the payload and keep the rest of the payload as is
}

Here is a sample payload with preferred_edge_paths removed:

{
"edge_cluster_path": "/infra/sites/default/enforcement-points/default/edge-clusters/########-1fdb-4fe7-8294-18dd########",
"resource_type": "LocaleServices",
"id": "############019b-4d38-8160-ac865e208a########",
"display_name": "############019b-4d38-8160-ac865e208a########",
"path": "/infra/tier-1s/############019b-4d38-8160-ac865e20########/locale-services/############019b-4d38-8160-ac865e208a########",
"relative_path": "############019b-4d38-8160-ac865e208a########",
"parent_path": "/infra/tier-1s/############019b-4d38-8160-ac865e20########",
"remote_path": "",
"unique_id": "#########531f-4388-a81d-d71a########",

}
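Preparing the Step 3 payload can be sketched in Python as follows. The helper name without_preferred_edge_paths is hypothetical; it takes the locale-services object returned by the GET in this step and returns the body for the PUT.

```python
def without_preferred_edge_paths(locale_services: dict) -> dict:
    """Return a copy of the payload with preferred_edge_paths removed.

    Every other attribute is kept as is, which is what Step 3 asks for.
    """
    payload = dict(locale_services)  # shallow copy; the input is not modified
    payload.pop("preferred_edge_paths", None)  # no error if already absent
    return payload

if __name__ == "__main__":
    # Placeholder object shaped like the locale-services GET output.
    current = {
        "edge_cluster_path": "/infra/sites/default/enforcement-points/default/edge-clusters/edge-cluster-uuid",
        "preferred_edge_paths": ["placeholder-active-path", "placeholder-standby-path"],
        "resource_type": "LocaleServices",
    }
    print(sorted(without_preferred_edge_paths(current)))
```

Because sending this payload triggers reallocation, it should only be submitted in a window where datapath impact is acceptable.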