Duplicate TEP Label Used by Transport Nodes Including NSX-T Edge Results in N/S Traffic Impact

Article ID: 318747


Products

VMware NSX

Issue/Introduction

Symptoms:

  • Configuration changes made to the Edge from the NSX UI report success, but they are not reflected when checked from the Edge CLI.
  • The nsxcli command get logical-routers on the Edge node fails to show any SR or DR details.
  • North-South (N-S) traffic that depends on the Edge/Edge cluster is completely down.
  • The Edge node log /var/log/syslog shows that TEP labels are not set:

<30>1 2021-03-16T19:55:44.952448+00:00 HOST03 nsxa-systemd-helper 2015 - - 2021-03-16T19:55:44Z nsx-edge-nsxa 19 host_config [ERROR] VTEP X.X.33.139 label not set errorCode="EDG0100130"

<30>1 2021-03-16T19:55:44.952551+00:00 HOST03 nsxa-systemd-helper 2015 - - 2021-03-16T19:55:44Z nsx-edge-nsxa 19 dp [ERROR] Send delta DP config failed errorCode="EDG0100393"

<27>1 2021-03-16T19:55:44.952634+00:00 EDGE03 NSX 19 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.host_config" level="ERROR" errorCode="EDG0100130"] VTEP X.X.33.139 label not set

<27>1 2021-03-16T19:55:44.952719+00:00 EDGE03 NSX 19 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.dp" level="ERROR" errorCode="EDG0100393"] Send delta DP config failed

  • The Edge node log /var/log/syslog shows the Edge reclaiming TEP labels:

syslog:<29>1 2021-03-16T19:55:44.239606+00:00 EDGE03 NSX 19 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.dp_database" level="INFO"] reclaim vtep label 106581 from X.X.41.165

syslog:<29>1 2021-03-16T19:58:29.231772+00:00 EDGE03 NSX 19 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.dp_database" level="INFO"] reclaim vtep label 106580 from X.X.41.163

syslog:<29>1 2021-03-16T20:05:06.736494+00:00 EDGE03 NSX 19 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.dp_database" level="INFO"] reclaim vtep label 106581 from X.X.33.151

syslog:<29>1 2021-03-16T20:05:59.535306+00:00 EDGE03 NSX 19 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.dp_database" level="INFO"] reclaim vtep label 106581 from X.X.41.165

syslog:<29>1 2021-03-16T20:13:36.644817+00:00 EDGE03 NSX 19 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.dp_database" level="INFO"] reclaim vtep label 106580 from X.X.33.139

syslog:<29>1 2021-03-16T20:13:36.648830+00:00 EDGE03 NSX 19 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.dp_database" level="INFO"] reclaim vtep label 106581 from X.X.33.151

  • API to get details of the labels assigned to all transport nodes in the NSX-T environment:

GET https://<NSX-VIP-IP>/api/v1/transport-nodes/state

  • In the API result, search for the labels that appear in the reclaim log output above and check whether any of them are assigned more than once. In this example, the labels in question are 106580 and 106581.

Example:

 {
            "transport_node_id": "3xxxxxx1-bxx6-4xx4-9xx4-9xxxxxxxxxxa",
            "host_switch_states": [
                {
                    "host_switch_id": "49 41 c6 4b 77 16 40 ea-96 ad ed a1 30 38 7b 09",
                    "host_switch_name": "OVERLAY-NVDS",
                    "endpoints": [
                        {
                            "device_name": "vmk10",
                            "ip": "X.X.41.163",
                            "default_gateway": "X.X.41.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106580 ----->>>>>>> Duplicate
                        },
                        {
                            "device_name": "vmk11",
                            "ip": "X.X.41.165",
                            "default_gateway": "X.X.41.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106581 ----->>>>>>> Duplicate
                        }
                    ],
                    "transport_zone_ids": [
                        "8xxxxxx5-3xxe-4xx1-9xx6-fxxxxxxxxxx0"
                    ]
                }
            ],

     {
            "transport_node_id": "2xxxxxxe-8xx9-4xx2-9xxd-3xxxxxxxxxxe",
            "host_switch_states": [
                {
                    "host_switch_id": "49 41 c6 4b 77 16 40 ea-96 ad ed a1 30 38 7b 09",
                    "host_switch_name": "OVERLAY-NVDS",
                    "endpoints": [
                        {
                            "device_name": "vmk10",
                            "ip": "X.X.33.138",
                            "default_gateway": "X.X.33.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106661
                        },
                        {
                            "device_name": "vmk11",
                            "ip": "X.X.33.139",
                            "default_gateway": "X.X.33.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106580 ----->>>>>>> Duplicate
                        }
                    ],
                    "transport_zone_ids": [
                        "8xxxxxx5-3xxe-4xx1-9xx6-fxxxxxxxxxx0"
                    ]
                }
            ],
            "maintenance_mode_state": "DISABLED",
            "node_deployment_state": {
                "state": "success",
                "details": []
            },
            "state": "success"

{
            "transport_node_id": "dxxxxxxc-cxx6-4xxc-9xxf-5xxxxxxxxxxb",
            "host_switch_states": [
                {
                    "host_switch_id": "49 41 c6 4b 77 16 40 ea-96 ad ed a1 30 38 7b 09",
                    "host_switch_name": "OVERLAY-NVDS",
                    "endpoints": [
                        {
                            "device_name": "vmk10",
                            "ip": "X.X.33.150",
                            "default_gateway": "X.X.33.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106652
                        },
                        {
                            "device_name": "vmk11",
                            "ip": "X.X.33.151",
                            "default_gateway": "X.X.33.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106581 ----->>>>>>> Duplicate
                        }
                    ],
                    "transport_zone_ids": [
                        "8xxxxxx5-3xxe-4xx1-9xx6-fxxxxxxxxxx0"
                    ]
                }
            ],
            "maintenance_mode_state": "DISABLED",
            "node_deployment_state": {
                "state": "success",
                "details": []
            },
            "state": "success"
        },

  • From the output above, we can determine that duplicate labels are assigned as follows:

106581 is assigned to TNs dxxxxxxc-cxx6-4xxc-9xxf-5xxxxxxxxxxb and 3xxxxxx1-bxx6-4xx4-9xx4-9xxxxxxxxxxa

106580 is assigned to TNs 2xxxxxxe-8xx9-4xx2-9xxd-3xxxxxxxxxxe and 3xxxxxx1-bxx6-4xx4-9xx4-9xxxxxxxxxxa
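
The manual search described above can also be scripted. The following Python sketch is not part of the original procedure. It assumes that the API response wraps the per-node entries in a top-level "results" list and that each endpoint carries the "label", "device_name", and "ip" fields shown in the example output. The function names find_duplicate_labels and load_state are hypothetical.

```python
import json
from collections import defaultdict


def find_duplicate_labels(state):
    """Return {label: [(transport_node_id, device_name, ip), ...]} for every
    TEP label that is assigned to more than one endpoint."""
    users = defaultdict(list)
    # Assumption: per-node states are listed under a top-level "results" key.
    for node in state.get("results", []):
        tn_id = node.get("transport_node_id")
        for switch in node.get("host_switch_states", []):
            for endpoint in switch.get("endpoints", []):
                label = endpoint.get("label")
                if label is not None:
                    users[label].append(
                        (tn_id, endpoint.get("device_name"), endpoint.get("ip"))
                    )
    # Keep only the labels that more than one endpoint is using.
    return {label: eps for label, eps in users.items() if len(eps) > 1}


def load_state(path):
    """Load a saved copy of the transport-nodes/state response from disk."""
    with open(path) as handle:
        return json.load(handle)
```

Calling find_duplicate_labels on a saved copy of the response returns an empty dictionary when every label is unique. Otherwise, each key is a duplicated label and each value lists the transport node IDs, TEP interfaces, and IP addresses that share it.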

Environment

VMware NSX-T Data Center

Cause

On very rare occasions, transport nodes are wrongly assigned L2-switching VTEP labels that are already in use. This can affect any NSX-T version from 2.5.0 onward, up to the fixed versions listed under Resolution.

Resolution

This behavior is resolved in VMware NSX-T 3.1.3 and 3.2.0.

Workaround:

1. Get details of the assigned labels using one of the methods below.

API to get details of the labels assigned to all TNs:
GET https://<NSX-VIP-IP>/api/v1/transport-nodes/state

The same details can also be retrieved and saved to a file with the following curl command:
curl -k -u 'admin' https://<nsx-manager-ip>/api/v1/transport-nodes/state > tnstates.json

2. In the API/curl output, search for the labels that appear in the reclaim log output and check whether any of them are assigned more than once.

From a Linux shell where you have the output file tnstates.json:


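# Any output indicates duplicate labels. In this case there are 2 instances each of labels 106580 and 106581.
# awk keeps only the lines whose count is greater than 1 (filtering with grep -v " 1" would also drop labels that begin with 1).
grep label tnstates.json | sort | uniq -c | awk '$1 > 1'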
     2 "label": 106580
     2 "label": 106581

Labels in question from above reclaim output: 106580, 106581

Example:

 {
            "transport_node_id": "3xxxxxx1-bxx6-4xx4-9xx4-9xxxxxxxxxxa",
            "host_switch_states": [
                {
                    "host_switch_id": "49 41 c6 4b 77 16 40 ea-96 ad ed a1 30 38 7b 09",
                    "host_switch_name": "OVERLAY-NVDS",
                    "endpoints": [
                        {
                            "device_name": "vmk10",
                            "ip": "10.xx.xx.163",
                            "default_gateway": "10.xx.xx.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106580
                        },
                        {
                            "device_name": "vmk11",
                            "ip": "10.xx.xx.165",
                            "default_gateway": "10.xx.xx.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106581
                        }
                    ],
                    "transport_zone_ids": [
                        "8xxxxxx5-3xxe-4xx1-9xxxxxxxxxx0"
                    ]
                }
            ],

     {
            "transport_node_id": "2xxxxxxe-8xx9-4xx2-9xxd-3xxxxxxxxxxe",
            "host_switch_states": [
                {
                    "host_switch_id": "49 41 c6 4b 77 16 40 ea-96 ad ed a1 30 38 7b 09",
                    "host_switch_name": "OVERLAY-NVDS",
                    "endpoints": [
                        {
                            "device_name": "vmk10",
                            "ip": "10.xx.xx.138",
                            "default_gateway": "10.xx.xx.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106661
                        },
                        {
                            "device_name": "vmk11",
                            "ip": "10.xx.xx.139",
                            "default_gateway": "10.xx.xx.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106580
                        }
                    ],
                    "transport_zone_ids": [
                        "8xxxxxx5-3xxe-4xx1-9xxxxxxxxxx0"
                    ]
                }
            ],
            "maintenance_mode_state": "DISABLED",
            "node_deployment_state": {
                "state": "success",
                "details": []
            },
            "state": "success"

{
            "transport_node_id": "dxxxxxxc-cxx6-4xxc-9xxf-5xxxxxxxxxxb",
            "host_switch_states": [
                {
                    "host_switch_id": "49 41 c6 4b 77 16 40 ea-96 ad ed a1 30 38 7b 09",
                    "host_switch_name": "OVERLAY-NVDS",
                    "endpoints": [
                        {
                            "device_name": "vmk10",
                            "ip": "10.xx.xx.150",
                            "default_gateway": "10.xx.xx.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106652
                        },
                        {
                            "device_name": "vmk11",
                            "ip": "10.xx.xx.151",
                            "default_gateway": "10.xx.xx.129",
                            "subnet_mask": "255.255.255.128",
                            "label": 106581
                        }
                    ],
                    "transport_zone_ids": [
                        "8xxxxxx5-3xxe-4xx1-9xx6-fxxxxxxxxxx0"
                    ]
                }
            ],
            "maintenance_mode_state": "DISABLED",
            "node_deployment_state": {
                "state": "success",
                "details": []
            },
            "state": "success"
        },

3. From the output above, we can determine that duplicate labels are assigned as shown below. Note the UUIDs of the affected TNs:

106581 is assigned to TNs dxxxxxxc-cxx6-4xxc-9xxf-5xxxxxxxxxxb and 3xxxxxx1-bxx6-4xx4-9xx4-9xxxxxxxxxxa
106580 is assigned to TNs 2xxxxxxe-8xx9-4xx2-9xxd-3xxxxxxxxxxe and 3xxxxxx1-bxx6-4xx4-9xx4-9xxxxxxxxxxa

4. Create a temporary VLAN transport zone with the same NVDS name as the overlay TZ assigned to the TN.
5. Evacuate all host TNs identified in Step 3 that use the labels in question.
6. Edit each TN to remove it from the overlay TZ and assign it to the VLAN TZ, then confirm that the TEPs (vmk10 and vmk11) have been removed.
7. Note: If more than one host has duplicate labels, you must remediate ALL of them.
8. Repeat the check to see whether any duplicates remain after remediation.
9. Get the JSON output of the API /api/v1/transport-nodes/state once again and check that it does not contain any of the earlier duplicate VTEP labels.
10. Edit the TN to remove the VLAN TZ and add the TN back to the original overlay TZ. Repeat for all previously affected TNs, one after the other.
11. Get the JSON output of the API /api/v1/transport-nodes/state once again and check that it does not contain any of the earlier duplicate VTEP labels.
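
The checks in steps 9 and 11 can be sketched in Python as shown below. This is an illustrative sketch rather than part of the documented workaround. It assumes the reclaim messages keep the "reclaim vtep label <label> from <ip>" wording seen in the Edge syslog excerpts, and that the API response lists per-node states under a top-level "results" key. The function names reclaimed_labels and labels_still_present are hypothetical.

```python
import re

# Matches the Edge syslog message "reclaim vtep label <label> from <ip>".
RECLAIM_RE = re.compile(r"reclaim vtep label (\d+) from (\S+)")


def reclaimed_labels(syslog_text):
    """Return {label: set of TEP IPs} collected from reclaim log lines."""
    found = {}
    for match in RECLAIM_RE.finditer(syslog_text):
        label = int(match.group(1))
        found.setdefault(label, set()).add(match.group(2))
    return found


def labels_still_present(state, earlier_labels):
    """Return the sorted subset of earlier_labels that still appears in the
    transport-nodes/state response."""
    present = set()
    # Assumption: per-node states are listed under a top-level "results" key.
    for node in state.get("results", []):
        for switch in node.get("host_switch_states", []):
            for endpoint in switch.get("endpoints", []):
                if endpoint.get("label") in earlier_labels:
                    present.add(endpoint["label"])
    return sorted(present)
```

Passing the keys of reclaimed_labels (built from the Edge syslog) together with a freshly retrieved state response to labels_still_present should return an empty list once remediation is complete. Any label it returns is one of the earlier duplicate labels that is still assigned and needs another look.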