NSX-T T1 instances unexpectedly fail over between Edge nodes


Article ID: 336799



Products

VMware NSX

Issue/Introduction

Symptoms:

  • The environment is running NSX-T Data Center 3.1.x
  • T1 logical routers unexpectedly fail over between the Edge nodes
  • The Edge syslog (/var/log/syslog) reports a dpdk_panic from datapathd:

2021-09-06T08:22:14.205Z edge01.example.com NSX 3723 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" tname="dp-ipc31" level="FATAL"] PANIC in dpdk_panic():

2021-09-06T08:22:14.205Z edge01.example.com NSX 3723 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" tname="dp-ipc31" level="FATAL"] assert failed

  • A stack trace is logged and a core dump file is generated:

2021-09-06T08:22:14.248Z edge01.example.com NSX 3723 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" tname="dp-ipc31" level="WARN"] 4: [/opt/vmware/nsx-edge/sbin/datapathd(+0x5223c5) [0x17fcc92573c5]]

2021-09-06T08:22:14.248Z edge01.example.com NSX 3723 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" tname="dp-ipc31" level="WARN"] 3: [/opt/vmware/nsx-edge/sbin/datapathd(+0x5acd90) [0x17fcc92e1d90]]

2021-09-06T08:22:14.248Z edge01.example.com NSX 3723 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" tname="dp-ipc31" level="WARN"] 2: [/usr/lib/librte_eal.so.7.1(__rte_panic+0xbd) [0x7a86731a58c9]]

2021-09-06T08:22:14.248Z edge01.example.com NSX 3723 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="intel-rte" tname="dp-ipc31" level="WARN"] 1: [/usr/lib/librte_eal.so.7.1(rte_dump_stack+0x2e) [0x7a86731b247e]]

2021-09-06T10:22:14.260140+02:00 edge01.example.com NSX 23744 LOAD-BALANCER [nsx@6876 comp="nsx-edge" subcomp="lb" s2comp="access" level="INFO"] [########-####-####-####-########11d3][(null)] unix: - - [06/Sep/2021:08:22:14 +0000] "GET /lb_table?source=status HTTP/1.1" 200 545 "-" "curl/7.58.0"

2021-09-06T08:22:14.288Z edge01.example.com NSX 12438 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.dp-ipc31.1630916534.3723.0.6.gz
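
To check whether an Edge node has hit this signature, the syslog can be searched for the two markers shown above: the datapathd FATAL entry containing "PANIC in dpdk_panic()" and the "Core file generated" entry. The following Python sketch is illustrative only. The function names and regular expressions are assumptions derived from the sample log lines in this article, not a VMware-provided tool, and other releases may format these lines differently.

```python
import re

# Patterns derived from the sample log lines in this article (assumption:
# the field order inside the structured-data block matches the samples).
PANIC_RE = re.compile(
    r'subcomp="datapathd".*level="FATAL"\].*PANIC in dpdk_panic\(\)'
)
CORE_RE = re.compile(r"Core file generated: (\S+)")
HEADER_RE = re.compile(r"^(\S+)\s+(\S+)")  # timestamp, then hostname


def scan_edge_syslog(lines):
    """Return (panics, core_files) found in an iterable of syslog lines.

    panics     -- list of (timestamp, hostname) for each dpdk_panic entry
    core_files -- list of core file paths reported by node-mgmt
    """
    panics = []
    core_files = []
    for line in lines:
        if PANIC_RE.search(line):
            header = HEADER_RE.match(line)
            if header:
                panics.append((header.group(1), header.group(2)))
        core = CORE_RE.search(line)
        if core:
            core_files.append(core.group(1))
    return panics, core_files


def scan_file(path="/var/log/syslog"):
    """Hypothetical convenience wrapper: scan a syslog file on disk."""
    with open(path, errors="replace") as handle:
        return scan_edge_syslog(handle)
```

Running scan_file() against a copy of /var/log/syslog taken from the Edge node returns the timestamps of any dpdk_panic entries and the paths of the core files reported alongside them. A panic entry followed shortly by a core file entry is consistent with the symptoms described here, but does not by itself confirm this specific issue.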

 

Environment

VMware NSX-T Data Center 3.x
VMware NSX-T Data Center

Cause

This issue occurs in extremely busy environments where T1 instances are created and deleted very frequently. It is triggered when both of the following conditions are met:
  1. A logical router is being deleted.
  2. An internal counter on the Edge node is set to a non-zero value.

Resolution

This issue is resolved in NSX-T Data Center 3.2.0.1, available at Broadcom Downloads.
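
When auditing several Edge nodes, it can help to compare each node's reported version against the fixed release. The sketch below is a minimal example, assuming the version is available as a dotted string whose first four fields are major.minor.maintenance.patch. The function names and the handling of trailing build fields are assumptions, not part of any NSX API.

```python
FIXED_IN = (3, 2, 0, 1)  # release named in this article as containing the fix


def parse_version(text):
    """Parse the first four dotted fields of a version string into a tuple.

    Missing fields are padded with zeros, so "3.1.3" becomes (3, 1, 3, 0).
    Any fields after the fourth (for example a build number) are ignored.
    """
    fields = [int(part) for part in text.strip().split(".")[:4]]
    fields += [0] * (4 - len(fields))
    return tuple(fields)


def has_fix(version_text):
    """Return True if the version is at or after the fixed release."""
    return parse_version(version_text) >= FIXED_IN
```

For example, has_fix("3.1.3") evaluates to False and has_fix("3.2.0.1") evaluates to True. The check only answers whether a version is at or after 3.2.0.1. It does not determine whether an older release is actually affected.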

Workaround:
There is no workaround for this issue.

Additional Information

Impact/Risks:
Unexpected failover of the datapath between the Edge nodes.