NSX Edges and Tier 0 gateway display as down on NSX version 3.2.1.2

Article ID: 312628

Products

VMware NSX

Issue/Introduction

  • NSX was upgraded from 3.1.3.7 to 3.2.1.2, or the environment is already running 3.2.1.2.
  • The PNIC/Bond status on the Tier-0 (T0) gateway is shown as down.
  • The overall status of the Edge node is down.
  • North-South traffic is disrupted.
  • Entries similar to the following appear in the Edge node log file /var/log/syslog:
2022-10-21T11:46:04.913Z hostname.example.com NSX 23204 ROUTING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="lrouter" tname="dp-ipc19" level="INFO"] Update lpm tables: DR (########-####-####-####-########4210), v4: 0x7baxxxxx380, v6: (nil)

2022-10-21T11:46:04.913Z hostname.example.com NSX 23204 ROUTING [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="lrouter" tname="dp-ipc19" level="INFO"] Update lpm tables: DR (########-####-####-####-########4210), v4: 0x7baxxxxx380, v6: 0x7baxxxxx080

2022-10-21T11:46:04.662Z hostname.example.com kernel - - - [10425.332018] grsec: Segmentation fault occurred at (nil) in /opt/vmware/nsx-edge/sbin/datapathd[dp-ipc19:23607] uid/euid:0/0 gid/egid:124/124, parent /usr/bin/containerd-shim-runc-v2[containerd-shim:23181] uid/euid:0/0 gid/egid:0/0
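
To confirm the symptom, the syslog can be scanned for the kernel line that reports the datapathd crash. The following Python sketch is a minimal example. It assumes the line format shown in the excerpt above and can run against a copy of /var/log/syslog exported from the Edge node. `find_datapathd_segfaults` and the script name in the usage comment are hypothetical and are not NSX tools.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Matches the kernel (grsec) line logged when datapathd crashes, e.g.
# "... grsec: Segmentation fault occurred at (nil) in
#  /opt/vmware/nsx-edge/sbin/datapathd[dp-ipc19:23607] ..."
SEGFAULT_RE = re.compile(
    r"^(?P<ts>\S+)\s.*Segmentation fault occurred at \S+ in "
    r"/opt/vmware/nsx-edge/sbin/datapathd\[(?P<thread>[^:\]]+):(?P<pid>\d+)\]"
)


def find_datapathd_segfaults(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (timestamp, thread name, pid) for every datapathd segfault line."""
    hits: List[Tuple[str, str, int]] = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            hits.append((match.group("ts"), match.group("thread"), int(match.group("pid"))))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python3 find_segfaults.py /path/to/syslog
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for ts, thread, pid in find_datapathd_segfaults(log):
                print(f"{ts} datapathd crashed (thread={thread}, pid={pid})")
```

Each match corresponds to one datapathd crash. If the list is empty, the log contains no crash in this format.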



Environment

VMware NSX-T Data Center

Cause

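Invalid prefixes are advertised from the Tier-1 (T1) gateway to the T0 gateway as an aggregate CIDR through a route advertisement rule. The user entered the invalid network prefix, but the underlying problem is a missing validation check in Policy/Provider. The prefix is accepted instead of being rejected and is passed down to the Edge, where it causes datapathd to crash with a segmentation fault.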
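
The article does not say what made the prefix invalid. The following Python sketch illustrates the kind of check that was missing. It assumes that "invalid" means a CIDR that is malformed, has an out-of-range prefix length, or has host bits set (for example 10.10.1.1/16 instead of 10.10.0.0/16). `validate_aggregate_prefix` is a hypothetical helper built on Python's standard `ipaddress` module and is not NSX code.

```python
import ipaddress
from typing import Optional


def validate_aggregate_prefix(prefix: str) -> Optional[str]:
    """Hypothetical check, not NSX code: return None if `prefix` is a usable
    aggregate CIDR, otherwise a short reason why it should be rejected."""
    if "/" not in prefix:
        # Assumption: an aggregate must state its prefix length explicitly.
        return "missing prefix length"
    try:
        # strict=True rejects CIDRs with host bits set (10.10.1.1/16);
        # malformed addresses and out-of-range lengths (/33 for IPv4)
        # also raise ValueError.
        ipaddress.ip_network(prefix.strip(), strict=True)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    for candidate in ["10.10.0.0/16", "10.10.1.1/16", "10.10.0.0/33", "2001:db8::/32"]:
        print(candidate, "->", validate_aggregate_prefix(candidate) or "ok")
```

If a check like this ran when the rule was saved, the bad prefix would be rejected before it could be advertised to the T0 gateway.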
 

Resolution

This issue is fixed in NSX version 3.2.3.

Workaround:

  1. Enable Debug Level Logging:

    • On the affected Edge node, enable debug-level logging for the dataplane.
    • This captures detailed logs around the segmentation fault, which are needed to identify the specific prefix that causes the crash.
  2. Identify the Faulting Prefix:

    • After the crash recurs, review the debug logs to pinpoint the exact prefix that causes the segmentation fault in the dataplane.
  3. Search for the Prefix with Elastic Search:

    • Once the prefix is identified, search for it with Elastic Search in the NSX UI to find the T1 router that is advertising it.
  4. Stop the Advertisement or Detach the T1:

    • If a T1 router is advertising the prefix, stop advertising that prefix on the T1.
    • Alternatively, detach the T1 router from the T0 router so that the prefix is no longer advertised.
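
For steps 2 and 3, the goal is to find the prefix that was logged just before the crash. The following Python sketch makes two assumptions. First, the dataplane debug output contains prefixes in plain IPv4 CIDR notation (a.b.c.d/len). Second, the kernel segmentation fault line appears in the same log stream, as in the syslog excerpt above. The actual debug format is not documented in this article, so treat the regular expression as a placeholder. `prefixes_before_crash` is a hypothetical helper. It returns the most recent distinct prefixes seen before the first datapathd crash, newest first, as candidates to search for in the NSX UI.

```python
import re
import sys
from typing import Iterable, List

# Assumption: prefixes appear in the debug output as plain IPv4 CIDR text
# (a.b.c.d/len). Adjust this pattern to the real datapathd debug format.
CIDR_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}/\d{1,3}\b")


def prefixes_before_crash(lines: Iterable[str], window: int = 5) -> List[str]:
    """Return up to `window` distinct CIDR strings logged most recently before
    the first datapathd segmentation fault line, newest first.

    Returns an empty list if no datapathd segfault line is found."""
    seen: List[str] = []
    for line in lines:
        if "Segmentation fault" in line and "datapathd" in line:
            break
        seen.extend(CIDR_RE.findall(line))
    else:
        return []  # reached the end of the log without finding a crash
    recent: List[str] = []
    for prefix in reversed(seen):
        if prefix not in recent:
            recent.append(prefix)
            if len(recent) == window:
                break
    return recent


if __name__ == "__main__":
    # Hypothetical usage: python3 prefixes_before_crash.py /path/to/debug.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            print(path, prefixes_before_crash(log))
```

The sketch relies on file order rather than timestamps. In the excerpt above, the kernel line carries an earlier timestamp than the datapathd entries written before it.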