After upgrading from NSX 3.0.x to 3.2.2, restarting an Edge node causes the data plane service to crash and generate core dumps

Article ID: 323405

Updated On:

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

  • Edge node uplinks are configured with a QoS Profile in Untrusted Mode, i.e. Priority set to 0 and CoS set to 0.
  • NSX has been upgraded from 3.0.x to 3.2.2.
  • After an Edge node whose uplinks are configured with a QoS Profile is restarted, the data plane service crashes repeatedly and generates core files: /var/log/core/core.statsXX.XXXXXXXXXX.XXXX.X.XX.gz
  • The data plane service remains down.
  • The vmkernel logs of the host where the affected Edge node resides show that "Port_Enable()" fails for the port associated with the Edge node. The errors are similar to the following:

    vmkernel.1.gz:2023-11-29T06:13:40.715Z cpu32:390162820)WARNING: NetDVS: 2413: failed to init client for data com.vmware.net.vxlan.traffic.marking on port e3063f1b-####-####-####-3ca7139f13df
    vmkernel.1.gz:2023-11-29T06:13:40.718Z cpu32:390162820)WARNING: NetPort: 1371: failed to enable port 0x4000051: Bad parameter

  • The output of the net-dvs command contains an entry similar to the following:

    com.vmware.net.vxlan.traffic.marking = 0x 0. ff. 0. 0 <==== Value is set to FF (i.e. 255)

    If dscpTag > 63 in untrusted mode (0), or cosTag is 0xff or > 7, the set call for this property fails and VMK_BAD_PARAM is returned.
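The symptoms above can be checked with a small script. The following is a minimal sketch, not a supported tool: the function names are hypothetical, and the regular expressions are derived only from the two sample vmkernel warnings and the sample net-dvs line shown in this article, so other log or output formats may not match.

```python
import re

# Patterns built from the two sample warnings in this article (assumption:
# other builds may word these messages differently).
MARKING_RE = re.compile(
    r"failed to init client for data "
    r"com\.vmware\.net\.vxlan\.traffic\.marking on port (\S+)"
)
ENABLE_RE = re.compile(r"failed to enable port (0x[0-9a-fA-F]+)")


def find_port_enable_failures(lines):
    """Return (kind, port) tuples for lines that match either warning."""
    hits = []
    for line in lines:
        match = MARKING_RE.search(line)
        if match:
            hits.append(("marking_init_failed", match.group(1)))
            continue
        match = ENABLE_RE.search(line)
        if match:
            hits.append(("port_enable_failed", match.group(1)))
    return hits


def parse_marking_bytes(line):
    """Extract the hex bytes from a net-dvs traffic.marking line.

    Which byte carries the DSCP or CoS value is not stated in this
    article, so the bytes are returned without interpretation.
    """
    value = line.split("=", 1)[1]      # text after the property name
    value = value.split("<", 1)[0]     # drop any trailing annotation
    value = value.strip()
    if value.lower().startswith("0x"):
        value = value[2:]
    return [int(tok.strip(), 16) for tok in value.split(".") if tok.strip()]


def has_unset_marker(marking_bytes):
    """True when any byte is 0xff (255), the value seen in this issue."""
    return any(b == 0xFF for b in marking_bytes)
```

For example, passing the lines of a decompressed vmkernel log to `find_port_enable_failures` lists the affected ports, and `has_unset_marker(parse_marking_bytes(line))` flags a net-dvs entry that contains the 0xff value.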

Environment

VMware NSX
VMware NSX-T Data Center 3.x

Cause

When the cfgAgent finds that dscp_value is unset, it defaults the value to 255. This value is outside the valid range, so the property cannot be set on the port and the datapath crashes.
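To illustrate why the default of 255 is rejected, the validation rule quoted in the Issue section can be written out as a small predicate. This is a sketch of the rule as described in this article, not the actual vmkernel code; the function and constant names are hypothetical.

```python
# Value the cfgAgent falls back to when dscp_value is unset (per this article).
UNSET_DSCP_DEFAULT = 255


def marking_is_rejected(dscp_tag, cos_tag, untrusted=True):
    """Return True when the set call would fail with VMK_BAD_PARAM.

    Mirrors the rule quoted in this article: dscpTag > 63 in untrusted
    mode, or cosTag equal to 0xff or greater than 7.
    """
    bad_dscp = untrusted and dscp_tag > 63
    bad_cos = cos_tag == 0xFF or cos_tag > 7
    return bad_dscp or bad_cos


if __name__ == "__main__":
    # The profile in this issue is Untrusted with Priority 0 and CoS 0,
    # but the DSCP value reaches the host as the unset default.
    print(marking_is_rejected(UNSET_DSCP_DEFAULT, 0))  # True
    print(marking_is_rejected(0, 0))                   # False
```

With the intended values (0 and 0) the check passes; with the defaulted 255 it fails, which matches the "Bad parameter" error in the vmkernel log.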

Resolution

This issue is resolved in VMware NSX 3.2.4
This issue is resolved in VMware NSX 4.2.0

Workaround:
The workaround depends on whether you have already upgraded:

  1. If you have already upgraded to 3.2.2 and observed the issue:
    1. You can update the QoS Profile associated with the Segment: change the priority to 1, then revert it to 0.
    2. You can create a new QoS Profile with priority set to 0 and associate it with the Segments that are used as uplinks to Edges.
  2. If you have not yet upgraded to 3.2.2, you can remove the QoS Profile from the Segments that are used as uplinks to Edges and then upgrade.

Note: You can also upgrade with the existing configuration, but you must apply workaround #1 above immediately after the Manager upgrade. Ensure that the Edge node does not reboot after the Manager upgrade until the workaround has been applied.
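If workaround 1.1 is scripted rather than done in the UI, the two updates could be prepared as request bodies like the ones below. This is a rough sketch under assumptions: the field names (`class_of_service`, `dscp.mode`, `dscp.priority`) are assumed to follow the NSX Policy API QoS profile schema and should be verified against the API reference for your release, and the helper names are hypothetical. The sketch only builds the bodies and does not contact NSX Manager.

```python
import json


def qos_profile_body(priority, cos=0):
    """Build a QoS profile body in Untrusted mode.

    ASSUMPTION: field names are taken from memory of the NSX Policy API
    QoS profile schema; confirm them before sending to NSX Manager.
    """
    return {
        "class_of_service": cos,
        "dscp": {"mode": "UNTRUSTED", "priority": priority},
    }


def toggle_sequence():
    """Workaround 1.1: set the priority to 1, then revert it to 0."""
    return [qos_profile_body(1), qos_profile_body(0)]


if __name__ == "__main__":
    for body in toggle_sequence():
        print(json.dumps(body, sort_keys=True))
```

The bodies are applied in order against the existing profile; the second update restores the original configuration with the priority explicitly set to 0.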

Additional Information

Impact/Risks:
The data plane service crashes repeatedly and generates core files:
/var/log/core/core.statsXX.XXXXXXXXXX.XXXX.X.XX.gz