L7/IDPS Traffic disruption on SCRX/Turbo-enabled hosts due to vmxnet3 TX queue entering stopped state

Article ID: 440092

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

In NSX/vDefend environments where SCRX/Turbo mode is enabled, traffic inspected by L7 or Distributed IDS/IPS may experience disruption if a vmxnet3 transmit queue on the infravisor pod enters a stopped state.

Symptoms:
On an affected ESXi host, one or more of the following symptoms may be observed:

  • Traffic inspected by SCRX/Turbo may stop passing.
  • L7 or Distributed IDS/IPS traffic may be impacted.
  • The issue is seen only when SCRX/Turbo mode is deployed.
  • Restarting the NSX-SCX service temporarily restores connectivity.
  • The issue may recur if another malformed packet or packet with invalid metadata is encountered.


Validation:

  • The infravisor pod interface can be identified using:
net-stats -l | grep infravisor-pod
 
Example:
[root@ESX:] net-stats -l | grep infravisor-pod
100663326 5 9 DvsPortset-0 00:0c:29:d3:22:fb infravisor-pod.eth0

In this example:

Portset: DvsPortset-0
Port ID: 100663326
Interface: infravisor-pod.eth0
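
The lookup above can be sketched as a small helper that extracts the portset, port ID, and interface name in one step. This is a minimal sketch, not a supported tool: the function name infravisor_ports is hypothetical, and it assumes the column order shown in the example output (port ID first, portset fourth, interface name last).

```shell
# Print "<portset> <port-id> <interface>" for each infravisor-pod port
# found in `net-stats -l` output supplied on stdin.
# Assumption: columns are ordered as in the example output above
# (port ID in column 1, portset in column 4, interface name last).
infravisor_ports() {
    awk '$NF ~ /^infravisor-pod/ { print $4, $1, $NF }'
}

# Usage on an ESXi host:
#   net-stats -l | infravisor_ports
```

For the example line above, this prints DvsPortset-0 100663326 infravisor-pod.eth0.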

 

  • The vmxnet3 interface may have up to four TX queues, and any one of them may enter a stopped state. Check each queue with the command below, replacing <DvsPortset-X> and <Port ID> with the values identified above and <Tx_queue_number> with the queue to inspect.
  • Tx_queue_number ranges from 0 to 3.
vsish -e get /net/portsets/<DvsPortset-X>/ports/<Port ID>/vmxnet3/txqueues/<Tx_queue_number>/status

Example:

[root@ESX:] vsish -e get /net/portsets/DvsPortset-0/ports/100663326/vmxnet3/txqueues/0/status
status of a vmxnet3 vNIC tx queue {
intr index:1
stopped:1 <<<<<< This is the key indicator of the issue
error code:2147483655
next2Tx:1895
next2Comp:1895
ring size:2048
data ring desc size:128
ts ring desc size:0
genCount:0
next2Write:1894
next2Tx from timeout:65535
next2Comp from timeout:65535
timestamp in milliseconds in check:0
}
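
Because any of the four queues may be the stopped one, it can help to check them all in one pass. The following is a minimal sketch under stated assumptions: the function names txq_is_stopped and check_txqueues are hypothetical, the status text is assumed to contain a stopped:<value> line as in the example above, and a queue that returns no such line is reported rather than treated as healthy.

```shell
# Return 0 if the txqueue status text on stdin reports stopped:1.
txq_is_stopped() {
    grep -Eq '^[[:space:]]*stopped:[[:space:]]*1([^0-9]|$)'
}

# Check TX queues 0-3 of one port and print the state of each.
# $1 = portset name (for example DvsPortset-0), $2 = port ID
check_txqueues() {
    portset=$1
    port=$2
    for q in 0 1 2 3; do
        node="/net/portsets/$portset/ports/$port/vmxnet3/txqueues/$q/status"
        status=$(vsish -e get "$node" 2>/dev/null)
        case $status in
            *stopped:*) ;;  # looks like a status block, evaluate it below
            *) echo "txqueue $q: no status returned"; continue ;;
        esac
        if printf '%s\n' "$status" | txq_is_stopped; then
            echo "txqueue $q: STOPPED"
        else
            echo "txqueue $q: not stopped"
        fi
    done
}

# Usage on an ESXi host:
#   check_txqueues DvsPortset-0 100663326
```

Any queue reported as STOPPED matches the key indicator shown in the example output.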

Environment

This issue applies to environments with:

  • VMware vDefend / NSX versions 4.2.4 and earlier, or 9.1.0
  • SCRX/Turbo mode enabled
  • L7 or Distributed IDS/IPS inspection enabled

This issue does not apply to Classic/VDPI deployments.

Cause

The issue is triggered by inconsistent packet metadata provided by the guest vNIC driver. Specifically, for an offload packet, the reported header length may be larger than the total packet data length.

Example:

hlen=256
data_len=250

In this condition, the packet metadata is invalid because the header length cannot be larger than the total packet length. When the NSX Service Data Path encounters this invalid metadata during L7 or IDPS inspection, the vmxnet3 transmit queue enters an error state. The queue is stopped to prevent possible memory corruption or host instability. Once the queue is stopped, it no longer processes further traffic for that queue.
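
The consistency rule described above can be illustrated with a one-line check. This is only an illustration of the invariant (header length must not exceed packet data length), not the actual NSX data path logic; the function name offload_metadata_ok is hypothetical, and the values are the ones from the example.

```shell
# Return 0 when the header length fits inside the packet data, 1 otherwise.
# $1 = hlen in bytes, $2 = data_len in bytes (names taken from the example)
offload_metadata_ok() {
    [ "$1" -le "$2" ]
}

# Values from the example above: hlen=256, data_len=250
if offload_metadata_ok 256 250; then
    echo "metadata consistent"
else
    echo "invalid: hlen exceeds data_len"
fi
```

With the example values the check fails, which corresponds to the condition that causes the queue to be stopped.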

Resolution

This issue will be resolved in a future release of vDefend/NSX. Upgrade to a fixed release when one becomes available, or contact Broadcom Support.


If an immediate upgrade is not possible, restart the NSX-SCX service on the affected ESXi host.

/etc/init.d/nsx-scx-###### restart

Replace nsx-scx-###### with the applicable NSX-SCX service name on the affected host.


To find the service name, list the matching init scripts:

ls /etc/init.d/ | grep nsx-scx

Then restart the matching service:

/etc/init.d/<nsx-scx-BuildNumber> restart
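
The two steps above can be combined into one helper that restarts the service only when exactly one match is found. This is a minimal sketch: the function name restart_scx is hypothetical, and it assumes the service script name begins with nsx-scx, as in the nsx-scx-###### form shown above.

```shell
# Find the nsx-scx init script and restart it.
# $1 = init script directory (defaults to /etc/init.d)
# Assumption: the script name begins with "nsx-scx".
restart_scx() {
    dir=${1:-/etc/init.d}
    count=0
    svc=
    for f in "$dir"/nsx-scx*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        count=$((count + 1))
        svc=$f
    done
    # Refuse to guess when zero or several scripts match.
    if [ "$count" -ne 1 ]; then
        echo "expected exactly 1 nsx-scx service in $dir, found $count" >&2
        return 1
    fi
    echo "restarting $svc" >&2
    "$svc" restart
}

# Usage on the affected ESXi host:
#   restart_scx
```

Refusing to act on zero or multiple matches avoids restarting the wrong script if the naming on a given host differs from the assumption.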

Note: This is a temporary workaround. Restarting the NSX-SCX service resets the vmxnet3 queue state and may restore connectivity, but the queue may stop again if another packet with invalid metadata is encountered.