DFW drops the TCP flow when the SEQ/ACK number wraps around its maximum value (4294967295, roughly 4 billion).

Article ID: 322086

Products

VMware vDefend Firewall

Issue/Introduction

Symptoms:

  1. DFW starts dropping a TCP flow shortly after receiving a TCP packet with sequence number 0.
  2. The dropped TCP flow is usually either a long-lived connection with a high byte count (the sequence number advances with every byte sent) or a flow that starts with a high sequence number, close to the 4294967295 maximum.
  3. The issue affects only the following NSX releases:

    3.2.3.1
    4.1.1     
    4.1.2
    4.1.2.1

  4. Running vsipioctl getfilterstat -f <vnic> on the ESXi host shows the following DROP REASON counters incrementing while the issue is observed on the application side:

    DROP REASON
    -----------
    state-mismatch:         539   <<<<
    seqno outside window:   302   <<<<
    seqno old ack:          334   <<<<

  5. If packet captures are taken, the packets are present at PreDVFilter but not at PostDVFilter.
  6. TCP sequence and acknowledgment numbers range from 0 to 4294967295. When the sequence number rolls over, the next value is not necessarily 0, because the sequence number is calculated from the byte count.
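To illustrate item 6, the sketch below models a sender's sequence number as (ISN + payload bytes sent) mod 2^32. It is a simplified model, not NSX code: the ISN of 4294960000 and the 1460-byte segment size are hypothetical example values, and the SYN/FIN flags (which each consume one sequence number) are ignored. It shows that a flow starting near the maximum wraps after only a few kilobytes, one starting at 0 wraps after 4 GiB, and the first post-rollover sequence number seen on the wire depends on segment boundaries rather than being 0.

```python
# Illustrative model of TCP sequence-number rollover (not NSX code).
# Assumption: sequence number = (ISN + payload bytes sent) mod 2**32;
# SYN and FIN, which each consume one sequence number, are ignored.

SEQ_SPACE = 2**32  # valid sequence numbers are 0 .. 4294967295


def seq_after(seq: int, nbytes: int) -> int:
    """Sequence number after sending `nbytes` payload bytes starting at `seq`."""
    return (seq + nbytes) % SEQ_SPACE


def bytes_until_wrap(isn: int) -> int:
    """Payload bytes from `isn` until the sequence space is exhausted and numbering restarts at 0."""
    return SEQ_SPACE - isn


if __name__ == "__main__":
    isn = 4_294_960_000                 # hypothetical ISN close to the maximum
    print(bytes_until_wrap(isn))        # 7296
    print(bytes_until_wrap(0))          # 4294967296 (4 GiB)

    # Send six hypothetical 1460-byte segments and record each new sequence number.
    seqs = []
    seq = isn
    for _ in range(6):
        seq = seq_after(seq, 1460)
        seqs.append(seq)
    print(seqs)  # [4294961460, 4294962920, 4294964380, 4294965840, 4, 1464]
```

In this example the sequence number jumps from 4294965840 to 4, so no packet in the flow ever carries seq=0.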

NOTE: When viewing packet captures in Wireshark, make sure relative sequence numbers are disabled so that the actual values are displayed: go to Preferences -> Protocols -> TCP and uncheck the "Relative sequence numbers" checkbox.

Environment

VMware NSX

3.2.3.1
4.1.1     
4.1.2
4.1.2.1

 

Cause

  1. The issue is triggered when the TCP sequence number is 0.
  2. The flow is not dropped immediately; it fails a short time later.
  3. After the rollover, DFW incorrectly updates the seqlo and seqhi values it tracks for the TCP flow, which causes traffic to be dropped.
  4. This issue is DFW-specific and does NOT impact the Gateway Firewall.
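The article does not describe DFW internals, so the following is only a hypothetical illustration of why tracking a window between seqlo and seqhi is sensitive to rollover. It models a state entry as accepting sequence numbers in [seqlo, seqhi) and compares a naive check using plain integer comparison with a wraparound-aware check using 32-bit serial-number arithmetic (the style of comparison used by RFC 1982 and the BSD SEQ_LT/SEQ_GEQ macros). The window model, the function names, and the example values are assumptions for illustration, not DFW's actual implementation.

```python
# Hypothetical illustration only: NOT DFW's actual state-tracking code.
# Model: a state entry accepts sequence numbers in the window [seqlo, seqhi).

SEQ_SPACE = 2**32


def naive_in_window(seq: int, seqlo: int, seqhi: int) -> bool:
    """Plain integer comparison: always fails once the window straddles the wrap point."""
    return seqlo <= seq < seqhi


def seq_diff(a: int, b: int) -> int:
    """Signed distance a - b in 32-bit serial-number arithmetic."""
    d = (a - b) % SEQ_SPACE
    return d - SEQ_SPACE if d >= SEQ_SPACE // 2 else d


def wrap_safe_in_window(seq: int, seqlo: int, seqhi: int) -> bool:
    """True if seq lies in [seqlo, seqhi), handling wraparound (window must be < 2**31)."""
    return seq_diff(seq, seqlo) >= 0 and seq_diff(seqhi, seq) > 0


if __name__ == "__main__":
    # Hypothetical window straddling the rollover: 64 KiB below the maximum to 64 KiB past 0.
    seqlo = SEQ_SPACE - 65_536             # 4294901760
    seqhi = (seqlo + 131_072) % SEQ_SPACE  # 65536 after wrapping
    for seq in (seqlo + 1000, 4, 100_000):
        print(seq, naive_in_window(seq, seqlo, seqhi), wrap_safe_in_window(seq, seqlo, seqhi))
    # 4294902760 False True
    # 4 False True
    # 100000 False False
```

The naive check rejects every packet once seqhi has wrapped below seqlo, which is the kind of post-rollover miscalculation that would surface as "seqno outside window" drops; the serial-arithmetic check still accepts in-window packets and rejects the genuinely out-of-window one.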

Resolution

Only the NSX versions listed below are affected. All other versions are NOT affected.

3.2.3.1
4.1.1     
4.1.2
4.1.2.1

 

  1. Workaround Option 1
    1. Use a stateless DFW rule for the TCP flow experiencing the issue.
    2. After adding the stateless rule, add the affected VMs to the exclusion list, then immediately remove them.
    3. This flushes all existing entries from the state table.

  2. Workaround Option 2
    1. Add the VM to the DFW exclusion list. If the VM has multiple NICs, you can add just a single vNIC.

Adding or removing a VM from the exclusion list, or adding the new stateless rule, does not drop existing flows.

 

Additional Information

Impact/Risks:
Some TCP flows are dropped.