Excessively large flow table on NSX Edge leading to connection drops

Products

VMware NSX Data Center for vSphere

Issue/Introduction

Symptoms:

Listing the concurrent connections on the NSX ESG shows an excessively large number: show flowstats

Total Flow Capacity: 1000000
Current Flow Entries: 543120
[...]

In the above example, the NSX ESG tracks 543,120 flows out of a capacity of 1,000,000 flows. Compare this figure to the traffic scale expected in the relevant environment.
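The comparison above can be made repeatable with a small script. The following is a minimal sketch in Python, assuming the output of show flowstats has been captured as text and contains the two counters exactly as shown in the excerpt. The function name flow_utilization is hypothetical and not part of any NSX tooling.

```python
import re


def flow_utilization(flowstats_output: str) -> float:
    """Return current flow entries as a fraction of the total flow capacity.

    Assumes the two counter labels appear as in the excerpt above.
    """
    capacity = re.search(r"Total Flow Capacity:\s*(\d+)", flowstats_output)
    current = re.search(r"Current Flow Entries:\s*(\d+)", flowstats_output)
    if capacity is None or current is None:
        raise ValueError("expected counters not found in the output")
    total = int(capacity.group(1))
    if total == 0:
        raise ValueError("total flow capacity is reported as zero")
    return int(current.group(1)) / total


# Sample text copied from the excerpt above.
sample = """Total Flow Capacity: 1000000
Current Flow Entries: 543120
"""

print(f"Flow table utilization: {flow_utilization(sample):.1%}")
```

What counts as excessive depends on the environment, so the resulting ratio is only meaningful next to a known baseline for the same NSX ESG.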

Connections through the NSX ESG experience drops.
NSX Edge High Availability is enabled: show service highavailability
The NSX ESG runs version 6.4.0 or later.
A large number of flows with packet counters at zero are recorded: show flowtable

123: tcp      6 2553 ESTABLISHED src=10.0.0.1 dst=10.0.0.2 sport=10001 dport=10002 pkts=0 bytes=0 src=10.0.0.2 dst=10.0.0.1 sport=10002 dport=10001 pkts=0 bytes=0 [ASSURED] mark=0 rid=0 use=1

In the above example, the TCP flow opened between 10.0.0.1:10001 and 10.0.0.2:10002 is in the ESTABLISHED state and will time out in 2,553 seconds, but no traffic has been recorded for it in either direction.
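Counting such empty flows by hand is impractical on a large table. The following is a minimal sketch in Python, assuming the output of show flowtable has been captured as text and that every line follows the layout of the excerpt above. The names Flow, parse_flow and count_empty_flows are hypothetical, and the second sample line is made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class Flow(NamedTuple):
    proto: str
    timeout: int      # remaining seconds before the flow expires
    state: str
    packets: tuple    # packet counters, one per direction


# Layout assumed from the excerpt: "<index>: <proto> <proto number> <timeout> <STATE> ..."
LINE = re.compile(
    r"^\s*\d+:\s+(?P<proto>\w+)\s+\d+\s+(?P<timeout>\d+)\s+(?P<state>[A-Z_]+)\s"
)


def parse_flow(line: str) -> Optional[Flow]:
    """Parse one flow table line, or return None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    packets = tuple(int(n) for n in re.findall(r"\bpkts=(\d+)", line))
    return Flow(match["proto"], int(match["timeout"]), match["state"], packets)


def count_empty_flows(flowtable_output: str) -> int:
    """Count flows whose packet counters are zero in every direction."""
    empty = 0
    for line in flowtable_output.splitlines():
        flow = parse_flow(line)
        if flow and flow.packets and all(n == 0 for n in flow.packets):
            empty += 1
    return empty


# First line: copied from the excerpt above. Second line: hypothetical flow with traffic.
sample = """\
123: tcp      6 2553 ESTABLISHED src=10.0.0.1 dst=10.0.0.2 sport=10001 dport=10002 pkts=0 bytes=0 src=10.0.0.2 dst=10.0.0.1 sport=10002 dport=10001 pkts=0 bytes=0 [ASSURED] mark=0 rid=0 use=1
124: tcp      6 21590 ESTABLISHED src=10.0.0.1 dst=10.0.0.2 sport=10003 dport=10002 pkts=12 bytes=840 src=10.0.0.2 dst=10.0.0.1 sport=10002 dport=10003 pkts=10 bytes=700 [ASSURED] mark=0 rid=0 use=1
"""

print("Empty flows:", count_empty_flows(sample))
```

Some empty flows are legitimate, so the count is a symptom to weigh against the total number of flows rather than a diagnosis on its own.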

A large number of flows are marked with a TCP timeout value greater than the configured (maximum) TCP timeout.

To check the configured TCP timeout: show flowtimeouts

nf_conntrack_tcp_timeout_syn_sent = 30
nf_conntrack_tcp_timeout_syn_recv = 30
nf_conntrack_tcp_timeout_established = 21600
nf_conntrack_tcp_timeout_fin_wait = 20
nf_conntrack_tcp_timeout_close_wait = 60
nf_conntrack_tcp_timeout_last_ack = 30
nf_conntrack_tcp_timeout_time_wait = 30
nf_conntrack_tcp_timeout_close = 10
[...]

In the above example, the TCP timeout for Established flows is configured at 21,600 seconds.
To list the known flows: show flowtable

234: tcp      6 4253932 ESTABLISHED src=10.0.0.1 dst=10.0.0.3 sport=10011 dport=10013 pkts=0 bytes=0 src=10.0.0.3 dst=10.0.0.1 sport=10013 dport=10011 pkts=0 bytes=0 [ASSURED] mark=0 rid=0 use=1

In the above example, the TCP flow has a timeout of 4,253,932 seconds, which is far greater than the configured TCP timeout of 21,600 seconds.
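The two checks above can be combined: read the configured timeout for established flows, then flag every ESTABLISHED TCP flow whose remaining timeout exceeds it. The following is a minimal sketch in Python, assuming the outputs of show flowtimeouts and show flowtable have been captured as text in the layouts shown in the excerpts. The function names are hypothetical.

```python
import re


def established_timeout(flowtimeouts_output: str) -> int:
    """Return the configured timeout, in seconds, for established TCP flows."""
    match = re.search(
        r"nf_conntrack_tcp_timeout_established\s*=\s*(\d+)", flowtimeouts_output
    )
    if match is None:
        raise ValueError("established timeout not found in the output")
    return int(match.group(1))


# Layout assumed from the excerpts: "<index>: tcp <proto number> <timeout> ESTABLISHED ..."
FLOW = re.compile(r"^\s*\d+:\s+tcp\s+\d+\s+(?P<timeout>\d+)\s+ESTABLISHED\s")


def oversized_flows(flowtable_output: str, limit: int) -> list:
    """Return (remaining timeout, line) for flows whose timeout exceeds the limit."""
    flagged = []
    for line in flowtable_output.splitlines():
        match = FLOW.match(line)
        if match and int(match["timeout"]) > limit:
            flagged.append((int(match["timeout"]), line.strip()))
    return flagged


# Sample text copied from the excerpts above.
timeouts_sample = "nf_conntrack_tcp_timeout_established = 21600\n"
flowtable_sample = """\
123: tcp      6 2553 ESTABLISHED src=10.0.0.1 dst=10.0.0.2 sport=10001 dport=10002 pkts=0 bytes=0 src=10.0.0.2 dst=10.0.0.1 sport=10002 dport=10001 pkts=0 bytes=0 [ASSURED] mark=0 rid=0 use=1
234: tcp      6 4253932 ESTABLISHED src=10.0.0.1 dst=10.0.0.3 sport=10011 dport=10013 pkts=0 bytes=0 src=10.0.0.3 dst=10.0.0.1 sport=10013 dport=10011 pkts=0 bytes=0 [ASSURED] mark=0 rid=0 use=1
"""

limit = established_timeout(timeouts_sample)
flagged = oversized_flows(flowtable_sample, limit)
for remaining, line in flagged:
    # Express the gap in days to make the scale of the anomaly obvious.
    print(f"{remaining} s (~{remaining / 86400:.1f} days) exceeds {limit} s: {line[:40]}...")
```

With the two sample flows, only the flow to 10.0.0.3 is flagged, since 2,553 seconds is below the configured 21,600 seconds.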

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

Environment

VMware NSX Data Center for vSphere 6.4.x

Cause

As part of the NSX Edge High Availability feature set, the flow table is synchronized from the active appliance to the standby appliance so that flows are not broken by an HA failover. When the formerly standby appliance becomes the active appliance, it already knows the flows, so the stateful firewall can match traffic to them.
The flow table synchronization leverages conntrackd. The intended behavior is for the flow table to be pushed from the active appliance to the standby appliance. The issue is caused by an unintended bidirectional synchronization, in which flow status on the active appliance is overwritten from the standby appliance.

In association with this synchronization issue, flows may be assigned a TCP timeout value that is greater than the configured value (the TCP timeout is a decrementing value). This contributes to the growth of the flow table, since such flows do not time out in a timely manner.

Empty flows, seen in the table with packet counters at 0, are expected in the following situations:

A Load Balancer service monitor configured on the NSX ESG may simply establish the flow to confirm the availability of the backend server, resulting in an intentionally empty flow.
When a flow is synchronized from the active appliance to the standby appliance, its traffic counters on the standby appliance are at 0, since the standby appliance does not participate in the datapath. If the flow is no longer used and an HA failover occurs, the empty flow may be seen as established on the newly active appliance until it reaches its timeout.

Resolution

This issue is resolved in VMware NSX Data Center for vSphere 6.4.8.

Workaround:
Disabling either or both of the following features removes the conntrackd synchronization:

NSX Edge High Availability: This removes the need for a standby appliance, but reduces the level of redundancy of the NSX ESG.
NSX Edge Firewall: This removes the need to synchronize the flow table, but it prevents any traffic filtering from being done on the NSX ESG.

If neither of these options is possible, contact Broadcom Support to work around this issue, and note this Article ID in the problem description.

Additional Information

Impact/Risks:
This condition may result in a fast-growing flow table on the NSX ESG and, eventually, in connection drops that affect workload performance in the environment.