This issue involves reported intermittent latency, application timeouts, and overall degraded performance affecting traffic flows between clients and a subset of backend proxy servers (pool members). The issue occurs along a datapath involving:
NSX-T overlay networking
ESXi hypervisors
An F5 BIG-IP virtual load balancer
Web proxies behind the F5 load balancer
Packet captures taken throughout the datapath reveal frequent retransmissions, duplicate ACKs, and missing return-path packets. Analysis confirms that the F5 load balancer VM drops packets internally, causing retransmissions and end-to-end latency.
The screenshot below is evidence of this issue. Packets on the left-hand side are entering the downlink switchport of this particular F5 load balancer; the right-hand side capture shows packets egressing the F5 uplink. In this specific TCP stream, the packet ID highlighted in yellow is the last packet seen in both captures before many retransmissions. The packets marked with a red line are never seen at the uplink, indicating that the F5 is, in fact, dropping traffic.
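For reference, switchport captures like the ones shown can be taken directly on the ESXi host running the F5 VE with pktcap-uw. The port IDs and file names below are placeholders and must first be looked up for the F5 VE's vNICs (for example with net-stats -l or esxcli network vm port list):

# Find the F5 VE's world ID, then the port IDs of its vNICs
esxcli network vm list
esxcli network vm port list -w <f5-ve-world-id>

# Capture at the downlink-facing and uplink-facing switchports (add --dir or -c options as needed)
pktcap-uw --switchport <downlink-port-id> -o /tmp/f5_downlink.pcap
pktcap-uw --switchport <uplink-port-id> -o /tmp/f5_uplink.pcap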
The drops can be traced to a performance bottleneck on the F5 VM’s uplink NetWorld TX thread, which exhibits 100% utilization, indicating that the single transmit context servicing that vNIC cannot drain packets from the guest quickly enough.
The root cause is packet loss inside the F5 VE due to NetWorld TX thread saturation, which can be observed on the ESXi host.
Packet captures show missing packets inside the F5 VM
The downlink switchport capture (ingress to the F5) contained all returning packets.
The uplink switchport capture (egress from the F5 to the upstream network) was missing packets.
Meaning: packets entered the F5 but never left, confirming internal drops.
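The same comparison can be reproduced off-box by running tshark (or Wireshark) against the two capture files on an analysis workstation; the file names, addresses, and ports below are placeholders:

# List IP IDs and sequence numbers for the affected flow in each capture
tshark -r f5_downlink.pcap -Y "ip.addr == <client-ip> && tcp.port == <client-port>" -T fields -e ip.id -e tcp.seq
tshark -r f5_uplink.pcap -Y "ip.addr == <client-ip> && tcp.port == <client-port>" -T fields -e ip.id -e tcp.seq
# IP IDs present in the downlink output but absent from the uplink output are the packets the F5 never forwarded

# Quick count of TCP loss indicators in a capture
tshark -r f5_downlink.pcap -Y "tcp.analysis.retransmission or tcp.analysis.duplicate_ack" | wc -l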
Retransmissions and duplicate ACKs
Multiple layers of the datapath show classic TCP loss patterns.
These are a symptom, but not the root cause.
ESXi net-stats output shows TX thread saturation
The following command pulls the relevant CPU usage for vCPUs, PollWorlds, and NetWorlds:
net-stats -A -t WwQqihVvh -i 10 > /tmp/netstats.txt
In this case, the TX NetWorld thread for the F5 VE's uplink vNIC was at or near 100% utilization.
A vNIC with only one TX context (ctxPerDev=1) cannot scale with higher packet rates.
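As an illustrative check (the datastore and VM directory names are placeholders), the current TX-context setting can be read straight from the VM's .vmx file on the ESXi host; an absent ctxPerDev entry means the vNIC has not been tuned beyond the default transmit-thread behavior:

grep -i ctxPerDev /vmfs/volumes/<datastore>/<f5-ve-vm>/<f5-ve-vm>.vmx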
Insufficient parallelization inside the F5 VE vNIC
With ctxPerDev=1, the F5 VE uses a single TX queue/thread.
Under moderate-to-heavy traffic, this results in:
vNIC bottleneck
Dropped packets
TCP retransmissions
Latency spikes
F5 documentation confirms multi-queue is required for high PPS workloads
F5 VE tuning guides recommend increasing ctxPerDev to 2 or 3 for better scaling on ESXi.
Once ctxPerDev was increased to 3, additional TX threads became available, distributing the flow load across multiple queues.
Increasing the TX queue depth and enabling multiple TX contexts resolved the packet loss entirely.
Power off the F5 VE (required to alter VMX network parameters).
Navigate to the VM’s .vmx configuration or the equivalent vSphere advanced configuration parameters for the VM.
Update the network tuning parameter for the uplink vNIC (a reference snippet of the resulting .vmx entries follows these steps):
From: ethernetX.ctxPerDev = "1"
To: ethernetX.ctxPerDev = "3"
Verify ethernetX.pnicFeatures = "4" and ethernetX.udpRSS = "1" are in place (these settings improve RSS and multiqueue behavior for F5 VEs).
Power the F5 VE back on.
Re-test using packet captures, application testing, and ESXi net-stats.
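For reference, after the change the uplink vNIC's tuning entries in the .vmx would be expected to look like the following (ethernet1 is assumed here as the uplink vNIC index and should be matched to the F5 VE's actual adapter). If the file is edited by hand from the ESXi shell while the VM is powered off, the host can be told to re-read it with vim-cmd:

ethernet1.ctxPerDev = "3"
ethernet1.pnicFeatures = "4"
ethernet1.udpRSS = "1"

# Re-read the edited .vmx (VM ID from: vim-cmd vmsvc/getallvms)
vim-cmd vmsvc/reload <vmid>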
TX NetWorld utilization drops sharply as load spreads across the additional TX threads.
Packet loss ceases on the F5 uplink.
Retransmissions and duplicate ACKs disappear.
Application latency and timeouts fully resolve.
The issue has remained resolved under typical production traffic load.
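Re-verification can reuse the same data sources as the original diagnosis, for example (port IDs and file names are illustrative):

# Re-sample NetWorld/PollWorld CPU usage and compare with the earlier baseline
net-stats -A -t WwQqihVvh -i 10 > /tmp/netstats_after.txt

# Re-capture at the uplink switchport and confirm the previously missing packets now egress
pktcap-uw --switchport <uplink-port-id> -o /tmp/f5_uplink_after.pcap
tshark -r f5_uplink_after.pcap -Y "tcp.analysis.retransmission" | wc -l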
F5 optimization guide for VMware
Optimize hypervisor settings
VMware KB: NetWorld and vNIC queue behavior
NSX workload & edge throughput performance tuning
VMware KB: Increasing ctxPerDev for better vNIC parallelization
High virtual network throughput performance tuning recommendation when using 100G network interface cards