This article describes a condition where the DFW session count and/or vsip-state memory usage on an ESXi transport node reaches a high level. Because each ESXi host has a maximum supported scale of 2 million active connections, exceeding or approaching this limit can lead to connection drops, impact critical workloads, and cause vMotion failures. The purpose of this article is to help identify the condition, understand its impact, and recognize the related threshold-based alerts.
One or more of the following alerts may be observed for the affected ESXi transport node:
Impact:
Any version of NSX/vDefend with Distributed Firewall (DFW).
Each ESXi host has a hard limit of 2 million sessions. When this limit is exceeded, any new connections will be dropped. This can affect critical workloads that require initiating outbound traffic, potentially causing what appears to be an outage. Additionally, vMotions to the host will fail if the incoming VM brings connections that cause the total to exceed 2 million.
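The limit arithmetic described above can be sketched as follows. This is a minimal illustration, not an NSX or ESXi tool: the function names are hypothetical, and the session counts are assumed to have been collected from the host separately and passed in as plain integers. Only the 2 million per-host figure comes from this article.

```python
# Per-host session limit stated in this article.
MAX_SESSIONS_PER_HOST = 2_000_000


def session_utilization(active_sessions: int) -> float:
    """Return current session usage as a percentage of the per-host limit.

    Hypothetical helper: the caller supplies the session count.
    """
    if active_sessions < 0:
        raise ValueError("active_sessions must be non-negative")
    return 100.0 * active_sessions / MAX_SESSIONS_PER_HOST


def vmotion_would_exceed_limit(host_sessions: int, incoming_vm_sessions: int) -> bool:
    """Return True if the incoming VM's sessions would push the
    destination host's total above the per-host limit.

    Hypothetical helper: both counts are supplied by the caller.
    """
    if host_sessions < 0 or incoming_vm_sessions < 0:
        raise ValueError("session counts must be non-negative")
    return host_sessions + incoming_vm_sessions > MAX_SESSIONS_PER_HOST


if __name__ == "__main__":
    # A destination host at 1.9 million sessions receiving a VM with 150,000 sessions.
    print(session_utilization(1_900_000))                  # 95.0
    print(vmotion_would_exceed_limit(1_900_000, 150_000))  # True
```

In this example the combined total is 2,050,000 sessions, which is above the limit, so the vMotion to that host would be expected to fail.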
Follow these steps to diagnose and resolve the issue.
1. Identify the VM causing the high session count.
2. Perform packet capture to determine the traffic pattern.
pktcap-uw --capture PreDVFilter,PostDVFilter --dvfilter <Nic information from step 1e> --ng -o <Location for the capture file>.pcap
(Note: Do not copy and paste this command.)
3. Based on the identified traffic pattern, one or more of the following steps can be used to remediate the issue: