After new leaf switches are added to expand a VMware infrastructure running NSX, virtual machines on ESXi hosts connected to the new leaf switches experience severe performance degradation. Symptoms include:
Steps to verify the issue:
The performance degradation is most likely caused by an underlying problem along the physical network path, such as duplex mismatches, Ethernet configuration errors, or vPC (virtual Port Channel) issues in the switching infrastructure. When Edge nodes remain on ESXi hosts connected to the original leaf switches while compute workloads run on hosts connected to the new leaf switches, the asymmetric traffic paths and increased hop count make these problems more visible.
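Duplex and similar link-level mismatches can be spotted by comparing both ends of every link. The following is a minimal sketch, assuming the speed, duplex, and MTU of each host uplink and inter-switch link have been collected by hand (for example from the switch's interface status output and `esxcli network nic list` on the ESXi host). The `PortSettings` record, the `link_problems` and `audit` helpers, and all device and port names are hypothetical; nothing here queries a real device.

```python
from dataclasses import dataclass


# Hypothetical record: values are entered by hand from switch interface
# output and `esxcli network nic list`; this code does not contact any device.
@dataclass(frozen=True)
class PortSettings:
    device: str       # host or switch name
    port: str         # vmnic or switch interface name
    speed_mbps: int   # negotiated speed
    duplex: str       # "Full" or "Half"
    mtu: int          # configured MTU in bytes


def link_problems(a: PortSettings, b: PortSettings) -> list[str]:
    """Describe every setting that differs between the two ends of one link."""
    problems = []
    for field in ("speed_mbps", "duplex", "mtu"):
        va, vb = getattr(a, field), getattr(b, field)
        if va != vb:
            problems.append(
                f"{field} mismatch: {a.device} {a.port}={va}, {b.device} {b.port}={vb}"
            )
    # Half duplex on either end is suspect on modern data center links.
    for end in (a, b):
        if end.duplex.lower() == "half":
            problems.append(f"half duplex on {end.device} {end.port}")
    return problems


def audit(links: list[tuple[PortSettings, PortSettings]]) -> dict[str, list[str]]:
    """Map 'device port' of the first end to its problems; clean links are omitted."""
    report = {}
    for a, b in links:
        problems = link_problems(a, b)
        if problems:
            report[f"{a.device} {a.port}"] = problems
    return report


if __name__ == "__main__":
    links = [
        (PortSettings("esxi-new-01", "vmnic2", 25000, "Full", 9000),
         PortSettings("leaf-new-1", "Ethernet1/5", 25000, "Full", 1500)),
        (PortSettings("esxi-old-01", "vmnic2", 25000, "Full", 9000),
         PortSettings("leaf-old-1", "Ethernet1/5", 25000, "Full", 9000)),
    ]
    for port, problems in audit(links).items():
        for problem in problems:
            print(f"{port}: {problem}")
```

Comparing both ends of the same link, rather than checking each device against a fixed standard, catches the case where each side looks reasonable on its own but the two disagree.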
Traffic path for VMs on original hosts (normal performance):
Traffic path for VMs on new hosts (degraded performance):
The additional traversals through the spine switches and the nearly doubled hop count amplify any existing network configuration problems. The issue lies in the physical network infrastructure between the new and existing leaf switches, not in NSX itself.
To isolate and resolve the issue:
1. Confirm NSX is functioning correctly:
2. Address Edge node placement:
3. Review MTU configuration:
4. Address any Edge node resource constraints:
5. Work with the network team to optimize the leaf-spine configuration:
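The MTU review in step 3 can be verified end to end with a don't-fragment ping between tunnel endpoints (TEPs), for example from a host on a new leaf switch to a host on an original leaf switch so the test crosses the spine. The largest ICMP payload that fits unfragmented is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. The sketch below computes that size and builds a `vmkping` command; the TEP addresses are hypothetical placeholders, and `++netstack=vxlan` assumes the TEP vmkernel interfaces use the netstack NSX normally creates for overlay traffic, so confirm this on your hosts.

```python
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def df_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet of size `mtu`."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo")
    return payload


def vmkping_command(mtu: int, remote_tep: str, netstack: str = "vxlan") -> str:
    """Build a don't-fragment (-d) vmkping of the maximum size (-s) for `mtu`."""
    return f"vmkping ++netstack={netstack} -d -s {df_ping_payload(mtu)} {remote_tep}"


if __name__ == "__main__":
    # Hypothetical TEP addresses on hosts attached to the original leaf switches.
    for tep in ("192.0.2.11", "192.0.2.12"):
        print(vmkping_command(1600, tep))
```

For a 1600-byte MTU this produces `vmkping ++netstack=vxlan -d -s 1572 192.0.2.11`. If the full-size ping fails while a smaller one succeeds, some device on the path between the new and original leaf switches is configured with a lower MTU.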