High virtual machine ping latency and performance issues with Nexus 1000V

Article ID: 326298

Updated On: 11-12-2024

Products

VMware vSphere ESXi

Issue/Introduction

When using the Cisco Nexus 1000V, you see these symptoms:

Ping latency to virtual machines is high.
Virtual machines are not responding or are not responding properly on the network.
As additional virtual machines are added to the Nexus 1000V, the latency and instances of timeouts increase.
The rx_missed_errors counter increments on the VMNICs associated with the Nexus 1000V, and the port profiles that use those VMNICs are impacted.
When you run the ethtool -S command, you see output similar to this example, and the value continually increments:

rx_missed_errors: 1367912
When using the same physical NICs on a standard switch, there are no performance or connectivity problems observed.
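
To confirm that the counter is still climbing, two samples of the ethtool -S output can be compared. The following is a minimal Python sketch, not a VMware or Cisco tool. It assumes the output has already been captured as text. The function names and the second sample value are hypothetical and used for illustration only.

```python
import re


def parse_ethtool_stats(text):
    """Parse 'name: value' lines from ethtool -S style output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if not sep or not value.isdigit():
            continue  # skip headers such as "NIC statistics:" and non-numeric lines
        # Normalize "rx missed errors" and "rx_missed_errors" to the same key.
        key = re.sub(r"\s+", "_", name.strip())
        stats[key] = int(value)
    return stats


def counter_delta(before, after, counter="rx_missed_errors"):
    """Return how much a counter grew between two captured outputs."""
    return parse_ethtool_stats(after).get(counter, 0) - parse_ethtool_stats(before).get(counter, 0)


# First value is the example from this article; the second is made up for illustration.
sample_1 = "NIC statistics:\n     rx_missed_errors: 1367912\n"
sample_2 = "NIC statistics:\n     rx_missed_errors: 1368500\n"
print(counter_delta(sample_1, sample_2))
```

A delta that stays above zero across repeated samples matches the symptom described above.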

Environment

VMware ESXi 4.1.x Installable
VMware vSphere ESXi 5.1
VMware vSphere ESXi 5.5
VMware ESXi 4.0.x Installable
VMware vSphere ESXi 5.0
VMware vSphere Cisco Nexus 1000V 4.x

Resolution

This issue can occur due to excessive layer-2 unicast flooding, excessive broadcast traffic, or both. Because of its architecture, the Nexus 1000V must replicate each incoming broadcast for each virtual machine, so latency deteriorates as the number of hosted virtual machines increases.
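
The scaling effect can be illustrated with simple arithmetic. The following Python sketch uses a simplified model that assumes every hosted virtual machine is in the same broadcast domain and receives one copy of each broadcast. The function name and the traffic figures are hypothetical, not measurements.

```python
def replicated_copies_per_second(broadcast_pps, vm_count):
    """Copies the switch must deliver per second under the simplified model."""
    return broadcast_pps * vm_count


# Hypothetical figures: the same broadcast rate with a growing number of VMs.
for vms in (10, 50, 100):
    print(vms, replicated_copies_per_second(1000, vms))
```

Under this model the replication work grows linearly with the number of virtual machines, which is consistent with latency worsening as more virtual machines are added.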

To improve virtual machine responsiveness, several measures can be taken. Cisco makes these recommendations:

Reduce the subnet size, or avoid the use of very large subnets. It is more common to see these symptoms with large subnets hosting thousands of Ethernet devices as there will inevitably be a larger volume of broadcast traffic that must be processed.
Isolate the top broadcasters. If a small number of devices generate the majority of the broadcast traffic, isolate them in their own subnet/VLAN.
Optimize the upstream switch to reduce broadcast traffic. One common source of broadcast traffic is ARP requests. Increasing the MAC address aging timer will reduce the frequency of ARP requests which need to be sent and will reduce the overall amount of broadcast traffic in the subnet.
Implement UUFB (Unknown Unicast Flood Blocking). This can greatly reduce the load that high levels of unicast flooding place on the Nexus 1000V. Enabling this feature on the Nexus 1000V causes all unknown unicast traffic to be dropped by the Virtual Ethernet Module (VEM) at the uplinks. UUFB has limitations, and precautions must be taken when implementing it. For more information, see the Blocking Unknown Unicast Flooding section in Cisco's Security Configuration Guide.
Important note if you are using Microsoft Network Load Balancing (NLB): Because Microsoft NLB configures the same MAC address on multiple ports, that MAC address cannot be learned on any of those ports, so traffic sent to it is flooded as unknown unicast. Best practice is to isolate Microsoft NLB virtual machines in a separate broadcast domain or configure UUFB to limit the scope of the flooding.
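
To decide which devices to isolate, the top broadcasters can be ranked from a packet capture. The following is a minimal Python sketch. It assumes the capture has already been reduced to (source MAC, destination MAC) pairs by some other tool. The function name and the MAC addresses are hypothetical.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def top_broadcasters(frames, limit=3):
    """Rank source MACs by broadcast frames sent.

    frames: iterable of (src_mac, dst_mac) pairs.
    Returns a list of (src_mac, count, share_of_all_broadcasts).
    """
    counts = Counter(src for src, dst in frames if dst.lower() == BROADCAST_MAC)
    total = sum(counts.values())
    return [(mac, n, n / total) for mac, n in counts.most_common(limit)]


# Hypothetical capture summary: two hosts send most of the broadcasts.
frames = (
    [("00:50:56:aa:00:01", BROADCAST_MAC)] * 6
    + [("00:50:56:aa:00:02", BROADCAST_MAC)] * 3
    + [("00:50:56:aa:00:03", BROADCAST_MAC)]
    + [("00:50:56:aa:00:03", "00:50:56:aa:00:01")] * 5  # unicast, ignored
)
for mac, count, share in top_broadcasters(frames, limit=2):
    print(mac, count, f"{share:.0%}")
```

Sources that account for a large share of the broadcasts are candidates for a separate subnet/VLAN.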
