vSAN performance diagnostics reports: "There are losses seen in the TCP layer at one or more hosts"

Article ID: 326609

Products

VMware vSAN

Issue/Introduction

In vSAN performance diagnostics, you see a message similar to:

There are losses seen in the TCP layer at one or more hosts

vSAN uses TCP for inter-host communication. This message indicates errors in the TCP stack of a vSAN host, caused by conditions such as out-of-order packets (OOO), duplicate ACKs (Dup ACKs), and TCP retransmissions. Beneath the message, you can see a list of the hosts on which the errors occurred.

Environment

VMware vSAN 6.5.x

Resolution

What is the impact of this issue?

There is no functional impact on vSAN, because TCP recovers from packet loss by retransmitting lost segments. However, retransmission adds latency: both the average and the standard deviation of the latency that your application sees are likely to be higher because of dropped packets. To get the best performance from your vSAN cluster, fix the underlying cause of the packet loss.
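
As a rough illustration of why even a small number of retransmissions affects both the average and the spread of latency, the following minimal sketch compares a loss-free run with one in which a few requests wait for a retransmission. The base latency, the retransmission penalty, and the function name `latency_stats` are hypothetical values chosen for the example, not measurements from a vSAN cluster.

```python
import statistics

# Hypothetical numbers for illustration only; they are not vSAN measurements.
BASE_LATENCY_MS = 1.0          # latency of a request that sees no packet loss
RETRANSMIT_PENALTY_MS = 200.0  # extra delay when a segment must be retransmitted


def latency_stats(total_requests, delayed_requests):
    """Return (mean, population standard deviation) of request latency in ms."""
    samples = (
        [BASE_LATENCY_MS] * (total_requests - delayed_requests)
        + [BASE_LATENCY_MS + RETRANSMIT_PENALTY_MS] * delayed_requests
    )
    return statistics.mean(samples), statistics.pstdev(samples)


# 100 requests with no loss, then 100 requests where 2 wait for a retransmission.
clean_mean, clean_stdev = latency_stats(100, 0)
lossy_mean, lossy_stdev = latency_stats(100, 2)

print(f"no loss: mean={clean_mean:.1f} ms, stdev={clean_stdev:.1f} ms")
print(f"2% loss: mean={lossy_mean:.1f} ms, stdev={lossy_stdev:.1f} ms")
```

With these assumed numbers, delaying 2 of 100 requests raises the mean from 1 ms to 5 ms and the standard deviation from 0 ms to 28 ms.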

What are possible ways to address this issue?

Such errors occur in the TCP stack when packets are lost somewhere along the end-to-end TCP connection between two hosts. For example, if a packet is dropped at a point in the network between two hosts (such as a switch), the receiving host receives out-of-order packets. TCP at the receiving host responds by issuing a Dup ACK, and the sender responds to the Dup ACK by retransmitting the lost packet.

If the packet is dropped on the end host, such as at the physical network interface card (NIC) or virtual network adapter (vmknic), then a different exception (such as the one documented in KB 2150181) is reported. This issue therefore mostly pertains to packet drops in the core network to which the hosts connect. Packet drops can happen for several reasons. Investigate the following:

Refer to vSAN Network Design, which discusses best practices for the vSAN network.
When vSAN shares a physical adapter with other VMs, the instantaneous network traffic from all sources can exceed the network line rate, in which case packets are dropped at some point in the network. Use a dedicated 10 Gbps physical NIC for vSAN traffic. If you must share a physical NIC between vSAN and other traffic, use NetIOC to reserve network bandwidth for vSAN traffic. Refer to vSAN Network Design, mentioned above, for details on setting up NetIOC.
Large switch buffers may reduce the possibility of packet drops at the switch ingress or egress port. If possible, check with your switch vendor whether the size of the switch buffer can be increased. Packets may also be dropped in the core network as they are transported through a hierarchy of switches (such as leaf, spine, and TOR switches). Check with your switch vendor whether the core network is configured correctly.
If you have deployed VXLAN with NSX, see KB 52530 to check whether you are hitting the supported maximum of 1024 MAC addresses per VNI/host.
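
The loss-recovery sequence described earlier in this section (a packet dropped in the network, out-of-order arrival, Dup ACKs, retransmission) can be sketched with a simplified receiver model. This is an illustration under stated assumptions, not the ESXi or vSAN TCP implementation: segments are numbered 0, 1, 2, ... instead of using byte sequence numbers, selective acknowledgments and timers are ignored, and the function names `receiver_acks` and `count_dup_acks` are hypothetical.

```python
def receiver_acks(arrivals):
    """Return the cumulative ACK the receiver sends after each arriving segment.

    Each ACK carries the number of the next segment the receiver expects,
    so a gap in the stream makes the receiver repeat the same ACK value.
    """
    received = set()
    next_expected = 0
    acks = []
    for seq in arrivals:
        received.add(seq)
        # Advance past every segment that is now contiguous with the stream.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def count_dup_acks(acks):
    """Count ACKs that repeat the value of the ACK sent immediately before."""
    return sum(1 for prev, cur in zip(acks, acks[1:]) if cur == prev)


# Segment 2 is dropped in the network; segments 3, 4 and 5 arrive out of order
# relative to the gap, and the retransmitted segment 2 arrives last.
arrivals = [0, 1, 3, 4, 5, 2]
acks = receiver_acks(arrivals)

print(acks)                  # [1, 2, 2, 2, 2, 6]
print(count_dup_acks(acks))  # 3
```

In this run the receiver repeats ACK 2 three times while segment 2 is missing. Three duplicate ACKs is the threshold at which standard TCP fast retransmit (RFC 5681) resends the missing segment, after which the cumulative ACK jumps to 6.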
