TCP sessions fail over GENEVE overlay due to TCP CHECKSUM INCORRECT errors

Article ID: 385625

Products

VMware vSphere ESXi

Issue/Introduction

  • TCP sessions to specific virtual machines fail to complete when routed over a Generic Network Virtualization Encapsulation (GENEVE) overlay.
  • Performance degradation may be seen when utilizing network offloading.
  • Packet capture may show SYN/ACK packets being retransmitted.
  • Packet capture shows that some encapsulated packets have non-zero padding and/or a non-zero trailer.
  • Packet capture contains a [TCP CHECKSUM INCORRECT] error for packets that have non-zero padding and/or a trailer (for example, the SYN/ACK packets referenced above).
  • The issue may occur only when the VMs are on different hosts, and not when they are on the same host.

Environment

  • VMware vSphere ESXi
  • VMware NSX

Cause

The server VM's Guest OS is padding the Ethernet frame with a non-zero trailer.

When the inner TCP checksum calculation includes this non-zero trailer, the physical NIC's hardware Checksum Offload (CSO) fails to calculate the correct checksum for the GENEVE-encapsulated packet.

The receiving end detects the bad checksum and drops the packet, which forces TCP retransmissions.
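
The effect of a trailer on the checksum can be illustrated with a small shell sketch. It implements a generic 16-bit ones-complement checksum of the kind TCP uses, applied to toy data: the function name inet_csum and the sample words are assumptions for illustration, and the sketch does not model the TCP pseudo-header or the behavior of any particular NIC. It only shows that all-zero padding leaves the result unchanged, while a non-zero trailer included in the calculation changes it.

```shell
#!/bin/sh
# Toy 16-bit ones-complement checksum over hex words given as arguments.
# inet_csum is a hypothetical helper, not part of ESXi or esxcli.
inet_csum() {
    sum=0
    for word in "$@"; do
        sum=$(( sum + 0x$word ))
    done
    # Fold any carry above 16 bits back into the low 16 bits.
    while [ $(( sum >> 16 )) -ne 0 ]; do
        sum=$(( (sum & 0xFFFF) + (sum >> 16) ))
    done
    # The checksum is the ones complement of the folded sum.
    printf '%04x\n' $(( ~sum & 0xFFFF ))
}

# Sample "segment" made of two 16-bit words (arbitrary toy data).
inet_csum 1234 5678         # segment only         -> 9753
inet_csum 1234 5678 0000    # plus zero padding    -> 9753
inet_csum 1234 5678 00ab    # plus non-zero trailer -> 96a8
```

If the checksum is computed over the segment plus a non-zero trailer, the receiver, which validates only the segment itself, sees a mismatch.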

Resolution

In VMware Cloud Foundation (VCF) 9.0.2, packets with non-zero trailers are handled using software offloading instead of hardware offloading, which avoids these checksum errors.

Workaround: On older versions, configure the ESXi host to perform checksum calculations in software instead of relying on hardware offload for the affected physical adapters:

  1. Log in to the affected ESXi host via SSH.

  2. Enable software-based IPv4 checksum offload for the specific vmnic that sends GENEVE traffic by running the following command from the root shell of the ESXi host:

    esxcli network nic software set --ipv4cso=1 -n vmnic#
    
    Note: Replace vmnic# with the actual uplink interface.

  3. Verify the change by ensuring IPv4 CSO is set to on:

    esxcli network nic software list
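
Where several uplinks on a host carry GENEVE traffic, steps 2 and 3 can be scripted. The following is a minimal sketch rather than a supported tool: the uplink names vmnic0 and vmnic1, the DRY_RUN switch, and the helper names run and apply_sw_cso are assumptions for illustration. By default it only prints the commands it would run, so the list can be reviewed first.

```shell
#!/bin/sh
# Hypothetical uplink names; replace with the vmnics that carry GENEVE traffic.
UPLINKS="vmnic0 vmnic1"

# DRY_RUN=1 (default) prints commands; DRY_RUN=0 executes them on the ESXi host.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_sw_cso() {
    # Step 2: enable software IPv4 checksum offload on each uplink.
    for nic in $UPLINKS; do
        run esxcli network nic software set --ipv4cso=1 -n "$nic"
    done
    # Step 3: list the software offload settings to verify the change.
    run esxcli network nic software list
}

apply_sw_cso
```

Running the script with DRY_RUN=0 from the root shell of the ESXi host applies the change; the final list command shows whether IPv4 CSO is on for each uplink.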

Additional Information