TKGm cluster in vSphere creation fails using NSX-T v3.1.x and Photon 3 or Ubuntu with Linux Kernel 5.8 VMs

Article ID: 301117

Products

VMware NSX
VMware vSphere with Tanzu
VMware Container Networking with Antrea

Issue/Introduction

Deploying a management or workload cluster on the following infrastructure and configuration may fail, or may result in restricted traffic between pods that run on different ESXi hosts:

  • NSX-T versions: vSphere with NSX-T v3.1.3 with Enhanced Data Path enabled, vSphere with NSX-T v3.1.x earlier than v3.1.3, NSX-T v3.0.x earlier than the v3.0.2 hot patch, or NSX-T v2.x.
  • Base images: Photon 3 or Ubuntu with Linux kernel 5.8

This combination exposes a checksum issue between older versions of NSX-T and Antrea CNI.

Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x

Cause

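Older versions of NSX-T and the Antrea CNI have a checksum issue with UDP tunnel offload. The offload features involved are the ones turned off by this command:

ethtool -K eth0 tx-udp_tnl-segmentation off && ethtool -K eth0 tx-udp_tnl-csum-segmentation off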

Resolution

This issue is resolved in the VMware NSX-T Data Center 3.0.2 hot patch and in VMware NSX-T Data Center 3.1.3.

There are two options to resolve this issue:

  • Upgrade to the NSX-T v3.0.2 hot patch, v3.1.3, or later. If Enhanced Data Path is enabled, you need to upgrade to NSX v3.2.1.
  • Use an Ubuntu base image with Linux Kernel v5.9 or later.
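
To check whether an Ubuntu node already meets the kernel option above, a small helper like the following may be useful. This is a minimal sketch, not part of the original article: the function name `kernel_at_least` is hypothetical, and it only compares the major and minor numbers of a kernel release string such as the output of `uname -r`.

```shell
# Hypothetical helper (not from the article): succeeds when the kernel
# release in $1 is at least major version $2 and minor version $3.
kernel_at_least() {
    ka_release="$1"
    ka_major="${ka_release%%.*}"        # text before the first dot
    ka_rest="${ka_release#*.}"          # text after the first dot
    ka_minor="${ka_rest%%[!0-9]*}"      # leading digits of the remainder
    [ "$ka_major" -gt "$2" ] || {
        [ "$ka_major" -eq "$2" ] && [ "${ka_minor:-0}" -ge "$3" ]
    }
}
```

On a node, `kernel_at_least "$(uname -r)" 5 9 && echo "kernel is 5.9 or later"` reports whether the running kernel meets the threshold. The check applies only to the Ubuntu kernel option and says nothing about the NSX-T version in use.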


Workaround:

For TKG v1.5 and later, set `ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD` to `true` when creating the cluster.
For TKG v1.4.2 and later, up to but not including v1.5, set `DISABLE_CHECKSUM_OFFLOAD` to `true` when creating the cluster.
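
The version rule for the two variables can be sketched as a small shell function. The function name `tkg_offload_var` is hypothetical and not part of TKG. The sketch assumes a plain `major.minor.patch` version string (an optional leading `v` is stripped) and treats every release from 1.5 upward as "1.5+".

```shell
# Hypothetical helper (not part of TKG): prints the name of the variable
# to set to "true" for a given TKG version. Returns 1 when neither
# variable applies to that version.
tkg_offload_var() {
    tv_ver="${1#v}"                      # drop an optional leading "v"
    tv_old_ifs="$IFS"; IFS=.
    set -- $tv_ver                       # split the version on dots
    IFS="$tv_old_ifs"
    tv_major="${1:-0}"; tv_minor="${2:-0}"; tv_patch="${3:-0}"
    tv_minor="${tv_minor%%[!0-9]*}"; tv_minor="${tv_minor:-0}"
    tv_patch="${tv_patch%%[!0-9]*}"; tv_patch="${tv_patch:-0}"
    if [ "$tv_major" -gt 1 ] || { [ "$tv_major" -eq 1 ] && [ "$tv_minor" -ge 5 ]; }; then
        echo "ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD"
    elif [ "$tv_major" -eq 1 ] && [ "$tv_minor" -eq 4 ] && [ "$tv_patch" -ge 2 ]; then
        echo "DISABLE_CHECKSUM_OFFLOAD"
    else
        return 1
    fi
}
```

For example, `tkg_offload_var 1.4.2` prints `DISABLE_CHECKSUM_OFFLOAD`. The printed name is the variable to set to `true` in the cluster configuration before creating the cluster.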

In some cases, the management cluster deploys successfully, but traffic is dropped. To work around this issue, SSH into every control plane and worker node VM and run the following command:

ethtool -K eth0 tx-udp_tnl-segmentation off && ethtool -K eth0 tx-udp_tnl-csum-segmentation off
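
When there are many nodes, the command above could be applied in a loop. The following is a minimal sketch under stated assumptions, not a procedure from the original article: the function name `disable_offload_on_nodes`, the `DRY_RUN` and `SSH_USER` variables, the default SSH user `capv`, and the use of `sudo` are all assumptions, and the node addresses must be supplied by the operator.

```shell
# The command from the article, unchanged.
OFFLOAD_CMD='ethtool -K eth0 tx-udp_tnl-segmentation off && ethtool -K eth0 tx-udp_tnl-csum-segmentation off'

# Hypothetical helper: runs OFFLOAD_CMD over SSH on every node address
# passed as an argument. Assumptions: the SSH user (default "capv") can
# log in with key-based authentication and is allowed to use sudo.
# With DRY_RUN=1 the ssh commands are printed instead of executed.
disable_offload_on_nodes() {
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh ${SSH_USER:-capv}@${node} \"sudo sh -c '${OFFLOAD_CMD}'\""
        else
            ssh "${SSH_USER:-capv}@${node}" "sudo sh -c '${OFFLOAD_CMD}'"
        fi
    done
}
```

Running `DRY_RUN=1 disable_offload_on_nodes 10.0.0.1 10.0.0.2` (placeholder addresses) prints one ssh command per node without connecting, which allows a review before the change is applied. On a node, `ethtool -k eth0` lists the current offload feature settings.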