TKGm cluster creation in vSphere fails when using NSX-T v3.1.x and Photon 3 or Ubuntu VMs with Linux kernel 5.8

Article ID: 301117

Products

VMware NSX Networking

Issue/Introduction

Symptoms:

Deploying a management or workload cluster with the following infrastructure and configuration may fail or result in restricted traffic between pods if those pods are on different ESXi hosts:

  • NSX-T versions: vSphere with NSX-T v3.1.3 and Enhanced Data Path enabled; vSphere with NSX-T v3.1.x earlier than v3.1.3; NSX-T v3.0.x earlier than the v3.0.2 hot patch; or NSX-T v2.x.
  • Base images: Photon 3 or Ubuntu with Linux kernel 5.8

This combination exposes a checksum issue between older versions of NSX-T and Antrea CNI.



Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x

Resolution


There are two options to resolve this issue:
  • Upgrade NSX-T to the v3.0.2 hot patch, to v3.1.3, or to a later version. If Enhanced Data Path is enabled, upgrade to NSX-T v3.2.1.
  • Use an Ubuntu base image with Linux Kernel v5.9 or later.
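To judge whether a node's base image already meets the kernel requirement in the second option, the running kernel release can be compared against 5.9. The following is a minimal sketch, not part of the original article; the function name `kernel_at_least_5_9` is hypothetical, and it assumes the release string starts with `MAJOR.MINOR` as printed by `uname -r`.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the kernel release is 5.9 or later.
# With no argument it inspects the running kernel via `uname -r`.
kernel_at_least_5_9() {
  release=${1:-$(uname -r)}
  major=${release%%.*}        # text before the first dot, e.g. "5"
  rest=${release#*.}          # text after the first dot, e.g. "8.0-63-generic"
  minor=${rest%%[!0-9]*}      # leading digits of the remainder, e.g. "8"
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 9 ]; }
}
```

Run on a node, `kernel_at_least_5_9 && echo "kernel OK" || echo "kernel older than 5.9"` reports which case applies. A kernel older than 5.9 only matters in combination with one of the affected NSX-T versions listed above.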


Workaround:
For TKG 1.5 and later, set `ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD` to `true` in the cluster configuration when creating the cluster.
For TKG 1.4.2+ (but not 1.5), set `DISABLE_CHECKSUM_OFFLOAD` to `true` in the cluster configuration when creating the cluster.
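As an illustration of applying one of these settings, the sketch below writes the variable into a cluster configuration file before the cluster is created. It assumes the variable is read from the flat YAML file passed to the Tanzu CLI with `--file`; the helper name `set_cluster_var` and the file name `cluster-config.yaml` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set NAME: "VALUE" in a flat YAML cluster configuration file,
# replacing any existing top-level assignment of the same variable.
set_cluster_var() {
  config_file=$1
  name=$2
  value=$3
  touch "$config_file"
  tmp=$(mktemp)
  # Keep every line except an earlier assignment of this variable.
  grep -v "^${name}:" "$config_file" > "$tmp" || true
  printf '%s: "%s"\n' "$name" "$value" >> "$tmp"
  mv "$tmp" "$config_file"
}

# Example for TKG 1.5+ (not executed here; file and cluster names are placeholders):
#   set_cluster_var cluster-config.yaml ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD true
#   tanzu cluster create my-cluster --file cluster-config.yaml
```

For TKG 1.4.2+ the same call would use `DISABLE_CHECKSUM_OFFLOAD` instead. Editing the file by hand is equally valid; the helper only avoids duplicate entries.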

In some cases, the management cluster deploys successfully but traffic is dropped. To work around this issue, SSH into every control plane and worker node VM and run the following command on each node:


ethtool -K eth0 tx-udp_tnl-segmentation off && ethtool -K eth0 tx-udp_tnl-csum-segmentation off
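When there are many nodes, the command above can be applied in a loop. The following is a rough sketch under several assumptions that are not stated in the article: the nodes are reachable over SSH as the user `capv`, `sudo` is available on the nodes, and the interface is `eth0` as in the command above. The function name `disable_tunnel_offload` and the `DRY_RUN` and `SSH_USER` variables are hypothetical.

```shell
#!/bin/sh
# Command from the article; the sudo prefix is an assumption (ethtool -K needs root).
OFFLOAD_CMD='sudo ethtool -K eth0 tx-udp_tnl-segmentation off && sudo ethtool -K eth0 tx-udp_tnl-csum-segmentation off'

# Run the workaround on every node address passed as an argument.
# SSH_USER defaults to "capv" (assumed); DRY_RUN=1 prints the commands instead of running them.
disable_tunnel_offload() {
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh ${SSH_USER:-capv}@${node} ${OFFLOAD_CMD}"
    else
      ssh "${SSH_USER:-capv}@${node}" "$OFFLOAD_CMD" || echo "failed on ${node}" >&2
    fi
  done
}

# Example (not executed here): use the InternalIP of every node known to kubectl.
#   disable_tunnel_offload $(kubectl get nodes \
#     -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
```

Running with `DRY_RUN=1` first shows exactly which hosts would be touched. On a node, `ethtool -k eth0 | grep udp_tnl` lists the current state of the two offload features.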