Deploying a management or workload cluster with the following infrastructure and configuration may fail or result in restricted traffic between pods if those pods are on different ESXi hosts:
This combination exposes a checksum issue between older versions of NSX-T and Antrea CNI.
This issue is resolved in the VMware NSX-T Data Center 3.0.2 hot patch and in VMware NSX-T Data Center 3.1.3.
There are two options to resolve this issue:
Workaround:
For TKG 1.5+, you can set `ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD` to `true` when creating the cluster.
For TKG 1.4.2 and later 1.4.x releases (but not 1.5), you can set `DISABLE_CHECKSUM_OFFLOAD` to `true` when creating the cluster.
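As a minimal sketch of the two settings above, the following helper appends the matching variable to a cluster configuration file before the cluster is created. The function name `add_offload_setting`, its version-family argument, and the file name are hypothetical, and the `KEY: value` line format assumes a YAML-style cluster configuration file; adjust it to however your configuration is supplied.

```shell
# Hypothetical helper: append the offload setting for a TKG version family
# to a cluster configuration file (assumed to use "KEY: value" lines).
# Usage: add_offload_setting <1.5|1.4> <config-file>
add_offload_setting() {
  family="$1"
  config="$2"
  case "$family" in
    1.5) key="ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD" ;;  # TKG 1.5+
    1.4) key="DISABLE_CHECKSUM_OFFLOAD" ;;           # TKG 1.4.2 and later 1.4.x
    *)   echo "unsupported version family: $family" >&2; return 1 ;;
  esac
  # Append only if the key is not already present in the file.
  grep -q "^${key}:" "$config" 2>/dev/null || printf '%s: true\n' "$key" >> "$config"
}

# Example (file and cluster names are placeholders):
#   add_offload_setting 1.5 ./cluster-config.yaml
#   tanzu cluster create my-cluster --file ./cluster-config.yaml
```

The helper is idempotent, so running it twice against the same file does not duplicate the setting.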
In some cases, the management cluster deploys successfully, but traffic is dropped. To work around this issue, SSH into every control plane and worker VM and run the following command on each node:
ethtool -K eth0 tx-udp_tnl-segmentation off && ethtool -K eth0 tx-udp_tnl-csum-segmentation off
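With many nodes, the command above can be applied in a loop. This is a sketch under stated assumptions: the `NODES` list, the `SSH_USER` default of `capv`, the use of `sudo`, and the `DRY_RUN` switch are illustrative and not part of the product; replace them with the addresses and credentials that apply to your cluster VMs.

```shell
# Space-separated node addresses (placeholders; supply your own).
NODES="${NODES:-}"
# Assumed SSH user for the node VMs; change if yours differs.
SSH_USER="${SSH_USER:-capv}"
# The workaround command from the text above, unchanged.
OFFLOAD_CMD='ethtool -K eth0 tx-udp_tnl-segmentation off && ethtool -K eth0 tx-udp_tnl-csum-segmentation off'

# Run the workaround on every node in NODES.
# With DRY_RUN set, only print the ssh command that would be executed.
run_on_nodes() {
  for node in $NODES; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "ssh ${SSH_USER}@${node} sudo sh -c '${OFFLOAD_CMD}'"
    else
      ssh "${SSH_USER}@${node}" "sudo sh -c '${OFFLOAD_CMD}'" \
        || echo "failed on ${node}" >&2
    fi
  done
}

# Example: NODES="10.0.0.11 10.0.0.12" DRY_RUN=1 run_on_nodes
```

Running with `DRY_RUN=1` first makes it easy to confirm the node list before any change is made.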