Workaround for Antrea-NSX Manager Registration Issues in Tanzu with vSphere NAT Deployment

Article ID: 317179


Updated On:

Products

VMware NSX

Issue/Introduction

This KB article describes how to diagnose and temporarily work around an NSX manager registration issue in a Tanzu environment deployed with vSphere in NAT mode.

Symptoms:

In a VMware Tanzu environment deployed with vSphere in NAT mode and with Antrea-NSX integration enabled, some clusters fail to register with the NSX manager when more than 20 workload clusters are deployed. The primary symptoms include:

  • Controller service on Manager node to Transport node communication is down for at least three minutes.
Event Description:
Controller service on Manager node 192.168.x.y (uuid1) to Transport node (uuid2) down for at least three minutes from Controller service's point of view.
  • Errors in the dmesg log indicating dropped connections due to rate limits.
dmesg:
[355518.907296] Dropped per conn limit: IN=eth0 OUT= MAC=mac1 SRC=192.168.x.y1 DST=192.168.x.y LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=49906 DF PROTO=TCP SPT=49611 DPT=1234 WINDOW=64240 RES=0x00 SYN URGP=0
[355519.938940] Dropped per conn limit: IN=eth0 OUT= MAC=mac2 SRC=192.168.x.y1 DST=192.168.x.y LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=49907 DF PROTO=TCP SPT=49611 DPT=1234 WINDOW=64240 RES=0x00 SYN URGP=0
  • Repeated failure messages in the interworking pod logs, particularly for initializing version handshake.
  • This is a known issue since NSX-T 3.2.0.


Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x

Cause

The issue is caused by iptables rate limiting on the NSX manager. The iptables rules on the manager limit the number of connections per source IP, so when multiple workload clusters behind a single NAT gateway attempt to register simultaneously, requests are dropped and registrations fail.
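To see the rate-limiting rules in question, you can list the INPUT chain on an NSX manager and filter for the LOG_DROP targets shown in the Workaround section below; a minimal sketch:

# Show the INPUT rules that jump to the LOG_DROP/LOG_DROP2 rate-limit chains
iptables -L INPUT --line-numbers | grep -E 'LOG_DROP'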

Resolution

This issue will be fixed in a future release.

Workaround:
Diagnose the issue

1. Check the interworking pod and register pod logs. If there are connection errors like the following, it may be the rate-limit issue:

# kubectl --kubeconfig ${workload_cluster_config} logs -nvmware-system-antrea interworking-XXXX-XXXX
 

E0530 10:18:25.559290      13 controller.go:368] Failed to initialize versionhandshake for antrea_monitoring: rpc error: code = INTERNAL desc = , keep trying
E0530 10:18:25.559359      13 controller.go:368] Failed to initialize versionhandshake for antrea_traceflow: rpc error: code = INTERNAL desc = , keep trying
E0530 10:18:25.559469      13 controller.go:368] Failed to initialize versionhandshake for antrea_traceflow: rpc error: code = INTERNAL desc = , keep trying
E0530 10:18:25.559847      13 controller.go:368] Failed to initialize versionhandshake for antrea_traceflow: rpc error: code = INTERNAL desc = , keep trying
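The interworking pod name suffix (XXXX-XXXX above) varies per cluster. To find it, you can first list the pods in the vmware-system-antrea namespace with the same kubeconfig; a minimal sketch:

kubectl --kubeconfig ${workload_cluster_config} get pods -n vmware-system-antrea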


2. Check the NSX manager. Log in to the NSX manager and run `dmesg -T`, looking for errors/warnings like the following:

IN=eth0 OUT= MAC=00:50:56:aa:bb:cc:00:50:56:aa:bb:cc:dd:ee SRC=10.x.y.z DST=10.a.b.c  LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=41118 DF PROTO=TCP SPT=30439 DPT=1235 WINDOW=64240 RES=0x00 SYN URGP=0
[Tue May dd 10:09:57 20yy] IPTables-Dropped: IN=eth0 OUT= MAC=00:50:56:aa:bb:cc:00:50:56:aa:bb:cc:dd:ee SRC=10.x.y.z DST=10.a.b.c LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=41119 DF PROTO=TCP SPT=30439 DPT=1235 WINDOW=64240 RES=0x00 SYN URGP=0
[Tue May dd 10:09:59 20yy] IPTables-Dropped: IN=eth0 OUT= MAC=00:50:56:aa:bb:cc:00:50:56:aa:bb:cc:dd:ee SRC=10.x.y.z DST=10.a.b.c LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=41120 DF PROTO=TCP SPT=30439 DPT=1235 WINDOW=64240 RES=0x00 SYN URGP=0
[Tue May dd 10:10:04 20yy] IPTables-Dropped: IN=eth0 OUT= MAC=00:50:56:aa:bb:cc:00:50:56:aa:bb:cc:dd:ee SRC=10.x.y.z DST=10.a.b.c LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=41121 DF PROTO=TCP SPT=30439 DPT=1235 WINDOW=64240 RES=0x00 SYN URGP=0
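To filter the kernel log down to just these drop messages, you can grep for the two log prefixes that the LOG_DROP and LOG_DROP2 chains attach; a minimal sketch:

dmesg -T | grep -E 'IPTables-Dropped|Dropped per conn limit'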


If SRC is the gateway IP address or the interworking pod IP address, the rate-limit issue has been hit. This is because the NSX manager has the following iptables rules:

# iptables -L INPUT
LOG_DROP2 tcp -- anywhere anywhere multiport dports 1234,rmtcfg state NEW,ESTABLISHED #conn src/32 > 10
LOG_DROP2 tcp -- anywhere anywhere multiport dports 1234,rmtcfg state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
LOG_DROP tcp -- anywhere anywhere tcp dpt:1235 state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
 
# iptables -L LOG_DROP
LOG all -- anywhere anywhere limit: avg 10/sec burst 5 LOG level warning prefix "IPTables-Dropped: "
DROP all -- anywhere anywhere 
# iptables -L LOG_DROP2
LOG all -- anywhere anywhere limit: avg 10/sec burst 5 LOG level warning prefix "Dropped per conn limit: "
DROP all -- anywhere anywhere 

In this scenario, once the number of connections from a single IP address exceeds the per-source limit of 10 (#conn src/32 > 10), iptables drops further requests. With a NAT gateway and multiple workload clusters sharing one source IP, this limit is very easy to exceed.
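To confirm how close a given source is to the limit, you can count established connections per peer address on the rate-limited ports; a minimal sketch, assuming the ss utility is available on the NSX manager:

# Count established connections per source IP on ports 1234 and 1235
ss -Htn state established '( sport = :1234 or sport = :1235 )' | awk '{split($4, a, ":"); print a[1]}' | sort | uniq -c | sort -rn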


Workaround

To address the issue, log in to each of the three NSX managers and add a new rule to the INPUT chain that raises the per-source connection limit to 1000 (port 1236 is the rmtcfg service shown in the rule listings):

iptables -A INPUT -i eth0 -p tcp -m multiport --dports 1234,1236 -m state --state NEW,ESTABLISHED -m connlimit --connlimit-above 1000 --connlimit-mask 32 --connlimit-saddr -j LOG_DROP2

iptables -L INPUT --line-numbers
 

...
13   LOG_DROP2  tcp  --  anywhere             anywhere             tcp dpt:1235 state NEW,ESTABLISHED #conn src/32 > 10
14   LOG_DROP   tcp  --  anywhere             anywhere             tcp dpt:1235 state NEW,ESTABLISHED #conn src/32 > 10
15   LOG_DROP   tcp  --  anywhere             anywhere             tcp dpt:1235 state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
...
27   LOG_DROP2  tcp  --  anywhere             anywhere             multiport dports 1234,rmtcfg state NEW,ESTABLISHED #conn src/32 > 1000

Delete the rules that define the connection limit of 10 (here, rules 13 and 14); rule 27 will then take effect. Delete the higher-numbered rule first, because the remaining rules are renumbered after each deletion:

iptables -D INPUT 14
iptables -D INPUT 13

iptables -L INPUT --line-numbers
Chain INPUT (policy DROP)
num  target     prot opt source               destination        
1    ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
2    ACCEPT     all  --  anywhere             anywhere           
3    ACCEPT     icmp --  anywhere             anywhere             icmp echo-request
4    ACCEPT     tcp  --  anywhere             anywhere             multiport dports ssh,http,https,9000,9040,7070,7071,9090,65000:65002,65010:65012 tcp flags:FIN,SYN,RST,ACK/SYN
5    ACCEPT     udp  --  anywhere             anywhere             multiport dports ntp,snmp,65000:65002,65010:65012
6    ACCEPT     tcp  --  anywhere             anywhere             tcp spt:domain state ESTABLISHED
7    ACCEPT     udp  --  anywhere             anywhere             udp spt:domain
8    ACCEPT     tcp  --  anywhere             anywhere             tcp spt:9092 state ESTABLISHED
9    LOG_DROP2  tcp  --  anywhere             anywhere             multiport dports 1234,rmtcfg state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
10   ACCEPT     tcp  --  anywhere             anywhere             multiport dports 1234,rmtcfg state NEW,ESTABLISHED
11   ACCEPT     tcp  --  anywhere             anywhere             tcp spt:https state ESTABLISHED
12   ACCEPT     tcp  --  anywhere             anywhere             tcp spt:https state ESTABLISHED
13   ACCEPT     all  --  anywhere             anywhere           
14   LOG_DROP   tcp  --  anywhere             anywhere             tcp dpt:1235 state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
15   ACCEPT     tcp  --  anywhere             anywhere             multiport dports 1235,7777,ssh state NEW,ESTABLISHED
16   ACCEPT     tcp  --  anywhere             anywhere             multiport sports 7777,ssh,https,syslog-tls,shell state ESTABLISHED
17   ACCEPT     tcp  --  anywhere             anywhere             tcp spt:9000 state ESTABLISHED
18   ACCEPT     udp  --  anywhere             anywhere             udp dpts:11000:11004
19   ACCEPT     udp  --  anywhere             anywhere             udp spts:11000:11004
20   ACCEPT     udp  --  anywhere             anywhere             udp spt:bootps dpt:bootpc
21   ACCEPT     icmp --  anywhere             anywhere             icmp echo-request
22   ACCEPT     icmp --  anywhere             anywhere             icmp echo-reply
23   REJECT     udp  --  anywhere             anywhere             udp dpts:33434:33523 reject-with icmp-port-unreachable
24   ACCEPT     icmp --  anywhere             anywhere             icmp destination-unreachable
25   ACCEPT     icmp --  anywhere             anywhere             icmp time-exceeded
26   LOG_DROP2  tcp  --  anywhere             anywhere             multiport dports 1234,rmtcfg state NEW,ESTABLISHED #conn src/32 > 1000


Finally, update the files that define the rule in /etc/iptables so that the change persists. You can use `grep LOG_DROP2` to find the relevant lines, then change the connection limit from 10 to 1000 (a scripted sketch follows the file list). The files are:

/etc/iptables/nsx-common.v6rules
/etc/iptables/nsx-saved-iptables.v4rules
/etc/iptables/nsx-common.v4rules
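A scripted way to apply the edit to all three files, with backups, might look like the following; a minimal sketch, assuming the files use the same --connlimit-above 10 syntax as the live rule:

# Back up each rules file, then raise the per-source connection limit from 10 to 1000
for f in /etc/iptables/nsx-common.v4rules \
         /etc/iptables/nsx-common.v6rules \
         /etc/iptables/nsx-saved-iptables.v4rules; do
  cp "$f" "$f.bak"
  sed -i 's/--connlimit-above 10 /--connlimit-above 1000 /g' "$f"
done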

 


Additional Information

STEPS TO REPRODUCE THE ISSUE
Deploy Tanzu with vSphere in NAT mode, deploy more than 20 workload clusters with Antrea-NSX integration enabled at the same time, and observe that some clusters cannot register with the NSX manager.
1. The current setup has 12 Antrea workload clusters registered with NSX.
2. Registration of a new Antrea workload cluster fails because Control Channel to Transport Node communication is down.

Impact/Risks:

The primary impact of this issue includes:

  • NSX-T adapters cannot register to the NSX manager.
  • Antrea-NSX integration fails to function as expected, hindering network functionality in the Tanzu environment.