This KB article describes how to diagnose and work around NSX manager registration failures in a Tanzu environment deployed with vSphere in NAT mode.
Symptoms:
In a VMware Tanzu environment deployed with vSphere in NAT mode, when more than 20 workload clusters are deployed with Antrea-NSX integration enabled, some clusters fail to register with the NSX manager. The primary symptoms include:
dmesg log indicating dropped connections due to rate limits.
dmesg:
[355518.907296] Dropped per conn limit: IN=eth0 OUT= MAC=mac1 SRC=192.168.0.1 DST=192.168.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=49906 DF PROTO=TCP SPT=49611 DPT=1234 WINDOW=64240 RES=0x00 SYN URGP=0
[355519.938940] Dropped per conn limit: IN=eth0 OUT= MAC=mac2 SRC=192.168.0.1 DST=192.168.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=49907 DF PROTO=TCP SPT=49611 DPT=1234 WINDOW=64240 RES=0x00 SYN URGP=0
Environment:
VMware NSX-T Data Center 3.x
Workaround:
Diagnose the issue
1. Check the logs of the interworking pod and the register pod. Connection errors there may indicate the rate limit issue, for example:
# kubectl --kubeconfig ${workload_cluster_config} logs -n vmware-system-antrea interworking-####-####
E0530 10:18:25.559290 13 controller.go:368] Failed to initialize versionhandshake for antrea_monitoring: rpc error: code = INTERNAL desc = , keep trying
E0530 10:18:25.559359 13 controller.go:368] Failed to initialize versionhandshake for antrea_traceflow: rpc error: code = INTERNAL desc = , keep trying
E0530 10:18:25.559469 13 controller.go:368] Failed to initialize versionhandshake for antrea_traceflow: rpc error: code = INTERNAL desc = , keep trying
E0530 10:18:25.559847 13 controller.go:368] Failed to initialize versionhandshake for antrea_traceflow: rpc error: code = INTERNAL desc = , keep trying
2. Check the NSX manager: log in to the NSX manager and run dmesg -T, then look for errors or warnings like the following.
IN=eth0 OUT= MAC=00:50:56:##:##:##:00:50:56:##:##:##:##:## SRC=10.0.0.1 DST=10.0.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=41118 DF PROTO=TCP SPT=30439 DPT=1235 WINDOW=64240 RES=0x00 SYN URGP=0
[Tue May dd 10:09:57 20yy] IPTables-Dropped: IN=eth0 OUT= MAC=00:50:56:##:##:##:00:50:56:##:##:##:##:## SRC=10.0.0.1 DST=10.0.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=41119 DF PROTO=TCP SPT=30439 DPT=1235 WINDOW=64240 RES=0x00 SYN URGP=0
[Tue May dd 10:09:59 20yy] IPTables-Dropped: IN=eth0 OUT= MAC=00:50:56:##:##:##:00:50:56:##:##:##:##:## SRC=10.0.0.1 DST=10.0.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=41120 DF PROTO=TCP SPT=30439 DPT=1235 WINDOW=64240 RES=0x00 SYN URGP=0
[Tue May dd 10:10:04 20yy] IPTables-Dropped: IN=eth0 OUT= MAC=00:50:56:##:##:##:00:50:56:##:##:##:##:## SRC=10.0.0.1 DST=10.0.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=61 ID=41121 DF PROTO=TCP SPT=30439 DPT=1235 WINDOW=64240 RES=0x00 SYN URGP=0
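To see which source addresses are being dropped, log lines like those above can be tallied by their SRC field. The following is a minimal sketch and not part of the original procedure: the function name count_drops_by_src is hypothetical, and it assumes the two log prefixes that appear in this article ("IPTables-Dropped: " and "Dropped per conn limit: ").

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally dropped-packet log lines by source address.
# Reads kernel log text on stdin and prints "<count> <SRC>" per source,
# highest count first. Lines without one of the two prefixes are ignored.
count_drops_by_src() {
    grep -E 'IPTables-Dropped: |Dropped per conn limit: ' \
        | grep -oE 'SRC=[0-9.]+' \
        | cut -d= -f2 \
        | sort \
        | uniq -c \
        | awk '{print $1, $2}' \
        | sort -k1,1nr -k2,2
}
```

On the NSX manager this could be run as `dmesg -T | count_drops_by_src`. It only reads the log and changes nothing.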
If SRC is the gateway IP address or the interworking pod IP address, the rate limit issue has been hit. This happens because the NSX manager has the following iptables rules.
# iptables -L INPUT
LOG_DROP2 tcp -- anywhere anywhere multiport dports 1234,rmtcfg state NEW,ESTABLISHED #conn src/32 > 10
LOG_DROP2 tcp -- anywhere anywhere multiport dports 1234,rmtcfg state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
LOG_DROP tcp -- anywhere anywhere tcp dpt:1235 state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
# iptables -L LOG_DROP
LOG all -- anywhere anywhere limit: avg 10/sec burst 5 LOG level warning prefix "IPTables-Dropped: "
DROP all -- anywhere anywhere
# iptables -L LOG_DROP2
LOG all -- anywhere anywhere limit: avg 10/sec burst 5 LOG level warning prefix "Dropped per conn limit: "
DROP all -- anywhere anywhere
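In this scenario, when connections from a single source IP address exceed the limit of 10 (the "#conn src/32 > 10" match), iptables drops the request. With a gateway in front of multiple workload clusters, traffic from all of them arrives from the same source address, so it is very easy to exceed this limit.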
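The per-source limit currently in force can be read from the rule listing itself. As a sketch (the helper name list_connlimit_rules is hypothetical), the following prints the first column and the limit of every rule that carries a "#conn src/32 > N" match. It assumes the `iptables -L` text layout shown in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list rules that carry a per-source connection limit.
# Reads `iptables -L INPUT` text on stdin and prints "<first column> <limit>"
# for each rule containing a "#conn src/<mask> > <limit>" match.
list_connlimit_rules() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "#conn" && $(i + 2) == ">")
                print $1, $(i + 3)
    }'
}
```

Run as `iptables -L INPUT | list_connlimit_rules`, the first column is the rule target (for example LOG_DROP2). With `--line-numbers` added to the iptables command, the first column is the rule number instead.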
To address the issue, log in to each of the 3 NSX managers and add a new rule to the INPUT chain:
iptables -A INPUT -i eth0 -p tcp -m multiport --dports 1234,1236 -m state --state NEW,ESTABLISHED -m connlimit --connlimit-above 1000 --connlimit-mask 32 --connlimit-saddr -j LOG_DROP2
iptables -L INPUT --line-numbers
...
13 LOG_DROP2 tcp -- anywhere anywhere tcp dpt:1235 state NEW,ESTABLISHED #conn src/32 > 10
14 LOG_DROP tcp -- anywhere anywhere tcp dpt:1235 state NEW,ESTABLISHED #conn src/32 > 10
15 LOG_DROP tcp -- anywhere anywhere tcp dpt:1235 state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
...
27 LOG_DROP2 tcp -- anywhere anywhere multiport dports 1234,rmtcfg state NEW,ESTABLISHED #conn src/32 > 1000
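Delete the rules that define the limit of 10, here the 13th and 14th rules. Rule 27, the new rule with a limit of 1000, then takes effect. Delete the higher-numbered rule first so that the other rule keeps its number.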
iptables -D INPUT 14
iptables -D INPUT 13
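The two deletions above run from the highest rule number to the lowest because `iptables -D` renumbers every rule after the one it removes. A small sketch of that ordering follows. The helper name print_delete_commands is hypothetical, and it only prints the commands (a dry run) rather than executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print `iptables -D INPUT <n>` commands for the given
# rule numbers, highest number first, so that removing one rule does not
# shift the numbers of the rules still waiting to be removed.
print_delete_commands() {
    # Nothing to do when no rule numbers are given.
    [ "$#" -gt 0 ] || return 0
    printf '%s\n' "$@" | sort -nru | while read -r num; do
        printf 'iptables -D INPUT %s\n' "$num"
    done
}
```

For example, `print_delete_commands 13 14` prints the delete command for rule 14 followed by the one for rule 13. The rule numbers must come from a fresh `iptables -L INPUT --line-numbers` listing on each NSX manager, since they can differ between nodes.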
iptables -L INPUT --line-numbers
Chain INPUT (policy DROP)
num target prot opt source destination
1 ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
2 ACCEPT all -- anywhere anywhere
3 ACCEPT icmp -- anywhere anywhere icmp echo-request
4 ACCEPT tcp -- anywhere anywhere multiport dports ssh,http,https,9000,9040,7070,7071,9090,65000:65002,65010:65012 tcp flags:FIN,SYN,RST,ACK/SYN
5 ACCEPT udp -- anywhere anywhere multiport dports ntp,snmp,65000:65002,65010:65012
6 ACCEPT tcp -- anywhere anywhere tcp spt:domain state ESTABLISHED
7 ACCEPT udp -- anywhere anywhere udp spt:domain
8 ACCEPT tcp -- anywhere anywhere tcp spt:9092 state ESTABLISHED
9 LOG_DROP2 tcp -- anywhere anywhere multiport dports 1234,rmtcfg state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
10 ACCEPT tcp -- anywhere anywhere multiport dports 1234,rmtcfg state NEW,ESTABLISHED
11 ACCEPT tcp -- anywhere anywhere tcp spt:https state ESTABLISHED
12 ACCEPT tcp -- anywhere anywhere tcp spt:https state ESTABLISHED
13 ACCEPT all -- anywhere anywhere
14 LOG_DROP tcp -- anywhere anywhere tcp dpt:1235 state NEW,ESTABLISHED limit: above 10000/sec burst 20 mode srcip
15 ACCEPT tcp -- anywhere anywhere multiport dports 1235,7777,ssh state NEW,ESTABLISHED
16 ACCEPT tcp -- anywhere anywhere multiport sports 7777,ssh,https,syslog-tls,shell state ESTABLISHED
17 ACCEPT tcp -- anywhere anywhere tcp spt:9000 state ESTABLISHED
18 ACCEPT udp -- anywhere anywhere udp dpts:11000:11004
19 ACCEPT udp -- anywhere anywhere udp spts:11000:11004
20 ACCEPT udp -- anywhere anywhere udp spt:bootps dpt:bootpc
21 ACCEPT icmp -- anywhere anywhere icmp echo-request
22 ACCEPT icmp -- anywhere anywhere icmp echo-reply
23 REJECT udp -- anywhere anywhere udp dpts:33434:33523 reject-with icmp-port-unreachable
24 ACCEPT icmp -- anywhere anywhere icmp destination-unreachable
25 ACCEPT icmp -- anywhere anywhere icmp time-exceeded
26 LOG_DROP2 tcp -- anywhere anywhere multiport dports 1234,rmtcfg state NEW,ESTABLISHED #conn src/32 > 1000
Also change the files in /etc/iptables that define the rule. You can use `grep LOG_DROP2` to find the relevant lines, then change the limit from 10 to 1000. The files are:
/etc/iptables/nsx-common.v6rules
/etc/iptables/nsx-saved-iptables.v4rules
/etc/iptables/nsx-common.v4rules
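The file edit could be scripted along the following lines. This is a sketch under a stated assumption, not a verified procedure: it assumes the rule files use iptables-save syntax, so that the limit appears as "--connlimit-above 10" on a line that jumps to LOG_DROP2. Confirm this with `grep LOG_DROP2` before running anything like it. The function name raise_connlimit is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: raise the per-source connection limit from 10 to 1000
# on LOG_DROP2 rules in one rules file.
# ASSUMPTION: the file uses iptables-save syntax ("--connlimit-above 10").
# A copy of the original is kept next to it as <file>.bak.
raise_connlimit() {
    local file="$1"
    cp -p -- "$file" "$file.bak" || return 1
    # Only touch lines mentioning LOG_DROP2, and only an exact limit of 10
    # (so a limit such as 100 is left alone).
    sed -E '/LOG_DROP2/ s/(--connlimit-above )10([^0-9]|$)/\11000\2/' \
        "$file.bak" > "$file"
}
```

It would be called once per file, for example `raise_connlimit /etc/iptables/nsx-common.v4rules`, and the result can be compared against the backup with `diff` before relying on it.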
STEPS TO REPRODUCE THE ISSUE
Deploy Tanzu with vSphere in NAT mode and deploy multiple workload clusters (more than 20) with Antrea-NSX integration enabled. Some clusters then fail to register with the NSX manager.
Impact/Risks:
The primary impact of this issue includes: