Symptoms:
#grep -hE "notification from controller for invalid|notification from controller error" /var/run/log/netcpa.log
2019-12-07T11:43:03.643Z error netcpa[2104030] [Originator@6876 sub=Default] Vxlan: notification from controller error/sub:4/1
2019-12-07T11:43:03.643Z error netcpa[2104030] [Originator@6876 sub=Default] Vxlan: notification from controller for invalid VNI 15212, switchID 0
2019-12-07T11:43:08.645Z error netcpa[2104030] [Originator@6876 sub=Default] Vxlan: notification from controller error/sub:4/1
2019-12-07T11:43:08.645Z error netcpa[2104030] [Originator@6876 sub=Default] Vxlan: notification from controller for invalid VNI 15211, switchID 0
2019-12-07T11:43:08.645Z error netcpa[2104030] [Originator@6876 sub=Default] Vxlan: notification from controller error/sub:4/1
2019-12-07T11:43:08.645Z error netcpa[2104030] [Originator@6876 sub=Default] Vxlan: notification from controller for invalid VNI 15212, switchID 0
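When many of these errors are logged, it can help to reduce them to the distinct list of affected VNIs. The following is a minimal Python sketch, not part of the product tooling, that assumes the log lines have already been collected (for example with the grep command above). The function name `invalid_vnis` and the variable `sample` are illustrative only.

```python
import re

# Matches the "invalid VNI" error that netcpa writes on the ESXi host.
INVALID_VNI = re.compile(
    r"notification from controller for invalid VNI (\d+), switchID (\d+)"
)

def invalid_vnis(log_lines):
    """Return the sorted, de-duplicated VNIs reported as invalid."""
    found = set()
    for line in log_lines:
        match = INVALID_VNI.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)

# Sample lines taken from the netcpa.log excerpt in this article.
sample = [
    "2019-12-07T11:43:03.643Z error netcpa[2104030] [Originator@6876 sub=Default] "
    "Vxlan: notification from controller error/sub:4/1",
    "2019-12-07T11:43:03.643Z error netcpa[2104030] [Originator@6876 sub=Default] "
    "Vxlan: notification from controller for invalid VNI 15212, switchID 0",
    "2019-12-07T11:43:08.645Z error netcpa[2104030] [Originator@6876 sub=Default] "
    "Vxlan: notification from controller for invalid VNI 15211, switchID 0",
    "2019-12-07T11:43:08.645Z error netcpa[2104030] [Originator@6876 sub=Default] "
    "Vxlan: notification from controller for invalid VNI 15212, switchID 0",
]

print(invalid_vnis(sample))  # [15211, 15212]
```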
#show log syslog filtered-by "Try to join VNI.* not assigned to this node by TransportSwitch"
2018-12-12T11:15:18.226817+00:00 2018-12-12 11:15:18,226 2513561004 [vxlan worker 3] WARN com.vmware.controller.apps.vxlan.VxlanService - Try to join VNI 5008 not assigned to this node by TransportSwitch [Connection [ip=192.168.1.11:21641, cnnId=46], swId=0]
2018-12-12T11:15:18.227796+00:00 2018-12-12 11:15:18,227 2513561005 [vxlan worker 2] WARN com.vmware.controller.apps.vxlan.VxlanService - Try to join VNI 5021 not assigned to this node by TransportSwitch [Connection [ip=192.168.1.11:21641, cnnId=46], swId=0]
2018-12-12T11:15:18.755600+00:00 2018-12-12 11:15:18,755 2513561533 [vxlan worker 2] WARN com.vmware.controller.apps.vxlan.VxlanService - Try to join VNI 5000 not assigned to this node by TransportSwitch [Connection [ip=10.139.211.138:42621, cnnId=50], swId=0]
2018-12-12T11:15:18.756518+00:00 2018-12-12 11:15:18,756 2513561534 [vxlan worker 1] WARN com.vmware.controller.apps.vxlan.VxlanService - Try to join VNI 5018 not assigned to this node by TransportSwitch [Connection [ip=192.168.1.14.138:42621, cnnId=50], swId=0]
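To see which host connections are generating these warnings, the controller log lines can be grouped by the connection endpoint. The sketch below is illustrative Python that assumes the warning text has the exact shape shown in the excerpt above. The names `vnis_by_connection` and `sample` are hypothetical, and the sample lines are shortened to the message portion.

```python
import re
from collections import defaultdict

# Captures the VNI and the "ip:port" endpoint from the controller warning.
JOIN_WARNING = re.compile(
    r"Try to join VNI (\d+) not assigned to this node by TransportSwitch "
    r"\[Connection \[ip=([^,\]]+), cnnId=(\d+)\]"
)

def vnis_by_connection(log_lines):
    """Map each host endpoint (ip:port string) to the VNIs it tried to join."""
    result = defaultdict(list)
    for line in log_lines:
        match = JOIN_WARNING.search(line)
        if match:
            result[match.group(2)].append(int(match.group(1)))
    return dict(result)

# Message portions of the controller log lines shown in this article.
sample = [
    "WARN com.vmware.controller.apps.vxlan.VxlanService - Try to join VNI 5008 "
    "not assigned to this node by TransportSwitch "
    "[Connection [ip=192.168.1.11:21641, cnnId=46], swId=0]",
    "WARN com.vmware.controller.apps.vxlan.VxlanService - Try to join VNI 5021 "
    "not assigned to this node by TransportSwitch "
    "[Connection [ip=192.168.1.11:21641, cnnId=46], swId=0]",
    "WARN com.vmware.controller.apps.vxlan.VxlanService - Try to join VNI 5000 "
    "not assigned to this node by TransportSwitch "
    "[Connection [ip=10.139.211.138:42621, cnnId=50], swId=0]",
]

for endpoint, vnis in vnis_by_connection(sample).items():
    print(endpoint, vnis)
```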
#net-vdl2 -l
Example of output:
(...) output omitted
VXLAN network: 15212
Multicast IP: N/A (headend replication)
Control plane: Enabled ()
Controller: 10.10.10.10 (down) <<--------------
Controller Disconnected Mode: no
Multicast Routing Domain ID: -N/A-
MAC entry count: 26
ARP entry count: 0
Port count: 1
VXLAN network: 15211
Multicast IP: N/A (headend replication)
Control plane: Enabled ()
Controller: 10.10.10.10 (down) <<--------------
Controller Disconnected Mode: no
Multicast Routing Domain ID: -N/A-
MAC entry count: 1
ARP entry count: 0
(...) output omitted
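On hosts with many logical switches, the output can be long. A small parser can list only the VNIs whose controller is reported as down. This is a sketch in Python that assumes the output has the field labels shown above. Only the "(down)" status appears in this article, so handling of any other status word is an assumption. The name `controllers_by_vni` is hypothetical.

```python
import re

def controllers_by_vni(net_vdl2_output):
    """Return {vni: (controller_ip, status)} parsed from `net-vdl2 -l` text."""
    result = {}
    current_vni = None
    for raw in net_vdl2_output.splitlines():
        line = raw.strip()
        vni_match = re.match(r"VXLAN network:\s*(\d+)", line)
        if vni_match:
            current_vni = int(vni_match.group(1))
            continue
        # "Controller Disconnected Mode" does not match because of the colon.
        ctrl_match = re.match(r"Controller:\s*(\S+)\s*\((\w+)\)", line)
        if ctrl_match and current_vni is not None:
            result[current_vni] = (ctrl_match.group(1), ctrl_match.group(2))
    return result

# Abbreviated copy of the example output in this article.
sample = """\
VXLAN network: 15212
Control plane: Enabled ()
Controller: 10.10.10.10 (down) <<--------------
Controller Disconnected Mode: no
VXLAN network: 15211
Control plane: Enabled ()
Controller: 10.10.10.10 (down) <<--------------
Controller Disconnected Mode: no
"""

parsed = controllers_by_vni(sample)
down_vnis = sorted(vni for vni, (_, status) in parsed.items() if status == "down")
print(down_vnis)  # [15211, 15212]
```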
#esxcli network ip connection list | grep "1234 "
Example of output:
tcp 0 0 10.200.19.27:20306 10.10.10.10:1234 ESTABLISHED 2103863 host1 netcpa-worker
tcp 0 0 10.200.19.27:20305 10.10.10.9:1234 ESTABLISHED 2103839 host1 netcpa-worker
tcp 0 0 10.200.19.27:20304 10.10.10.8:1234 ESTABLISHED 2103842 host1 netcpa-worker
Resolution:
This issue is resolved in VMware NSX for vSphere 6.4.5, available at VMware Downloads.
Workaround:
To work around the issue, restart the impacted NSX Controller. To identify the impacted NSX Controller, follow the steps below:
1. On the ESXi host, identify the NSX Controller(s) marked as "down":
# net-vdl2 -l
Example of output:
(...) output omitted
VXLAN network: 15212
Multicast IP: N/A (headend replication)
Control plane: Enabled ()
Controller: 10.10.10.10 (down) <<--------------
Controller Disconnected Mode: no
Multicast Routing Domain ID: -N/A-
MAC entry count: 26
ARP entry count: 0
Port count: 1
VXLAN network: 15211
Multicast IP: N/A (headend replication)
Control plane: Enabled ()
Controller: 10.10.10.10 (down) <<--------------
Controller Disconnected Mode: no
Multicast Routing Domain ID: -N/A-
MAC entry count: 1
ARP entry count: 0
(...) output omitted
2. Confirm that the ESXi host has a TCP socket open on port 1234 to the NSX Controller(s) marked as "down":
#esxcli network ip connection list | grep "1234 "
Example of output:
tcp 0 0 10.200.19.27:20306 10.10.10.10:1234 ESTABLISHED 2103863 host1 netcpa-worker
tcp 0 0 10.200.19.27:20305 10.10.10.9:1234 ESTABLISHED 2103839 host1 netcpa-worker
tcp 0 0 10.200.19.27:20304 10.10.10.8:1234 ESTABLISHED 2103842 host1 netcpa-worker
3. Review the logs on the NSX Controller and confirm that the warning "Try to join VNI XXX not assigned to this node by TransportSwitch" is present:
#show log syslog filtered-by "Try to join VNI.* not assigned to this node by TransportSwitch"
Example of output:
2018-12-12T11:15:18.226817+00:00 2018-12-12 11:15:18,226 2513561004 [vxlan worker 3] WARN com.vmware.controller.apps.vxlan.VxlanService - Try to join VNI 5008 not assigned to this node by TransportSwitch [Connection [ip=192.168.1.11:21641, cnnId=46], swId=0]
2018-12-12T11:15:18.227796+00:00 2018-12-12 11:15:18,227 2513561005 [vxlan worker 2] WARN com.vmware.controller.apps.vxlan.VxlanService - Try to join VNI 5021 not assigned to this node by TransportSwitch [Connection [ip=192.168.1.11:21641, cnnId=46], swId=0]
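Steps 1 and 2 amount to a cross-check: a controller that the host reports as "down" while an ESTABLISHED TCP connection to it on port 1234 still exists is a candidate for restart. The following Python sketch illustrates that check on saved command output. It assumes the column layout shown in the examples above, and the function names are hypothetical. It does not replace step 3, which must still be confirmed in the controller logs.

```python
import re

def down_controllers(net_vdl2_output):
    """Controller IPs that `net-vdl2 -l` reports as (down)."""
    return set(re.findall(r"Controller:\s*(\S+)\s*\(down\)", net_vdl2_output))

def established_controllers(connection_list_output, port=1234):
    """Remote IPs with an ESTABLISHED TCP connection to the given port."""
    ips = set()
    for line in connection_list_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state, ...
        if len(fields) >= 6 and fields[0] == "tcp" and fields[5] == "ESTABLISHED":
            ip, _, remote_port = fields[4].rpartition(":")
            if remote_port == str(port):
                ips.add(ip)
    return ips

def restart_candidates(net_vdl2_output, connection_list_output):
    """Controllers marked down that still hold an established connection."""
    return sorted(
        down_controllers(net_vdl2_output)
        & established_controllers(connection_list_output)
    )

# Abbreviated copies of the example outputs in this article.
vdl2_sample = """\
VXLAN network: 15212
Controller: 10.10.10.10 (down) <<--------------
Controller Disconnected Mode: no
"""
conn_sample = """\
tcp 0 0 10.200.19.27:20306 10.10.10.10:1234 ESTABLISHED 2103863 host1 netcpa-worker
tcp 0 0 10.200.19.27:20305 10.10.10.9:1234 ESTABLISHED 2103839 host1 netcpa-worker
tcp 0 0 10.200.19.27:20304 10.10.10.8:1234 ESTABLISHED 2103842 host1 netcpa-worker
"""

print(restart_candidates(vdl2_sample, conn_sample))  # ['10.10.10.10']
```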