vMotion, iSCSI or TEP services are down after a port scan against ESXi vmk interfaces.

Article ID: 312156

Products

VMware vSphere ESXi

Issue/Introduction

A few users have experienced this issue. The existing article, "BFD tunnels down after a port scan against ESXi TEP interfaces", addresses only one of the scenarios.

Symptoms:

  • The environment runs NSX-T Data Center.
  • ESXi hosts have multiple vmk interfaces on the same subnet and the same Netstack.
  • A network port scan has been run from another network against the ESXi vmk interfaces.
  • The host is part of a Cisco ACI deployment.
  • Traffic on the vmk interfaces is disrupted, either briefly or permanently. Sometimes the situation resolves itself after about 15 minutes without any user intervention.
  • Asymmetric traffic is observed on vmk interfaces that share a Netstack. For example, if vmk1 and vmk2 are on the same subnet and the same Netstack, and the default route is on vmk1, vmk2 is observed responding to the port scan with its own IP address, but through the vmk1 interface and with the vmk1 MAC address.
  • Any service that relies on a vmk interface can be affected, including vMotion, VTEP or iSCSI. (Note: vMotion and iSCSI in an asymmetric traffic configuration are not recommended, except with Multiple-NIC vMotion and iSCSI Port-Binding. Refer to the note at the end of this document for more details.)
  • BFD/Geneve traffic is still sent correctly from the respective interfaces, each using its own IP and MAC address.
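The precondition in the list above (several vmk interfaces on the same subnet and Netstack) can be checked against an interface inventory. The following is a minimal sketch in Python, not a VMware tool: the `VMK_INVENTORY` list, the `find_multihomed` helper, and all addresses are hypothetical, and the inventory is assumed to have been collected separately from the hosts.

```python
from collections import defaultdict
from ipaddress import ip_interface

# Hypothetical inventory: (interface name, netstack, address/prefix).
# These are documentation addresses, not output from any real host.
VMK_INVENTORY = [
    ("vmk0", "defaultTcpipStack", "192.0.2.10/24"),
    ("vmk1", "vxlan", "198.51.100.11/24"),
    ("vmk2", "vxlan", "198.51.100.12/24"),
]


def find_multihomed(inventory):
    """Group vmk interfaces by (netstack, subnet); keep groups with more than one member."""
    groups = defaultdict(list)
    for name, netstack, cidr in inventory:
        network = ip_interface(cidr).network
        groups[(netstack, str(network))].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


multihomed = find_multihomed(VMK_INVENTORY)
for (netstack, subnet), names in multihomed.items():
    print(f"{netstack} {subnet}: {', '.join(names)}")
```

Any group reported here is a candidate for the asymmetric replies described in this article; a host with a single vmk interface per subnet and Netstack produces no output.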



Environment

VMware vSphere ESXi 8.0.1
VMware vSphere ESXi 8.0.x
VMware vSphere ESXi 8.0.0
VMware vSphere ESXi 8.0.2

Cause

When a vmk interface is port scanned, it receives a SYN (for an unopened port) or a rogue ACK packet.
The ESXi host may need to respond with an RST packet. When the source of the scan is on another network, these responses must be routed, and the network stack's routing table is consulted to determine the egress interface.
 
# esxcli network ip route ipv4 list -N vxlan
Network          Netmask          Gateway          Interface  Source
---------------  ---------------  ---------------  ---------  ------
default          xxx.xxx.xxx.xxx  xxx.xxx.xxx.xxx  vmk1       MANUAL
xxx.xxx.xxx.xxx  xxx.xxx.xxx.xxx  xxx.xxx.xxx.xxx  vmk1       MANUAL

In this case, the default gateway is on vmk1.
Therefore, when vmk2 replies to a TCP port scan packet, it does so through the vmk1 interface. The result is a packet carrying the vmk2 IP address and the vmk1 MAC address, sent on the vmk1 interface. This behavior does not affect the data path for encapsulated Geneve traffic. However, if the physical network fabric uses these packets to update its ARP table, instead of relying on ARP snooping alone, the ARP table can be poisoned. This can then indirectly disrupt overlay data path traffic.
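The sequence described above can be modeled in a few lines of Python. This is a simplified sketch, not ESXi or fabric code: the interface names mirror the example, but the addresses, MAC values, routing table, and the `egress_for` and `build_reply` helpers are all assumptions made for illustration, and the fabric is assumed to learn IP-to-MAC bindings from any packet it sees.

```python
from ipaddress import ip_address, ip_network

# Hypothetical interfaces on one netstack (documentation addresses, made-up MACs).
INTERFACES = {
    "vmk1": {"ip": "198.51.100.11", "mac": "00:50:56:00:00:01"},
    "vmk2": {"ip": "198.51.100.12", "mac": "00:50:56:00:00:02"},
}

# Simplified routing table: (destination network, egress interface).
ROUTES = [
    (ip_network("0.0.0.0/0"), "vmk1"),        # default route
    (ip_network("198.51.100.0/24"), "vmk1"),  # connected subnet
]


def egress_for(dst):
    """Longest-prefix match over ROUTES; returns the egress interface name."""
    matches = [(net, ifname) for net, ifname in ROUTES if ip_address(dst) in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def build_reply(scanned_ifname, scanner_ip):
    """The reply keeps the scanned interface's IP but takes the egress interface's MAC."""
    out_if = egress_for(scanner_ip)
    return {
        "src_ip": INTERFACES[scanned_ifname]["ip"],
        "src_mac": INTERFACES[out_if]["mac"],
        "egress": out_if,
    }


# Fabric table that starts out correct and learns from observed data-plane packets.
fabric_table = {info["ip"]: info["mac"] for info in INTERFACES.values()}

reply = build_reply("vmk2", "203.0.113.50")       # scanner on another network
fabric_table[reply["src_ip"]] = reply["src_mac"]  # data-plane learning overwrites the entry

print(reply)
print(fabric_table)
```

After the reply is learned, the model's table maps the vmk2 address to the vmk1 MAC address, which is the poisoned state that can misdirect traffic destined for vmk2.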

Resolution

The issues involving both SYN and ACK scanning are resolved in VMware ESXi 8.0 Update 2.

Workaround:
To prevent this issue, ensure that network port scans are not run against the ESXi host vmk interface subnets where these services are configured.
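One way to apply the workaround is to validate a planned scan scope before the scan is run. The sketch below assumes the scanner's targets are available as a list of addresses or CIDR ranges; `PROTECTED_SUBNETS`, `conflicting_targets`, and the sample addresses are hypothetical and would need to be replaced with the real vmk subnets.

```python
from ipaddress import ip_network

# Hypothetical vmk subnets carrying vMotion, iSCSI or TEP traffic.
PROTECTED_SUBNETS = [ip_network("198.51.100.0/24"), ip_network("192.0.2.0/24")]


def conflicting_targets(scan_targets, protected=PROTECTED_SUBNETS):
    """Return the scan targets that overlap any protected subnet."""
    conflicts = []
    for target in scan_targets:
        # strict=False lets a single host address be treated as a /32 network.
        net = ip_network(target, strict=False)
        if any(net.overlaps(p) for p in protected):
            conflicts.append(target)
    return conflicts


planned = ["203.0.113.0/24", "198.51.100.0/25", "192.0.2.44"]
print(conflicting_targets(planned))
```

Any target returned by the check should be removed from the scan scope or explicitly excluded in the scanner's configuration.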

Note:
Because this behavior is ESXi-related, NSX for vSphere environments can also experience network disruption after a port scan when multiple VTEPs are configured on ESXi hosts.

  • Multihoming in ESXi is not a supported configuration except in a few cases, such as Multiple-NIC vMotion, VTEP and iSCSI Port-Binding, and this document should be followed only in those contexts. Customers who use a multihoming configuration in other scenarios should change the configuration and remove multihoming from the vmkernel interfaces.
  • More details about multihoming in ESXi: Multihoming on ESXi
  • Multiple-NIC vMotion: Multiple-NIC vMotion in vSphere
  • iSCSI Port-Binding: Considerations for using software iSCSI port binding in ESXi