VM loses network connectivity and filter state changes to IOChain Detaching

Article ID: 316672

Products

VMware NSX, VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Virtual machines might lose network connectivity.
  • In the output of the summarize-dvfilter command for the affected VM, the slot 4 filter is in the IOChain Detaching state.
Note: This can affect any slot that uses a slowpath agent.

[root@test:~] summarize-dvfilter
 port 50331659 VMNAME.eth0
  vNic slot 2
   name: nic-2104945-eth0-vmware-sfw.2
   agentName: vmware-sfw
   state: IOChain Attached
   vmState: Attached
   failurePolicy: failClosed
   serviceVMID: 2
   filter source: Dynamic Filter Creation
  vNic slot 4
   name: nic-2104945-eth0-serviceinstance-6.4
   agentName: serviceinstance-6
   state: IOChain Detaching   <-- filter is in the detaching state
   vmState: Attached
   failurePolicy: failClosed
   serviceVMID: 5
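
Checking this output by hand across many ports is tedious, so the following is a minimal sketch of a parser that flags filters in the IOChain Detaching state. It assumes the indentation-based layout shown in the sample above; the function name `find_detaching_filters` and the `SAMPLE` text are illustrative and not part of any VMware tooling, and real output may contain additional lines, which this sketch ignores.

```python
def find_detaching_filters(output):
    """Return (port, slot, filter_name) for every filter whose state
    begins with 'IOChain Detaching' in summarize-dvfilter style text."""
    results = []
    port = slot = name = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("port "):
            # New port section: remember it and reset per-filter fields.
            port = line[len("port "):]
            slot = name = None
        elif line.startswith("vNic slot "):
            slot = int(line.split()[2])
            name = None
        elif line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("state:"):
            # 'vmState:' lines do not match this prefix, only 'state:'.
            state = line.split(":", 1)[1].strip()
            if state.startswith("IOChain Detaching"):
                results.append((port, slot, name))
    return results


# Sample text trimmed from the output shown in this article.
SAMPLE = """\
 port 50331659 VMNAME.eth0
  vNic slot 2
   name: nic-2104945-eth0-vmware-sfw.2
   agentName: vmware-sfw
   state: IOChain Attached
   vmState: Attached
  vNic slot 4
   name: nic-2104945-eth0-serviceinstance-6.4
   agentName: serviceinstance-6
   state: IOChain Detaching
   vmState: Attached
"""

if __name__ == "__main__":
    for port, slot, name in find_detaching_filters(SAMPLE):
        print(f"{port}: slot {slot} filter {name} is detaching")
```

To use it against a live host, save the summarize-dvfilter output to a file and pass the file contents to the function.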

The vmkernel log of the ESXi host contains entries similar to the following:

2020-09-14T04:18:34.269Z cpu23:2102063)Found the map entry for key nic-2104945-eth0-serviceinstance-6.4
2020-09-14T04:18:34.269Z cpu23:2102063)Map already exists for given key (nic-2104945-eth0-serviceinstance-6.4)
2020-09-14T04:18:34.288Z cpu23:2102063)Profile id = serviceprofile-7
2020-09-14T04:18:34.288Z cpu23:2102063)Instance id = serviceinstance-6
2020-09-14T04:18:34.288Z cpu23:2102063)Filter Name = nic-2104945-eth1-serviceinstance-6.4
2020-09-14T04:18:34.288Z cpu23:2102063)Found the map entry for key nic-2104945-eth1-serviceinstance-6.4
2020-09-14T04:18:34.288Z cpu23:2102063)Map already exists for given key (nic-2104945-eth1-serviceinstance-6.4)
2020-09-14T04:18:34.312Z cpu23:2102063)DVFilter: 5667: Filter 'nic-2104945-eth0-serviceinstance-6.4' on port 300000b still has refcount: 2, aborting   <-- filter still holds references
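
The last entry above is the one that identifies this issue. As a rough sketch, the message can be located in a saved copy of the log with a regular expression. The pattern is derived only from the single sample line in this article, so the exact wording may differ between builds; `find_refcount_aborts` is a hypothetical helper name.

```python
import re

# Pattern based on the sample log line in this article (an assumption,
# not a documented log format).
REFCOUNT_RE = re.compile(
    r"DVFilter: \d+: Filter '(?P<name>[^']+)' on port (?P<port>\S+) "
    r"still has refcount: (?P<refcount>\d+), aborting"
)


def find_refcount_aborts(log_text):
    """Return (timestamp, filter_name, port, refcount) for each matching line."""
    hits = []
    for line in log_text.splitlines():
        match = REFCOUNT_RE.search(line)
        if match:
            # The timestamp is the first whitespace-delimited field.
            timestamp = line.split(" ", 1)[0]
            hits.append((
                timestamp,
                match.group("name"),
                match.group("port"),
                int(match.group("refcount")),
            ))
    return hits


if __name__ == "__main__":
    sample = (
        "2020-09-14T04:18:34.312Z cpu23:2102063)DVFilter: 5667: Filter "
        "'nic-2104945-eth0-serviceinstance-6.4' on port 300000b "
        "still has refcount: 2, aborting"
    )
    for hit in find_refcount_aborts(sample):
        print(hit)
```

On an ESXi host the vmkernel log is normally found at /var/log/vmkernel.log; reading that file and passing its text to the function lists every affected filter.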

Environment

VMware NSX Data Center for vSphere 6.4.x

Cause

When guest traffic is sent to the slowpath agent, the agent injects the traffic back into the filter after inspection. While the filter is handling these in-flight packets, a refcount is held on it. If the VM interface is reconfigured or revalidated at this point, all of its filters are refreshed. Any filter that still has a refcount changes to the detaching state and remains in that state until the filter is removed. A filter in the detaching state no longer accepts packets from the slowpath agent, so traffic is dropped.
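
The sequence described above can be illustrated with a toy model. This is only a sketch of the described behavior, not ESXi code: the class `ToyFilter` and all of its methods are hypothetical, and the real dvfilter implementation is not public.

```python
ATTACHED = "IOChain Attached"
DETACHING = "IOChain Detaching"


class ToyFilter:
    """Toy stand-in for a dvfilter slot (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.state = ATTACHED
        self.refcount = 0

    def hold_for_inflight_packet(self):
        # A reference is held while the filter handles a packet that the
        # slowpath agent injected back after inspection.
        self.refcount += 1

    def refresh(self):
        # Reconfiguring or revalidating the VM interface refreshes every
        # filter. Per the description above, a filter that still has
        # references moves to the detaching state and stays there.
        if self.refcount > 0:
            self.state = DETACHING

    def accept_from_slowpath(self):
        # A detaching filter no longer accepts packets from the agent.
        return self.state != DETACHING


if __name__ == "__main__":
    busy = ToyFilter("nic-2104945-eth0-serviceinstance-6.4")
    busy.hold_for_inflight_packet()
    busy.refresh()
    print(busy.state, busy.accept_from_slowpath())
```

A filter with no in-flight packets at refresh time keeps its attached state in this model, which matches the observation that only some filters start dropping traffic.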


Trigger:
Any DFW filter reconfiguration activity on the ESXi host, such as adding or removing filters, might cause some filters to start dropping packets.

Resolution

This issue is resolved in ESXi 6.7 P04 (ESXi670-202011002).
https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-202011002.html

Workaround:
There is no preventive workaround. Migrating the VM with vMotion, resetting the vmnic, changing the port group, or rebooting the virtual machine should restore traffic.

Additional Information

Impact/Risks:
Virtual machines might lose network connectivity