VMs lose connectivity when Service Insertion is in use and Overlay Transport Zone is removed from host switch

Article ID: 319071

Products

VMware NSX

Issue/Introduction

Symptoms:

  • When Service Insertion is in use and an Overlay Transport Zone is removed from a host switch, VMs may lose connectivity after being vMotioned.
  • On the ESXi host, vmkernel logs show errors about the SPF port connection (/var/run/log/vmkernel.log):

2022-10-15T13:43:57.147Z cpu15:31704126)WARNING: spf: SPFPort_Connect:145: [nsx@6876 comp="nsx-esx"]Could not connect SPF port : Not found
2022-10-15T13:43:57.147Z cpu15:31704126)WARNING: spf: SPFPortPropSet:114: [nsx@6876 comp="nsx-esx"]Failed to get to connect spfPort for ps DvsPortset-1 : Not found

  • On the ESXi host, vmkernel logs also show failures to enable VM ports after the property com.vmware.port.extraConfig.serviceInsertion.gvm is cleared (/var/run/log/vmkernel.log):

2022-10-15T13:43:57.147Z cpu15:31704126)SPFPortPropClear:188:[nsx@6876 comp="nsx-esx"]Cleared port 0x<#######> property com.vmware.nsx.spf.gvm
2022-10-15T13:43:57.147Z cpu15:31704126)Vmxnet3: 21296: Port_Enable failed for port 0x<#######>
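
To check a host for these two signatures without reading the log by hand, the messages can be counted with grep. The following is a minimal sketch, not an official tool: the function name scan_spf_signatures and its optional path argument are illustrative, and the match strings are copied from the excerpts above. Rotated or compressed vmkernel logs are not covered.

```shell
#!/bin/sh
# Sketch: count the two vmkernel.log signatures quoted in this article.
# scan_spf_signatures and its optional argument are illustrative names.
scan_spf_signatures() {
    log="${1:-/var/run/log/vmkernel.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; it exits non-zero on
    # zero matches, so "|| true" keeps callers that use "set -e" running.
    spf=$(grep -c 'Could not connect SPF port' "$log" || true)
    port=$(grep -c 'Port_Enable failed for port' "$log" || true)
    echo "spf_connect_failures=$spf"
    echo "port_enable_failures=$port"
}
```

Calling scan_spf_signatures with no argument reads /var/run/log/vmkernel.log. Non-zero counts only indicate that the messages are present; they do not confirm the cause on their own.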

 

Service Insertion requires an Overlay Transport Zone. This issue is encountered when all of the following are true:

  1. Service Insertion is in use.
  2. Transport Nodes are configured with two or more host switches.
  3. An Overlay Transport Zone was removed from one host switch (the switch that hits the issue), while an Overlay Transport Zone is still present on another host switch.
  • The command net-dvs -l shows that the property com.vmware.nsx.spf.enabled is set to true for the host switch even though the Overlay Transport Zone was removed.
  • To list switch names and spf.enabled property, the following command may be used: net-dvs -l | grep -E "common.alias|spf.enabled"
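
On hosts with several switches, the alias and property lines can be paired up automatically. The sketch below reads net-dvs -l output on standard input and prints the name of each switch whose spf.enabled property is true. It relies on assumptions not stated in this article: that each property is printed as "name = value , ...", that a switch's alias line appears before its spf.enabled line, and that switch names contain no spaces. The function name flag_spf_enabled is illustrative.

```shell
#!/bin/sh
# Sketch: print switches whose com.vmware.nsx.spf.enabled is "true".
# ASSUMPTION: properties look like "<name> = <value> , ..." and the alias
# line of a switch is printed before its spf.enabled line.
flag_spf_enabled() {
    awk '
        # Remember the most recent switch alias (third field is the value).
        /com\.vmware\.common\.alias/     { name = $3 }
        # Report the remembered alias when spf.enabled is true.
        /com\.vmware\.nsx\.spf\.enabled/ { if ($3 == "true") print name }
    '
}
```

Usage on the host would be: net-dvs -l | flag_spf_enabled. A value of true is expected on a switch that still has an Overlay Transport Zone; only a switch listed here whose Overlay Transport Zone was removed matches this issue.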

 

Full log excerpt from vmkernel logs (/var/run/log/vmkernel.log):

2022-10-15T13:43:57.147Z cpu15:31704126)NetX Props: setting property com.vmware.port.extraConfig.serviceInsertion.gvm for port 0x<#######>
2022-10-15T13:43:57.147Z cpu15:31704126)SPFPortPropSet:54:[nsx@6876 comp="nsx-esx"]Set port 0x<#######> property com.vmware.nsx.spf.gvm
2022-10-15T13:43:57.147Z cpu15:31704126)SPFPortPropClear:188:[nsx@6876 comp="nsx-esx"]Cleared port 0x<#######> property com.vmware.nsx.spf.gvm        <---- Setting the com.vmware.nsx.spf.gvm property fails here
2022-10-15T13:43:57.147Z cpu15:31704126)NetX Props: clearing property com.vmware.port.extraConfig.serviceInsertion.gvm for port 0x<#######>
2022-10-15T13:43:57.147Z cpu15:31704126)netxnetx Netx IOChain removed on port 0x<#######>
2022-10-15T13:43:57.147Z cpu15:31704126)NetX Props: clearing property com.vmware.nsx.spf.gvm for port 0x<#######>
2022-10-15T13:43:57.147Z cpu15:31704126)WARNING: NetPort: 1371: failed to enable port 0x<#######>: Not found
2022-10-15T13:43:57.147Z cpu15:31704126)netschedHClk: NetSchedHClkPortQuiesce:5064: vmnic6: received a force quiesce for port 0x<#######>
2022-10-15T13:43:57.147Z cpu15:31704126)NetPort: 1580: disabled port 0x<#######>
2022-10-15T13:43:57.147Z cpu15:31704126)Vmxnet3: 21296: Port_Enable failed for port 0x<#######>
2022-10-15T13:48:25.549Z cpu10:31704093)Net: 3707: dissociate dvPort <UUID> from port 0x<#######>
2022-10-15T13:48:25.549Z cpu10:31704093)Net: 3713: disconnected client from port 0x<#######>
[...]
2022-10-15T13:43:57.147Z cpu15:31704126)WARNING: spf: SPFPort_Connect:145: [nsx@6876 comp="nsx-esx"]Could not connect SPF port : Not found
2022-10-15T13:43:57.147Z cpu15:31704126)WARNING: spf: SPFPortPropSet:114: [nsx@6876 comp="nsx-esx"]Failed to get to connect spfPort for ps DvsPortset-1 : Not found

Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values such as port IDs and switch names will vary depending on your environment.

Cause

The spf property (com.vmware.nsx.spf.enabled) incorrectly remains enabled after an Overlay Transport Zone is removed from a host switch.
Because the spf property is enabled even though the host switch has no Overlay Transport Zone, the 'com.vmware.nsx.spf.gvm' property fails to be set on the vNIC port. The vNIC then goes into an error state and the VM loses connectivity.

Resolution

This issue is resolved in VMware NSX 3.2.3, 4.0.2, and 4.1.0.

Workaround:
To unset the spf property on the affected host switch, run the following command on the ESXi host:
net-dvs -u 'com.vmware.nsx.spf.enabled' -p hostPropList <host switch name> 
This workaround does not persist. Whenever a Management plane to nsxa sync happens, the property is set again. Events such as nsxa service restarts, upgrades, or Transport Node updates trigger a sync.
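
Because the command may need to be repeated after each sync, it can be convenient to wrap it so that the exact command line is reviewed before anything changes. This is a hedged sketch only: the function name unset_spf_enabled and the APPLY variable are illustrative and not part of NSX or ESXi. The net-dvs invocation is the one given above. By default the function prints the command, and it runs it only when APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: dry-run wrapper around the workaround command from this article.
# unset_spf_enabled and APPLY are illustrative names, not NSX features.
unset_spf_enabled() {
    switch="$1"
    if [ -z "$switch" ]; then
        echo "usage: unset_spf_enabled <host switch name>" >&2
        return 2
    fi
    if [ "${APPLY:-0}" = "1" ]; then
        # Actually remove the property from the named host switch.
        net-dvs -u 'com.vmware.nsx.spf.enabled' -p hostPropList "$switch"
    else
        # Dry run: show the command that would be executed.
        echo "net-dvs -u 'com.vmware.nsx.spf.enabled' -p hostPropList $switch"
    fi
}
```

For example, unset_spf_enabled <host switch name> prints the command for review, and APPLY=1 unset_spf_enabled <host switch name> executes it.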
 
Another option is to uninstall and reinstall NSX on the affected hosts, one host at a time. As long as Overlay Transport Zones are not removed from host switches after the reinstall, the issue does not recur and Management plane to nsxa syncs do not re-enable the spf property.