VMs lose connectivity when enabling the security-only feature on Host TN cluster.

Article ID: 317763

Products

VMware NSX

Issue/Introduction

  • The security-only feature is enabled in the NSX UI.
  • Random VMs in several ESXi clusters lose network connectivity.
  • A mismatch in the logical switchport ID is observed:
    1. Identify the switchport of the affected workload VM:
      > net-stats -l
      100663336 5 9 DvsPortset-1 00:50:##:##:##:e1 MYVM01.eth1
    2. Identify the VIF:
      > net-dvs -l
                      com.vmware.port.extraConfig.vnic.external.id = 1251497834 , propType = CONFIG
                      com.vmware.common.port.volatile.status = inUse linkUp portID=100663336 propType = RUNTIME
    3. In /var/run/log/nsx-opsagent.log, find the logical switchport ID that was set for this VIF ID:
      nsx-opsagent[12416680]: NSX 12416680 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="12417026" level="INFO"] [PortOp] Adding [com.vmware.port.extraConfig.vnic.external.id] value [1251497834]
      nsx-opsagent[12416680]: NSX 12416680 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="12417026" level="INFO"] [PortOp] Adding [com.vmware.port.extraConfig.logicalPort.id] value [########-####-####-####-##########37]
    4. In /var/run/log/vmkernel.log, the same VIF ID is associated with a different logical switchport UUID:
      2022-09-18T18:19:18.995Z cpu57:12103851)lsp id for switch port 0x06000022 is ########-####-####-####-##########14
      2022-09-18T18:19:18.995Z cpu57:12103851)vif id for switch port 0x06000022 is 1251497834
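
The comparison in steps 3 and 4 can be scripted when many ports need checking. The following is a minimal sketch, not a supported tool. It assumes the log lines look like the excerpts above, and that in nsx-opsagent.log each VIF ID line is followed by the logical switchport line for the same port. The function names are hypothetical, and the message formats may differ between NSX and ESXi versions.

```python
import re

# Patterns derived from the log excerpts in this article (assumption:
# other NSX/ESXi versions may format these messages differently).
OPSAGENT_RE = re.compile(
    r"Adding \[com\.vmware\.port\.extraConfig\."
    r"(vnic\.external|logicalPort)\.id\] value \[([^\]]+)\]"
)
VMKERNEL_RE = re.compile(r"(lsp|vif) id for switch port (0x[0-9a-fA-F]+) is (\S+)")


def opsagent_lsp_by_vif(lines):
    """Map VIF ID -> logical switchport ID from nsx-opsagent.log lines.

    Assumes each VIF ID line is followed by the logical switchport
    line for the same port, as in the excerpt above.
    """
    result, current_vif = {}, None
    for line in lines:
        match = OPSAGENT_RE.search(line)
        if not match:
            continue
        kind, value = match.groups()
        if kind == "vnic.external":
            current_vif = value
        elif current_vif is not None:
            result[current_vif] = value  # later entries overwrite earlier ones
            current_vif = None
    return result


def vmkernel_lsp_by_vif(lines):
    """Map VIF ID -> logical switchport ID from vmkernel.log lines.

    The lsp and vif lines are joined on the hexadecimal switch port.
    """
    ports = {}
    for line in lines:
        match = VMKERNEL_RE.search(line)
        if match:
            kind, port, value = match.groups()
            ports.setdefault(port, {})[kind] = value
    return {p["vif"]: p["lsp"] for p in ports.values() if "vif" in p and "lsp" in p}


def find_mismatches(opsagent_lines, vmkernel_lines):
    """Return {vif_id: (opsagent_lsp, vmkernel_lsp)} where the two logs disagree."""
    expected = opsagent_lsp_by_vif(opsagent_lines)
    actual = vmkernel_lsp_by_vif(vmkernel_lines)
    return {
        vif: (expected[vif], actual[vif])
        for vif in expected.keys() & actual.keys()
        if expected[vif] != actual[vif]
    }
```

To use it, read the lines of /var/run/log/nsx-opsagent.log and /var/run/log/vmkernel.log and pass both lists to find_mismatches. Each VIF ID it returns marks a port where the two logs report different logical switchport IDs, which is the mismatch described above. Because both logs contain history, only the most recent entry per VIF ID or switch port is compared.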

Environment

VMware NSX-T Data Center
VMware NSX

Cause

This issue is specific to the security-only use case. A stale logical switchport ID was left on the port, so the opaque data (extra configuration) on the port was updated in place instead of being added as new. The Kernel Control Plane (KCP) did not pick up this update, so the new logical switchport ID was not applied.

Resolution

This issue is resolved in VMware NSX 3.2.2, available at Broadcom Downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.


Workaround:

Any operation that deletes and recreates the affected network port of the Virtual Machine fixes this issue.
For example:

  1. Create a temporary distributed portgroup.
  2. Reconfigure the Virtual Machine to this temporary portgroup.
  3. Move the Virtual Machine back to the original portgroup.

Additional Information

Impact/Risks:

Due to loss of connectivity, the dataplane is down for the affected Virtual Machine(s).