NSX ports are missing on VC while the VMs can communicate.

Article ID: 401849

Products

VMware NSX

Issue/Introduction

  • You observe that the NSX ports of VMs on one or more specific ESXi hosts are not listed in the [Ports] tab of the NSX portgroups in the vSphere Client.
  • The VMs still have network connectivity.
  • DFW allows all packets through, regardless of the configured rules (a way to inspect the rules applied on a vNIC filter is sketched after these bullets).
  • The VMs cannot be migrated to a different ESXi host if the vCenter Server version is earlier than 7.0 U3q.
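
    A minimal sketch of inspecting the DFW rules actually applied to a VM's vNIC filter on the affected host, using the standard NSX host commands summarize-dvfilter and vsipioctl. The VM name and the filter name nic-XXXXX-eth0-vmware-sfw.2 are placeholders, not values from this article; on an affected port you would expect the configured rules to be missing or reduced, matching the symptom above.
    # List the dvfilters on the host and note the "vmware-sfw" (DFW) filter
    # attached to the affected VM's vNIC.
    summarize-dvfilter | grep -A 9 <vm-name>
    # Show the DFW rules currently programmed on that filter
    # (replace the placeholder filter name with the one reported above).
    vsipioctl getrules -f nic-XXXXX-eth0-vmware-sfw.2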

  • The host is, or was, under memory pressure. You can confirm this with vmkernel log entries similar to the following (a quick way to search for them is sketched after the excerpt):
    <Timestamp> cpu22:2097498)Admission failure in path: host:system:kernel:BufferCache
    <Timestamp> cpu22:2097498)BufferCache (80) extraMin/extraFromParent: 1/1, host (0) childEmin/eMinLimit: 66869749/66869749
    <Timestamp> cpu11:2126505)WARNING: Heap: 3894: Could not allocate 28672 bytes for dynamic heap cartelheap.2097646. Request returned Admission check failed for memory resource
    <Timestamp> cpu9:2100876 opID=59afda51)WARNING: World: 2706: Could not allocate new world handle for world ID: 2217645: Admission check failed for memory resource
    <Timestamp> cpu21:2217650)WARNING: UserParam: 1396: sh: could not change group to <host/vim/vmvisor/esximage>: Admission check failed for memory resource
    <Timestamp> cpu30:2217931)WARNING: LinuxThread: 424: sh: Error cloning thread: -28 (bad0081)
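
    A minimal search for these admission failures, assuming the standard ESXi log location /var/log/vmkernel.log (rotated copies usually sit under /var/run/log):
    # Search the live and rotated vmkernel logs for memory admission failures.
    grep -i "Admission check failed for memory resource" /var/log/vmkernel.log
    zcat /var/run/log/vmkernel.*.gz 2>/dev/null | grep -i "Admission check failed"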

  • You see that the ports exist on the ESXi host in the output of the net-dvs -l command, but the following two extraConfig properties are not set on the NSX ports (a sketch of how to filter the output follows below):
    com.vmware.port.extraConfig.vnic.external.id
    com.vmware.port.extraConfig.opaqueNetwork.id
    Instead, com.vmware.port.extraConfig.security.enable = false is set, even for ports whose link is up.
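
    A minimal sketch of filtering the net-dvs output for these properties; the grep pattern is only illustrative:
    # Dump the full DVS/port state to a file, then search it for the
    # extraConfig properties of interest.
    net-dvs -l > /tmp/net-dvs.txt
    grep -E "extraConfig\.(vnic\.external\.id|opaqueNetwork\.id|security\.enable)" /tmp/net-dvs.txt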

  • You see that GetNsxEnabledCvdsIds did not return the correct NSX DVS IDs and that the opaque data validation was invoked (a search sketch follows the excerpt).
    nsxaVim.log
    <Timestamp> nsxaVim: [2126107]: INFO Reading data from socket.
    <Timestamp> nsxaVim: [2126107]: INFO data size= [512]. actual data length = [512]
    <Timestamp> nsxaVim: [2126107]: INFO HandleRPC: fn = 19 
    <Timestamp> nsxaVim: [2126107]: INFO [GetNsxEnabledCvdsIds] cvdsId: [[]]
    <Timestamp> nsxaVim: [2126107]: INFO Performing opaque data validation on dvs [<DVS UUID>]
    <Timestamp> nsxaVim: [2126107]: ERROR Port <Port UUID> on VDS <DVS UUID> has pre-existing VIF ID <VIF ID>
    <Timestamp> nsxaVim: [2126107]: ERROR Port <Port UUID> on VDS <DVS UUID> has pre-existing VIF ID <VIF ID>
    ...
    <Timestamp> nsxaVim: [2126107]: INFO Updating NN ports on dvs [<DVS UUID>]
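
    A minimal search for these entries; the path /var/log/nsxaVim.log is assumed and may differ in your environment:
    # Look for the empty cvdsId result, the pre-existing VIF errors and the port update.
    grep -E "GetNsxEnabledCvdsIds|pre-existing VIF ID|Updating NN ports" /var/log/nsxaVim.log
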
  • You see that an operation performed by the nsx-user failed with an out-of-memory error (a search sketch follows the excerpt).
    hostd.log
    <Timestamp> verbose hostd[#########] [Originator@6876 sub=AdapterServer opID=<opID> user=nsx-user] New request: target='vim.dvs.HostDistributedVirtualSwitchManager:ha-hostdvsmanager', method='retrieveNsxDvsConfig', session='<UUID>'
    <Timestamp> verbose hostd[#########] [Originator@6876 sub=Vimsvc.TaskManager opID=<opID> user=nsx-user] Task Created : haTask--vim.dvs.HostDistributedVirtualSwitchManager.retrieveNsxDvsConfig-#########
    <Timestamp> verbose hostd[#########] [Originator@6876 sub=PropertyProvider opID=<opID> user=nsx-user] RecordOp ASSIGN: info, haTask--vim.dvs.HostDistributedVirtualSwitchManager.retrieveNsxDvsConfig-#########. Applied change to temp map.
    <Timestamp> verbose hostd[#########] [Originator@6876 sub=PropertyProvider opID=<opID> user=nsx-user] [RecordAndNotifyChangeInt] No listeners on haTask--vim.dvs.HostDistributedVirtualSwitchManager.retrieveNsxDvsConfig-#########- bailing out
    <Timestamp> warning hostd[#########] [Originator@6876 sub=Hostsvc.NetworkProvider opID=<opID> user=nsx-user] Error getting NSX-specific config(key-value data) for dvs <DVS UUID> : N7HostCtl3Lib16HostCtlExceptionE(Unable to Get DVS vendor specific data: Status(bad0014)= Out of memory)
    <Timestamp> info hostd[#########] [Originator@6876 sub=AdapterServer opID=<opID> user=nsx-user] AdapterServer caught exception; <<<UUID>, <TCP '127.0.0.1 : 8307'>, <TCP '127.0.0.1 : 18250'>>, ha-hostdvsmanager, vim.dvs.HostDistributedVirtualSwitchManager.retrieveNsxDvsConfig>, N7Hostsvc21HaPlatformConfigFault9ExceptionE(Fault cause: vim.fault.PlatformConfigFault
    ...
    <Timestamp> verbose hostd[#########] [Originator@6876 sub=Vimsvc.TaskManager opID=<opID> user=nsx-user] Task Completed : haTask--vim.dvs.HostDistributedVirtualSwitchManager.retrieveNsxDvsConfig-#########Status error
    <Timestamp> info hostd[#########] [Originator@6876 sub=Solo.Vmomi opID=<opID> user=nsx-user] Activation finished; <<<UUID>, <TCP '127.0.0.1 : 8307'>, <TCP '127.0.0.1 : 18250'>>, ha-hostdvsmanager, vim.dvs.HostDistributedVirtualSwitchManager.retrieveNsxDvsConfig>
    <Timestamp> verbose hostd[#########] [Originator@6876 sub=Solo.Vmomi opID=<opID> user=nsx-user] Arg dvsUuid:
    --> "<DVS UUID>"
    <Timestamp> info hostd[#########] [Originator@6876 sub=Solo.Vmomi opID=<opID> user=nsx-user] Throw vim.fault.PlatformConfigFault
    <Timestamp> info hostd[#########] [Originator@6876 sub=Solo.Vmomi opID=<opID> user=nsx-user] Result:
    --> (vim.fault.PlatformConfigFault) {
    -->    faultMessage = (vmodl.LocalizableMessage) [
    -->       (vmodl.LocalizableMessage) {
    -->          key = "com.vmware.esx.hostctl.default",
    -->          arg = (vmodl.KeyAnyValue) [
    -->             (vmodl.KeyAnyValue) {
    -->                key = "reason",
    -->                value = "Unable to Get DVS vendor specific data: Status(bad0014)= Out of memory"
    -->             }
    -->          ],
    -->       }
    -->    ],
    -->    text = "",
    -->    msg = ""
    --> }
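
    A minimal search for the failed retrieveNsxDvsConfig call, assuming the standard hostd log location /var/log/hostd.log:
    # Find the out-of-memory failure of retrieveNsxDvsConfig issued by nsx-user.
    grep -E "retrieveNsxDvsConfig|Unable to Get DVS vendor specific data" /var/log/hostd.log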

Environment

VMware NSX 4.2.1 and later

Cause

Under memory pressure, NSX can fail to retrieve the list of NSX-enabled DVSes and incorrectly assume that NSX is not enabled on any DVS.
It can then clear the two extraConfig properties on the NSX ports and set com.vmware.port.extraConfig.security.enable = false.

Such ports are no longer recognized by vCenter. The DFW loses all of its rules on them and allows all packets through.

Resolution

The issue is fixed in NSX 4.2.3.1 and 9.0.1.

If you encounter the issue, first free up memory on the affected ESXi host, for example by shutting down some VMs or migrating them to a different host.
After that, connect the affected vNICs to a different portgroup and then reconnect them to the original NSX portgroup. A sketch of how to verify the result follows below.
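
Once the vNICs have been reconnected, a minimal way to verify from the ESXi host that the ports were repaired is to re-check the extraConfig properties described above; the grep pattern is only illustrative:
    # After reconnection, the affected ports should again carry the two NSX
    # extraConfig properties.
    net-dvs -l | grep -E "extraConfig\.(vnic\.external\.id|opaqueNetwork\.id)"
The missing ports should also reappear in the [Ports] tab of the NSX portgroup in the vSphere Client.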