Intermittent NFS Connectivity Failures and Filesystem Freezes on Linux VMs in NSX-T Overlay Networks

Article ID: 436402

Products

VMware NSX

Issue/Introduction

Symptoms

  • Multiple Linux virtual machines (VMs) residing on VMware NSX-T overlay segments experience intermittent connectivity issues with physical NAS/NFS storage.
  • Running `ls` or other I/O operations on the NFS mount point hangs indefinitely.
  • TCP connections to the NFS server succeed most of the time, but fail randomly.
  • A packet capture (`pktcap-uw`) on the VM's switchport confirms that the traffic successfully exits the VM's virtual NIC (vNIC):

    # pktcap-uw --switchport ######## --dir 2 -o - | tcpdump-uw -enr - | grep -i "#.#.#.#"

    [TIME] [MAC A] > [MAC B], ethertype IPv4 (0x0800), length 74: [IP A].[Port A] > [IP B].[Port B]: Flags [S], seq [SEQ X], win 64240, options [mss 1460,sackOK,TS val [TS VAL 1] ecr 0,nop,wscale 7], length 0
    [TIME] [MAC A] > [MAC B], ethertype IPv4 (0x0800), length 74: [IP A].[Port A] > [IP B].[Port B]: Flags [S], seq [SEQ Y], win 64240, options [mss 1460,sackOK,TS val [TS VAL 2] ecr 0,nop,wscale 7], length 0
    [TIME] [MAC A] > [MAC B], ethertype IPv4 (0x0800), length 74: [IP A].[Port A] > [IP B].[Port B]: Flags [S], seq [SEQ Y], win 64240, options [mss 1460,sackOK,TS val [TS VAL 3] ecr 0,nop,wscale 7], length 0
    [TIME] [MAC A] > [MAC B], ethertype IPv4 (0x0800), length 74: [IP A].[Port A] > [IP B].[Port B]: Flags [S], seq [SEQ Y], win 64240, options [mss 1460,sackOK,TS val [TS VAL 4] ecr 0,nop,wscale 7], length 0
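
SYN packets that repeat with an identical sequence number, as in the capture above, are TCP retransmissions. As a rough aid, the following sketch counts how often each SYN sequence number appears in capture text saved to a file. The function name `count_syn_retransmits` and the file name in the usage line are hypothetical, and the sketch assumes the `tcpdump-uw` line format shown above with real numeric values in place of the sanitized placeholders.

```shell
# Hypothetical helper: reads tcpdump-style text on stdin and prints
# "<count> <seq>" for every SYN sequence number seen more than once.
count_syn_retransmits() {
    awk '
        /Flags \[S\], seq / {
            # Take the token right after "seq" and strip the trailing comma.
            for (i = 1; i < NF; i++) {
                if ($i == "seq") {
                    seq = $(i + 1)
                    sub(/,$/, "", seq)
                    seen[seq]++
                    break
                }
            }
        }
        END {
            for (s in seen)
                if (seen[s] > 1)
                    print seen[s], s
        }
    '
}
```

For example, after saving the capture text to a file: `count_syn_retransmits < /tmp/switchport.txt`. SYN-ACK lines (`Flags [S.]`) are not counted.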

 

  • A packet capture on the ESXi host's physical uplink (vmnic), however, shows none of these packets, which confirms that they are dropped before reaching the physical uplink of the ESXi host.

    # pktcap-uw --uplink vmnic2 --dir 2 -o - | tcpdump-uw -enr - | grep -i "#.#.#.#"



  • A packet capture with `--trace` shows that the packet is dropped (freed) within the ESXi IOChain after being processed by the vSphere Security and Inspection Platform (VSIP) module.

    # pktcap-uw --trace --ip <IP address of NFS server>

    [TIME][3] PktHandleID: [HANDLE ID], Captured at PktFree point, TSO not enabled, Checksum offloaded and not verified, SourcePort [PORT X], QID [QID X], headroomlen [LEN Y], length [LEN Z].

          PATH:
              +- [[PATH TIME]] |                           VnicTx |  ######### |
              +- [[PATH TIME]] |                        PortInput |  ######### |
              +- [[PATH TIME]] |                          IOChain |            | [email protected]#1.0.8.0.24765085
              +- [[PATH TIME]] |                          IOChain |            | [email protected]#1.0.8.0.24765085
              +- [[PATH TIME]] |                          IOChain |            | [email protected]#v2_13_0_0
              +- [[PATH TIME]] |                      PreDVFilter |            |
              +- [[PATH TIME]] |                     PostDVFilter |            |
              +- [[PATH TIME]] |                          IOChain |            | VSIPNetxProcessPacketsPreGVM2S@com.vmware.vsip#1.0.8.0.24765085
              +- [[PATH TIME]] |                          PktFree |            | 
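
In the trace above, the entry listed immediately before `PktFree` indicates the last function that handled the packet before it was released. The following sketch pulls that entry out of trace text saved to a file. The function name `last_stage_before_free` is hypothetical, and the sketch assumes the `+- [time] | stage | port | function` layout shown above.

```shell
# Hypothetical helper: reads saved `pktcap-uw --trace` text on stdin and,
# for each PktFree entry in a PATH listing, prints the stage and function
# of the entry immediately before it.
last_stage_before_free() {
    awk -F'|' '
        # A new PATH listing starts: forget the previous packet.
        /PATH:/ { prev_stage = ""; prev_fn = "" }

        # Path entries start with "+-" after optional indentation.
        /^[ \t]*\+-/ {
            stage = $2; fn = $4
            gsub(/^[ \t]+|[ \t]+$/, "", stage)
            gsub(/^[ \t]+|[ \t]+$/, "", fn)
            if (stage == "PktFree" && prev_stage != "")
                print prev_stage, prev_fn
            prev_stage = stage; prev_fn = fn
        }
    '
}
```

Applied to the trace shown above, this prints the `IOChain` entry for `VSIPNetxProcessPacketsPreGVM2S`, the VSIP function that ran just before the packet was freed.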

 

  • A packet capture at the PostDVFilter capture point shows the packets passing through the Distributed Firewall (DFW) filter without any drops.

    # pktcap-uw --capture PostDVFilter --dvfilter nic-#######-eth0-vmware-sfw.2 -o - | tcpdump-uw -enr - | grep -i "<NFS Server IP>"

    [TIME] [MAC_SRC] > [MAC_DST], ethertype IPv4 (0x0800), length 74: [IP_SRC:PORT_SRC] > [IP_DST:PORT_DST]: Flags [S], seq [SEQ_1], win 64240, options [mss 1460,sackOK,TS val [TS_VAL_1] ecr [ECR_VAL],nop,wscale 7], length 0
    [TIME] [MAC_SRC] > [MAC_DST], ethertype IPv4 (0x0800), length 74: [IP_SRC:PORT_SRC] > [IP_DST:PORT_DST]: Flags [S], seq [SEQ_2], win 64240, options [mss 1460,sackOK,TS val [TS_VAL_2] ecr [ECR_VAL],nop,wscale 7], length 0
    [TIME] [MAC_SRC] > [MAC_DST], ethertype IPv4 (0x0800), length 74: [IP_SRC:PORT_SRC] > [IP_DST:PORT_DST]: Flags [S], seq [SEQ_3], win 64240, options [mss 1460,sackOK,TS val [TS_VAL_5] ecr [ECR_VAL],nop,wscale 7], length 0



  • The `summarize-dvfilter` output shows that traffic for the affected VM is being steered through a Service Insertion (SI) filter:

    # summarize-dvfilter | grep -A10 <VM-name>

    world ##### vmm0:######### vcUuid:'## ## ## ## ## ## ## ##-## ## ## ## ## ## ## ##'
    port ##### #######.eth0
    vNic slot 12
    name: nic-######-eth0-vmware-si.12
    agentName: vmware-si
    state: IOChain Attached
    vmState: Detached
    failurePolicy: failOpen
    serviceVMID: none
    filter source: Dynamic Filter Creation
    moduleName: nsxt-vsip-#######
    world ###### vmm0:####### vcUuid:'## ## ## ## ## ## ## ##-## ## ## ## ## ## ## ##'
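
To check several VMs at once, the filter names whose agent is `vmware-si` can be extracted from saved `summarize-dvfilter` output. The following is a minimal sketch; the function name `list_si_filters` is hypothetical, and it assumes the `name:` and `agentName:` lines appear in the order shown above.

```shell
# Hypothetical helper: reads saved `summarize-dvfilter` text on stdin and
# prints the name of every filter whose agentName is vmware-si.
list_si_filters() {
    awk '
        # Remember the most recent filter name.
        $1 == "name:" { name = $2 }

        # Print it when the agent that follows is the Service Insertion agent.
        $1 == "agentName:" && $2 == "vmware-si" && name != "" { print name }

        # Each agentName line ends one filter entry, so clear the name.
        $1 == "agentName:" { name = "" }
    '
}
```

For example: `summarize-dvfilter > /tmp/dvfilter.txt`, then `list_si_filters < /tmp/dvfilter.txt`.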

 

 

Environment

VMware NSX

Cause

Traffic is dropped within the East-West (E-W) Network Introspection service chain (for example, Guest Introspection or a third-party IDS/IPS service).

Resolution

To work around this issue, add the affected VMs to the Network Introspection Exclusion List:

       1.  Log in to the NSX Manager UI.
       2.  Navigate to Security > E-W Network Introspection > Action > Exclusion List.
       3.  Add the affected VMs to the Exclusion List for Network Introspection.
       4.  Verify that the VMs can now consistently access the NFS mount points and that the `ls` command no longer hangs.
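
The verification in step 4 can be run with a time limit so that a mount that is still hanging does not block the shell. The following is a minimal sketch for the Linux guest, assuming the GNU coreutils `timeout` command is installed; the function name `check_nfs_mount` and the mount path in the usage line are hypothetical.

```shell
# Hypothetical helper for the guest OS: lists a mount point with a time limit
# so that a hung NFS mount does not block the shell indefinitely.
# Usage: check_nfs_mount <mount-point> [seconds]
check_nfs_mount() {
    mount_point=$1
    limit=${2:-5}
    status=0
    # Assumption: SIGKILL is the signal most likely to interrupt a process
    # that is blocked on NFS I/O. GNU timeout then exits with 137 (128 + 9).
    timeout -s KILL "$limit" ls "$mount_point" >/dev/null 2>&1 || status=$?
    case $status in
        0)   echo "responsive" ;;
        137) echo "timed out after ${limit}s" ;;
        *)   echo "failed (exit status $status)" ;;
    esac
    return "$status"
}
```

For example, `check_nfs_mount /mnt/nfs 10` prints `responsive` when the listing completes within 10 seconds. Because the failures are intermittent, repeating the check over a period of time gives a more reliable result than a single run.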

 

Note: Placing a VM in the exclusion list means its traffic bypasses the Network Introspection service.
          Ensure this aligns with your organization's security policies for the affected environment.
          If the workaround conflicts with established security policies and cannot be implemented, open a case with Broadcom Application Networking and Security support.

Additional Information

Exclude Members from a Security Service

Troubleshooting NSX using Packet Captures

Understanding the messages: Lost connection to server nfs-server mount point and Restored connection to server nfs-server mount point