Unable to deploy VMs on vSAN cluster due to latency

Article ID: 393370


Products

VMware vSAN

Issue/Introduction

Symptoms:

  • High latency on VMs running in the vSAN cluster.

  • VM deployment tasks on the vSAN cluster are very slow or never complete.

  • No resync operations are running on the vSAN cluster at the time of the issue.

  • Latency is seen on the vSAN cluster during an ESXi patch upgrade of a vSAN node.

Environment

VMware vSphere vSAN 7.x

VMware vSphere vSAN 8.x

Cause

  • "Wait for RDT" events in the vSAN traces, together with TCP/IP errors on the vSAN network, indicate an ongoing network issue. RDT (Reliable Datagram Transport) is the transport that vSAN hosts use to exchange data with each other.

  • TCP retransmissions indicate a transient network condition. If there is network congestion or intermittent connectivity between vSAN nodes, data replication between the nodes is delayed, and I/O operations wait for the data transfer to complete.


Cause Validation:

  • In the vSphere Client, select the vSAN cluster > Monitor > vSAN > Support > Performance for Support. In the "Performance Dashboard" drop-down, select Network > TCP/IP, then select the host and review its TCP/IP statistics for errors and retransmissions.



  • Run the following command to identify the VMkernel port used for vSAN, and copy the output for later use: 

[root@server name:~] esxcli vsan network list
Interface
   VmkNic Name: vmk1
   IP Protocol: IP
   Interface UUID: ########-####-####-####-############
   Agent Group Multicast Address: 224.2.3.4
   Agent Group IPv6 Multicast Address: ff19::2:3:4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group IPv6 Multicast Address: ff19::1:2:3
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Data-in-Transit Encryption Key Exchange Port: 0
   Multicast TTL: 5
   Traffic Type: vsan

Note: Record the VmkNic Name. In the above output it is "vmk1", so vmk1 is the VMkernel adapter used for vSAN traffic.
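On hosts with several VMkernel adapters, the lookup above can be scripted. The following is a minimal sketch that assumes the output layout shown above; vsan_vmknics is a hypothetical helper name, not part of esxcli.

```shell
#!/bin/sh
# vsan_vmknics (hypothetical helper): read `esxcli vsan network list` output
# on stdin and print the VmkNic Name of every interface whose Traffic Type
# is "vsan". Assumes the "Label: value" layout shown in this article.
vsan_vmknics() {
    awk -F': ' '
        # Remember the most recent VmkNic Name value (trailing blanks removed).
        /^[[:space:]]*VmkNic Name:/ {
            name = $2
            sub(/[[:space:]]+$/, "", name)
        }
        # Print the remembered name once the interface is confirmed as vSAN.
        /^[[:space:]]*Traffic Type:/ && $2 ~ /^vsan[[:space:]]*$/ && name != "" {
            print name
            name = ""
        }
    '
}

# Example use on an ESXi host:
#   esxcli vsan network list | vsan_vmknics
```

Only interfaces tagged with traffic type "vsan" are printed, so other traffic types that may appear in the list are skipped.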

  • Run the following command to identify the virtual switch and uplinks (vmnics) used by the vSAN VMkernel adapter:

[root@server name:~] esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         2520        1           128               1500

  PortGroup Name                            VLAN ID  Used Ports  Uplinks
  VM Network                                0        0

DVS Name         Num Ports   Used Ports  Configured Ports  MTU     Uplinks
DV-Switch        2520        14          512               1500    vmnic1,vmnic2,vmnic3,vmnic0

  DVPort ID                               In Use      Client
  68                                      1           vmnic0
  69                                      1           vmnic1
  70                                      1           vmnic2
  71                                      1           vmnic3
  72                                      1           vmk0
  19                                      1           vmk1
  27                                      1           vmk2
  35                                      1           vmk3
  64                                      1           vmk4

Note: In the above output, vmk1 is connected to the distributed switch "DV-Switch", which has four uplinks (vmnic0, vmnic1, vmnic2, and vmnic3).
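The switch lookup can also be done with a small filter. This is a sketch only: switch_for_vmk is a hypothetical helper, and it assumes the distributed switch layout shown above, where each VMkernel adapter appears in the Client column and the switch name contains no spaces.

```shell
#!/bin/sh
# switch_for_vmk VMK (hypothetical helper): read `esxcfg-vswitch -l` output
# on stdin and print the name and uplink list of the switch that hosts the
# given VMkernel adapter.
switch_for_vmk() {
    awk -v vmk="$1" '
        # A header row means the next non-empty line describes a switch.
        /^(Switch Name|DVS Name)/ { want = 1; next }
        want && NF {
            sw = $1
            # The Uplinks column is the sixth field and may be empty.
            uplinks = (NF >= 6 ? $NF : "none")
            want = 0
            next
        }
        # Port rows list the client (vmnicN or vmkN) in the last column.
        $NF == vmk { print sw, uplinks; exit }
    '
}

# Example use on an ESXi host:
#   esxcfg-vswitch -l | switch_for_vmk vmk1
```

With the sample output above, the filter reports DV-Switch and its four uplinks for vmk1. It lists the uplinks available to the switch, not the one currently in use.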

  • To identify which vmnic is currently carrying vSAN traffic, run the following command:

    [root@server name:~] esxtop

    Press 'n' to switch to network view.

   PORT-ID USED-BY                         TEAM-PNIC DNAME              PKTTX/s  MbTX/s   PSZTX    PKTRX/s  MbRX/s   PSZRX %DRPTX %DRPRX
  67108870 Management                            n/a vSwitch0              0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
 100663306 Management                            n/a DvsPortset-0          0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
 100663308 vmk0                               vmnic0 DvsPortset-0         21.32    0.46 2845.00      24.84    0.11  574.00   0.00   0.00
 100663310 Shadow of vmnic0                      n/a DvsPortset-0          0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
 100663312 Shadow of vmnic3                      n/a DvsPortset-0          0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
 100663314 Shadow of vmnic2                      n/a DvsPortset-0          0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
 100663316 Shadow of vmnic1                      n/a DvsPortset-0          0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
 100663317 vmk1                               vmnic1 DvsPortset-0          0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
 100663318 vmk2                               vmnic2 DvsPortset-0          0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
 100663319 vmk3                               vmnic3 DvsPortset-0          0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
 100663320 vmk4                               vmnic2 DvsPortset-0          0.59    0.00  188.00       0.78    0.00  115.00   0.00   0.00
2248146957 vmnic0                                  - DvsPortset-0         21.32    0.46 2845.00      30.32    0.11  483.00   0.00   0.00
2248146959 vmnic3                                  - DvsPortset-0          0.00    0.00    0.00       0.00    0.00    0.00   0.00   0.00
2248146961 vmnic2                                  - DvsPortset-0          0.59    0.00  188.00       0.78    0.00  115.00   0.00   0.00
2248146963 vmnic1                                  - DvsPortset-0          0.00    0.00    0.00       1.17    0.00   60.00   0.00   0.00

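The TEAM-PNIC column shows the physical uplink that each VMkernel adapter is currently using. Locate the row for the vSAN VMkernel adapter identified earlier and note its uplink. The next step uses vmnic2 as the example; substitute the uplink reported in your environment.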

  • Run the following command to list the statistics for vmnic2:

    [root@server name:~] esxcli network nic stats get -n vmnic2
       NIC statistics for vmnic2:
          Packets received: 23751368617
          Packets sent: 19730325195
          Bytes received: 130908835599844
          Bytes sent: 68318281765261
          Receive packets dropped: 56035123
          Transmit packets dropped: 0
          Multicast packets received: 261269
          Broadcast packets received: 259237
          Multicast packets sent: 0
          Broadcast packets sent: 1506
          Total receive errors: 0
          Receive length errors: 0
          Receive over errors: 0
          Receive CRC errors: 685
          Receive frame errors: 0
          Receive FIFO errors: 0
          Receive missed errors: 0
          Total transmit errors: 0
          Transmit aborted errors: 0
          Transmit carrier errors: 0
          Transmit FIFO errors: 0
          Transmit heartbeat errors: 0
          Transmit window errors: 0

    Note: A non-zero CRC error count points to a networking issue. However, the counters accumulate from the last host reboot, so run the command several times and confirm that the CRC error count is still increasing. Also check the other error and drop counters (for example, "Receive packets dropped") for high or increasing values.
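Because the counters are cumulative, comparing two snapshots shows whether a value is still growing. The sketch below assumes the "Label: value" layout shown above; nic_counter and counter_delta are hypothetical helper names, and the 60-second interval and /tmp file names in the example are arbitrary choices.

```shell
#!/bin/sh
# nic_counter LABEL (hypothetical helper): read the output of
# `esxcli network nic stats get` on stdin and print the value of one counter.
nic_counter() {
    awk -F': ' -v label="$1" '
        { key = $1; sub(/^[[:space:]]+/, "", key) }
        key == label {
            val = $2
            sub(/[[:space:]]+$/, "", val)
            print val
            exit
        }
    '
}

# counter_delta BEFORE_FILE AFTER_FILE LABEL (hypothetical helper): print how
# much a counter grew between two saved snapshots.
counter_delta() {
    before=$(nic_counter "$3" < "$1")
    after=$(nic_counter "$3" < "$2")
    # A counter missing from a snapshot is treated as 0.
    echo $(( ${after:-0} - ${before:-0} ))
}

# Example use on an ESXi host:
#   esxcli network nic stats get -n vmnic2 > /tmp/nic_before.txt
#   sleep 60
#   esxcli network nic stats get -n vmnic2 > /tmp/nic_after.txt
#   counter_delta /tmp/nic_before.txt /tmp/nic_after.txt "Receive CRC errors"
#   counter_delta /tmp/nic_before.txt /tmp/nic_after.txt "Receive packets dropped"
```

A result of 0 means the counter did not change during the interval; a positive result means the errors or drops are still occurring.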


    If the above statistics are not high, check the vSAN trace logs (vsantraces) for events like the following:

    2025-03-10T00:16:45.439638 [3612038] [cpu22] [c826378b OWNER readWithBlkAttr5 VMDISK] DOMTraceOpTookTooLong:10304: {'op': 0x45bad8d63c80, 'objUuid': 'xxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxx', 'offset-39': 138322706432, 'length-25': 65536, 'totalTimeMS': 10014, 'timeInThisPhaseMS': 10014, 'opPhase': 'Wait for RDT'}
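To pick such events out of a large trace, a filter along these lines may help. It is a sketch that assumes the trace has already been decoded to text in the format of the line above; slow_rdt_ops is a hypothetical helper name, and the 5000 ms threshold in the example is arbitrary.

```shell
#!/bin/sh
# slow_rdt_ops [MIN_MS] (hypothetical helper): read decoded vSAN trace text
# on stdin and print the timestamp, object UUID and totalTimeMS of every
# DOMTraceOpTookTooLong event whose phase is "Wait for RDT" and whose total
# time is at least MIN_MS (default 0).
slow_rdt_ops() {
    awk -v min="${1:-0}" '
        /DOMTraceOpTookTooLong/ && /Wait for RDT/ {
            # Extract the digits that follow the totalTimeMS key.
            ms = $0
            sub(/.*totalTimeMS.: */, "", ms)
            sub(/[^0-9].*/, "", ms)

            # Extract the quoted value that follows the objUuid key.
            uuid = $0
            sub(/.*objUuid.: ./, "", uuid)
            sub(/[^0-9a-zA-Z-].*/, "", uuid)

            if (ms + 0 >= min + 0) print $1, uuid, ms
        }
    '
}

# Example use, where decoded_traces.txt is a placeholder file name:
#   slow_rdt_ops 5000 < decoded_traces.txt
```

Many matching events spread across different objects during the problem window are consistent with the network cause described above rather than with a single slow disk or object.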

Resolution