vSAN skyline health reports errors: vSAN: Basic (unicast) connectivity check and vSAN: MTU check (ping with large packet size)

Article ID: 389049


Products

VMware vSAN

Issue/Introduction

Symptoms:

  • vSAN Skyline Health reports the following errors:
    vSAN: Basic (unicast) connectivity check
    vSAN: MTU check (ping with large packet size)
    vSAN object health

Validation Step:

  • To identify the faulty host, click the Troubleshoot option for the primary issue "vSAN: Basic (unicast) connectivity check".
  • In this example, the Troubleshoot view confirms that Host-01 is unable to communicate with the other two hosts in the cluster via vmk1 with an MTU of 1500. The same health checks can also be queried from the command line, as shown below.
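
    On recent vSAN releases, the health checks can also be listed from any host's CLI (a sketch; the exact test names and output format vary by version):

    esxcli vsan health cluster list
    esxcli vsan health cluster get -t "Basic (unicast) connectivity check"

    Hosts failing the connectivity check are reported in the detailed output of the second command.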

Environment

VMware vSAN 6.x
VMware vSAN 7.x
VMware vSAN 8.x

Cause

The hosts are unable to communicate with each other over the vSAN network due to a NIC issue.

Cause validation:

  • Run the command "esxcli vsan network list" to identify the VMK used for vSAN traffic.
    esxcli vsan network list
    Interface
    VmkNic Name: vmk1
    IP Protocol: IP
    Interface UUID: yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyy
    Agent Group Multicast Address: xxx.x.x.x
    Agent Group IPv6 Multicast Address: xxxx::x:x:x
    Agent Group Multicast Port: zzzzz
    Master Group Multicast Address: xxx.x.x.x
    Master Group IPv6 Multicast Address: xxxx::x:x:x
    Master Group Multicast Port: zzzzz
    Host Unicast Channel Bound Port: zzzzz
    Data-in-Transit Encryption Key Exchange Port: 0
    Multicast TTL: 5
    Traffic Type: vsan

    In the above example, it is confirmed that vmk1 is used for vSAN traffic.
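
    Optionally, confirm that all other cluster members are present in the host's unicast agent list (a sketch; a host does not list itself):

    esxcli vsan cluster unicastagent list

    Every peer host should appear with its vSAN vmk IP address and unicast port; a missing or stale entry points to a cluster membership problem rather than an MTU mismatch.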

  • Run the command "esxcfg-vswitch -l" to identify the vSwitch used for vSAN traffic and check the MTU configured on it
    esxcfg-vswitch -l

    DVS Name                   Num Ports    Used Ports      Configured Ports MTU      MTU
    Switch name                  2520            10              512                 9000

    DVPort ID                                                 In Use                 Client
    512                                                         1                    vmnicl         
    513                                                         1                    vmnic0
    514                                                         0                 
    515                                                         0                    
    0                                                           1                    vmk0
    128                                                         1                    vmk1
    256                                                         1                    vmk2

    In the above example, it is confirmed that vmnic1 and vmnic2 are used for vsan communication. vSwicth is configured with 9000 MTU. 
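
    The switch MTU can also be cross-checked with esxcli (a sketch; use the standard-switch variant if vSAN runs on a standard vSwitch):

    esxcli network vswitch dvs vmware list
    esxcli network vswitch standard list

    The MTU field reported for the vSAN switch should be the same (9000 here) on every host.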

  • Run the command "esxcfg-vmknic -l" to verify the MTU set on the VMkernel adapter (vmk)
    esxcfg-vmknic -l
    vmk1               128                            IPv4                                                     9000

    In the above example, it is confirmed that vmk1 is configured with MTU 9000.
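
    The vmk MTU can also be cross-checked with esxcli (a sketch):

    esxcli network ip interface list

    The MTU field shown for vmk1 should match the switch MTU (9000 in this example).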

  • Run the command "esxcfg-nics -l" to confirm the MTU configured on the physical nics (vmnics).
    esxcfg-nics -l

    Name      PCI            Driver    Link   Speed       Duplex   MAC Address         MTU    Description
    vmnic0    xxxx:xx:xx.x   vmxnet    Up     10000Mbps   Full     xx:xx:xx:xx:xx:xx   9000
    vmnic1    xxxx:xx:xx.x   vmxnet    Up     10000Mbps   Full     xx:xx:xx:xx:xx:xx   9000

    In the above example, it is confirmed that the vmnics are configured with MTU 9000.

    Repeat the above procedure on all hosts in the cluster and make sure the MTU is consistent across the network.
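
    As a quick per-host check of the uplinks (a sketch), the link state and MTU of every physical NIC can be listed with:

    esxcli network nic list

    All uplinks used for vSAN should report the same MTU and a link status of "Up".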

  • Ping the faulty host from a working host using a 1500 MTU.
    vmkping -I vmkX -d -s 1472 <IP address of faulty node>
    PING xx.xx.xxx.xx (xx.xx.xxx.xx): 1472 data bytes

    ---  xx.xx.xxx.xx ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss

  • Ping the faulty host from a working host using a 9000 MTU.
    vmkping -I vmkX -d -s 8972 <IP address of faulty node>
    PING xx.xx.xxx.xx (xx.xx.xxx.xx): 8972 data bytes

    ---  xx.xx.xxx.xx ping statistics ---
    3 packets transmitted, 0 packets received, 100% packet loss

    Based on the results of the steps above, it is confirmed that the healthy host is unable to communicate with the faulty host over the vSAN network at either MTU size (1500 or 9000).
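
    Note: the -s values are ICMP payload sizes, i.e. the MTU minus 28 bytes of IP and ICMP headers (1500 - 28 = 1472; 9000 - 28 = 8972), and -d sets the "don't fragment" bit so the packet must cross the network at full size.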

Resolution

 

  • Identify the active vmnic serving vSAN traffic from esxtop data, as outlined below.


    In this example, esxtop shows that vmnic0 is actively serving vSAN traffic.
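
    A minimal sketch of the esxtop workflow (column names can vary slightly between ESXi releases): run esxtop, press n to switch to the network view, find the row whose USED-BY column shows the vSAN VMkernel port (vmk1 in this example), and read the TEAM-PNIC column to see which physical uplink is currently carrying its traffic.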

  • Perform a maintenance mode pre-check and place the host into maintenance mode so that all running VMs are migrated to other hosts in the cluster.
  • Bring down the active vmnic, provided another active vmnic is configured as a failover.
    "esxcli network nic down -n vmnicX"
  • Observe the vmnic failover using esxtop data (network view, as above).

    Please refer to this KB: Using esxtop to identify storage performance issues for ESXi (multiple versions)
  • After the vmnic failover, repeat the ping tests between the affected hosts with both packet sizes.
    vmkping -I vmkX -d -s 1472 <IP address>
    vmkping -I vmkX -d -s 8972 <IP address>
  • If one NIC fails to serve traffic while the other works, engage the internal networking team or the hardware vendor to investigate and resolve the issue with the faulty vmnic.