Unable to ping between ESXi vSAN vmkernel adapters


Article ID: 415043


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • An ESXi host is marked as down, and vmkping tests between vSAN vmkernel adapters on the host fail 
  • One host is unable to communicate with the vSphere HA primary host
  • This can occur following a reboot of an ESXi host
  • You may observe packet loss within the physical network underlay, virtual networking, or both

Environment

VMware vSphere ESXi

Cause

  • Potential physical underlay network issues that block/drop traffic between hosts over specific vmnics
  • Configuration issues

Resolution

The testing below is done at the very edge of the virtual network, where the ESXi host hands the packet off to the physical NIC driver. Begin by confirming which vmnic is currently in use by the vSAN vmkernel adapters (vmk):

  1. Ensure ICMP is not blocked; if ICMP is disabled, the tests below will not return accurate results.  
  2. Open an SSH session to the ESXi hosts.
  3. Run the esxtop command, then press the letter 'n' to view the networking data.
  4. Locate the vSAN vmk and note which vmnic is listed as its active uplink (the TEAM-PNIC column).
  5. Repeat the above process for the other host as well.
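As an alternative to reading the esxtop screen, the vSAN vmkernel adapters and their configured MTU can be listed directly from the ESXi shell. A quick sketch (both commands exist on standard ESXi builds, though output columns can vary by version; esxtop is still the place to confirm which vmnic is actively in use):

```shell
# List which vmkernel adapters are tagged for vSAN traffic
esxcli vsan network list

# List all vmkernel adapters with their IP addresses and MTU
esxcfg-vmknic -l
```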

Capturing traffic between the two hosts:

    1. Ensure each host has a total of two SSH sessions open; wait to run the commands until all four SSH sessions are ready.
    2. Identify which host will be the source host and which will be the destination host.
      • Source Host
        1. On one SSH session for the source host configure the vmkping command, based on the MTU in use:
          • 1500 MTU: vmkping -I vmk# #.#.#.# -d -s 1472
          • 9000 MTU: vmkping -I vmk# #.#.#.# -d -s 8972
          • Replace vmk# with the vSAN vmkernel adapter identified in esxtop (for example, vmk1) and replace #.#.#.# with the destination host's vmk IP.
        2. On the second SSH session for the source host configure the packet capture command, using the vmnic that was found from the esxtop data:
          • pktcap-uw --uplink vmnic# --capture UplinkSndKernel -o - | tcpdump-uw -ner - arp or icmp
      • Destination Host
        1. On one SSH session for the destination host configure the vmkping command using the source hosts IP
        2. On the second SSH session configure the packet capture using the vmnic that was found from the esxtop data:
          • pktcap-uw --uplink vmnic# --capture UplinkRcvKernel -o - | tcpdump-uw -ner - arp or icmp
    3. Begin both packet captures, then start the vmkpings from the source host to the destination host
    4. Review the source packet capture and see if there are ICMP or ARP requests leaving the ESXi host. Successful ICMP packets being sent will look like the following example:
      [source_host_here:~] pktcap-uw --uplink vmnic1 --capture UplinkSndKernel -o - | tcpdump-uw -ner - arp or icmp
      The name of the uplink is vmnic1
      The session capture point is UplinkSndKernel
      pktcap: The output file is -.
      pktcap: No server port specified, select #### as the port.
      pktcap: Local CID #.
      pktcap: Listen on port ####.
      pktcap: Main thread: ############.
      pktcap: Dump thread: ############.
      pktcap: The output file format is pcapng.
      pktcap: Recv Thread: ############.
      pktcap: Accept...
      pktcap: Vsock connection from port #### cid #.
      reading from file -, link type ##### (Ethernet), snapshot length #####
      16:57:57.094737 ##.##.##.##.##.## > ##.##.##.##.##.##, ethertype IPv4 (#x####) length ####: #.#.#.# > #.#.#.#: ICMP echo request, id #####, seq 1, length ####
      16:57:58.095335 ##.##.##.##.##.## > ##.##.##.##.##.##, ethertype IPv4 (#x####) length ####: #.#.#.# > #.#.#.#: ICMP echo request, id #####, seq 2, length ####
      16:57:59.096922 ##.##.##.##.##.## > ##.##.##.##.##.##, ethertype IPv4 (#x####) length ####: #.#.#.# > #.#.#.#: ICMP echo request, id #####, seq 3, length ####
      16:58:00.097783 ##.##.##.##.##.## > ##.##.##.##.##.##, ethertype IPv4 (#x####) length ####: #.#.#.# > #.#.#.#: ICMP echo request, id #####, seq 4, length ####
      • The above output is: HH:MM:SS.##### Source MAC > Destination MAC, ethertype IPv4 (#x####) length ####: Source IP > Destination IP: ICMP echo request, id ####, seq #, length ####. This shows that the source ESXi host is correctly sending the traffic out. 
        • If no traffic is seen being sent from the source host, please open a case with Broadcom Support.
    5. Review the destination packet capture to see if any of the ICMP or ARP requests are being received by the ESXi host. 
      • If there are no packets seen on the destination host from the source host, the packet loss is occurring in the physical network (between hosts). This issue needs to be addressed by the server/hardware vendor. 
      • If the output looks similar to the above data then the packets are being received by the destination host.
        • In this scenario, reverse the direction of capture while keeping the vmkping the same (source to destination):
          • Source Host: pktcap-uw --uplink vmnic# --capture UplinkRcvKernel -o - | tcpdump-uw -ner - arp or icmp
          • Destination Host: pktcap-uw --uplink vmnic# --capture UplinkSndKernel -o - | tcpdump-uw -ner - arp or icmp
          • Successful replies will look similar to the below output:
            [destination_host_here:~] pktcap-uw --uplink vmnic1 --capture UplinkSndKernel -o - | tcpdump-uw -ner - arp or icmp
            The name of the uplink is vmnic1
            The session capture point is UplinkSndKernel
            pktcap: The output file is -.
            pktcap: No server port specified, select #### as the port.
            pktcap: Local CID #.
            pktcap: Listen on port ####.
            pktcap: Main thread: ############.
            pktcap: Dump thread: ############.
            pktcap: The output file format is pcapng.
            pktcap: Recv Thread: ############.
            pktcap: Accept...
            pktcap: Vsock connection from port #### cid #.
            reading from file -, link type ##### (Ethernet), snapshot length #####
            16:57:57.094737 ##.##.##.##.##.## > ##.##.##.##.##.##, ethertype IPv4 (#x####) length ####: #.#.#.# > #.#.#.#: ICMP echo reply, id #####, seq 1, length ####
            16:57:58.095335 ##.##.##.##.##.## > ##.##.##.##.##.##, ethertype IPv4 (#x####) length ####: #.#.#.# > #.#.#.#: ICMP echo reply, id #####, seq 2, length ####
            16:57:59.096922 ##.##.##.##.##.## > ##.##.##.##.##.##, ethertype IPv4 (#x####) length ####: #.#.#.# > #.#.#.#: ICMP echo reply, id #####, seq 3, length ####
            16:58:00.097783 ##.##.##.##.##.## > ##.##.##.##.##.##, ethertype IPv4 (#x####) length ####: #.#.#.# > #.#.#.#: ICMP echo reply, id #####, seq 4, length ####
            • If successful replies are leaving the destination ESXi host, verify if the source host is receiving the replies.
            • If not, the issue is with the physical network on the path back from destination to source. 
            • If there are replies coming into the source ESXi host from the destination host, then the packets are able to make the full path and there should not be any networking issue. 
              • If intermittent pings are being lost between the source and destination hosts, it is recommended to verify that each host is sending and receiving every packet, which can be identified by the seq # portion of the output. 
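The -s payload sizes used in the vmkping commands above follow from the MTU: subtract the 20-byte IPv4 header and the 8-byte ICMP header. A quick arithmetic check, runnable in any POSIX shell:

```shell
# vmkping -s payload = MTU - 20 (IPv4 header) - 8 (ICMP header)
mtu_standard=1500
mtu_jumbo=9000
echo $((mtu_standard - 28))   # 1472, the -s value for 1500 MTU
echo $((mtu_jumbo - 28))      # 8972, the -s value for 9000 MTU
```

The -d flag forbids fragmentation, so a ping at these sizes only succeeds if the full MTU-sized frame makes it end to end.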

If packets are being dropped in the physical network, please work with the server/hardware vendor to resolve this.
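To check the seq numbers without eyeballing the captures, the tcpdump-uw text output from each side can be saved to files and compared. A minimal sketch, using hypothetical sample lines and file paths under /tmp (in practice, redirect each capture's output to a file instead):

```shell
# Hypothetical sample lines mimicking the tcpdump-uw output shown above
printf 'ICMP echo request, id 1, seq 1, length 64\nICMP echo request, id 1, seq 2, length 64\nICMP echo request, id 1, seq 3, length 64\n' > /tmp/src.txt
printf 'ICMP echo request, id 1, seq 1, length 64\nICMP echo request, id 1, seq 3, length 64\n' > /tmp/dst.txt

# Extract the seq numbers from each side
grep -o 'seq [0-9]*' /tmp/src.txt | awk '{print $2}' > /tmp/src_seq.txt
grep -o 'seq [0-9]*' /tmp/dst.txt | awk '{print $2}' > /tmp/dst_seq.txt

# Lines unique to the source list are packets sent but never received
comm -23 /tmp/src_seq.txt /tmp/dst_seq.txt
```

In this sample, seq 2 was sent but never received, so the command prints 2; an empty result means every sent packet arrived.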

Additional Information