vSAN Health Test and MTU Check Failures During Host Remediation

Article ID: 428262

Products

VMware vSAN

Issue/Introduction

When you use vSphere Lifecycle Manager to remediate a vSAN cluster and a host is placed into maintenance mode for patching, remediation may fail with the following errors:

 

  • Remediation failed

  • vSAN health check 'vSAN: Basic (unicast) connectivity check' reported an issue for cluster [Redacted]. Check the vSAN health.

  • vSAN health check 'vSAN: MTU check (ping with large packet size)' reported an issue for cluster [Redacted]. Check the vSAN health.

  • vSAN health check 'vMotion: Basic (unicast) connectivity check' reported an issue for cluster [Redacted]. Check the vSAN health.

  • vSAN health check 'vMotion: MTU check (ping with large packet size)' reported an issue for cluster [Redacted]. Check the vSAN health.

  • Health Check for [Redacted] failed.

  •  [Redacted].com - Skipped remediation for this host.

  • Host [Redacted] was not processed, the reason: 'Health Check for...'

Environment

vSAN

Cause

These health check failures are caused by physical network latency or by network configuration issues outside of the VMware software stack.

Resolution

Follow KB https://knowledge.broadcom.com/external/article/389049/vsan-skyline-health-reports-errors-vsan.html

To troubleshoot, start with the vSAN Skyline Health tests in vCenter.

  • To identify the faulty host(s), click the Troubleshoot option for the primary issue, "vSAN: Basic (unicast) connectivity check". This test pings between the vSAN VMkernel adapters of the hosts in the vSAN cluster and reports any hosts that are unreachable. It also shows which VMkernel adapter is used for vSAN and what packet size (MTU) was used for the ping test.
  • In vCenter, check the vSAN VMkernel adapter and its virtual switch (vSS or VDS) on each host, and validate that the MTU of the vSAN VMkernel adapter is the same as the MTU configured on the virtual switch that carries its port group.
  • VMkernel MTU: in vCenter, select a host from the inventory > Configure > Networking > VMkernel adapters > select the VMkernel adapter with the vSAN service enabled > Properties > NIC settings > MTU
  • Virtual switch MTU: in vCenter, select a host from the inventory > Configure > Networking > Virtual switches > expand the virtual switch used by the vSAN VMkernel adapter > three dots to the far right > View Settings > MTU
  • Verify that the MTU values in these two places match.
  • If the MTU values do not match, correct them to match the designed configuration values expected for your virtual and physical network.
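
When the cluster has many hosts, the comparison in the steps above can be tracked with a small script. The following is a minimal sketch, assuming the two MTU values per host have been read manually from the vCenter screens described above. The function name, host names, and values are hypothetical and only illustrate the check.

```python
def find_mtu_mismatches(readings, expected_mtu):
    """Return (host, reason) pairs for hosts whose MTU values need attention.

    readings maps a host name to a (vmkernel_mtu, vswitch_mtu) pair that
    was read manually from the two vCenter screens described above.
    """
    problems = []
    for host, (vmk_mtu, vswitch_mtu) in sorted(readings.items()):
        if vmk_mtu != vswitch_mtu:
            # The two places checked in vCenter disagree on this host.
            problems.append((host, f"VMkernel MTU {vmk_mtu} != vSwitch MTU {vswitch_mtu}"))
        elif vmk_mtu != expected_mtu:
            # Both places agree, but not with the designed value.
            problems.append((host, f"MTU {vmk_mtu} differs from designed value {expected_mtu}"))
    return problems


# Hypothetical readings for three hosts in a cluster designed for MTU 9000.
readings = {
    "esx01": (9000, 9000),
    "esx02": (9000, 1500),
    "esx03": (1500, 1500),
}
for host, reason in find_mtu_mismatches(readings, expected_mtu=9000):
    print(f"{host}: {reason}")
```

In this sample data, esx02 and esx03 are reported and esx01 is not.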

 

If the problem continues, ping the faulty host(s) from a working host using both a 1500 MTU and a 9000 MTU.

Log in to the ESXi host(s) via PuTTY/SSH to access the CLI.

Test the connectivity between hosts.
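
The article does not list the exact commands, so the following is a sketch based on the payload sizes in the example output (1472 and 8972 bytes). It derives each size by subtracting the 20-byte IPv4 header and the 8-byte ICMP header from the MTU, then prints a matching vmkping command with the don't-fragment bit set. The interface name "vmk2" and the IP address are placeholders, not values from this article.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def icmp_payload_size(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(vmk, target_ip, mtu, count=3):
    """Build a vmkping command line for the given MTU.

    -I selects the outgoing VMkernel interface, -s sets the payload size,
    -d sets the don't-fragment bit, -c sets the number of pings.
    """
    return f"vmkping -I {vmk} -s {icmp_payload_size(mtu)} -d -c {count} {target_ip}"


# "vmk2" and the address are placeholders; substitute your vSAN VMkernel
# interface and the vSAN IP address of the host under test.
for mtu in (1500, 9000):
    print(vmkping_command("vmk2", "192.0.2.11", mtu))
```

Run the printed commands on the working ESXi host. Without the -d flag, an oversized packet may be fragmented along the path and the large-packet test can pass even though the MTU is misconfigured.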


Example output of failed vmkping tests:
PING xx.xx.xxx.xx ( xx.xx.xxx.xx): 1472 data bytes

---  xx.xx.xxx.xx ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

PING xx.xx.xxx.xx ( xx.xx.xxx.xx): 8972 data bytes

---  xx.xx.xxx.xx ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

If either of the above tests fails, this confirms there is an issue preventing network connectivity between the vSAN VMkernel ports of the individual ESXi hosts.
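
If the ping results are collected from several hosts, the pass/fail decision can be made by reading the summary line of each result. The sketch below assumes the output follows the format of the example above; the function name is hypothetical, and missing or unparseable output is treated as a failure.

```python
import re

# Matches the summary line, e.g.
# "3 packets transmitted, 0 packets received, 100% packet loss"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


def ping_succeeded(output):
    """Return True only if every transmitted packet was received."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return False  # no summary line found: treat as a failed test
    transmitted, received = int(match.group(1)), int(match.group(2))
    return transmitted > 0 and received == transmitted


failed_output = """PING xx.xx.xxx.xx ( xx.xx.xxx.xx): 8972 data bytes

---  xx.xx.xxx.xx ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
"""
print(ping_succeeded(failed_output))
```

A host pair where the 1472-byte test passes but the 8972-byte test fails suggests an MTU mismatch somewhere on the path, while a failure of both tests suggests a basic connectivity problem.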

Physical network checks must then be conducted upstream of the hypervisor, together with the physical network administrator, to determine the underlying cause of the failing connectivity health checks.

Additional Information

https://knowledge.broadcom.com/external/article/391812/vsan-skyline-health-reports-errorsvsan-m.html

https://knowledge.broadcom.com/external/article/326954/troubleshooting-vsan-networking.html

https://knowledge.broadcom.com/external/article?articleNumber=379982

https://knowledge.broadcom.com/external/article/389049/vsan-skyline-health-reports-errors-vsan.html