'Validate vMotion Network Connectivity' task fails during cluster deployment in VCF 9.0.x

Article ID: 441241

Products

VMware SDDC Manager / VCF Installer

Issue/Introduction

  • Cluster deployment in VCF 9.0.x fails at the 'Validate vMotion Network Connectivity' phase with the following error:


    Validate vMotion Network Connectivity Failed

    MM/DD/YY, HH:MM:SS
    Description          Validate vMotion Network Connectivity
    Progress Messages    Failed to skip failed ESXi hosts <ESX1.EXAMPLE.COM>, <ESX2.EXAMPLE.COM>, <ESX3.EXAMPLE.COM>
    Error Message: Failed to skip failed ESXi hosts <ESX1.EXAMPLE.COM>, <ESX2.EXAMPLE.COM>, <ESX3.EXAMPLE.COM>
    Reference Token: ######
    Cause: Cannot skip 3 ESXi Host(s) ([<ESX1.EXAMPLE.COM>, <ESX2.EXAMPLE.COM>, <ESX3.EXAMPLE.COM>]) as only 1 ESXi host(s) would remain and the minimum is 3

  • On the SDDC Manager appliance, /var/log/vmware/vcf/domainmanager/domainmanager.log shows 100% packet loss for the vMotion network ping test with a packet size of 8972 bytes:

    YYYY-MM-DDTHH:MM:SS DEBUG [vcf_dm,############,####] [c.v.e.s.c.h.n.EsxiHostNetworkingUtil,dm-exec-28]  Pinged ###.##.#.## via VM Kernel vmk## with MTU 8972 from ESXi host <ESX1.EXAMPLE.COM>: {"DataObject":[{"Summary":{"Duplicated":0,"HostAddr":"###.##.#.##","PacketLost":100,"Received":0,"Recieved":0,"RoundtripAvg":-2147483648,"RoundtripAvgMS":-2147483648,"RoundtripMax":0,"RoundtripMaxMS":0,"RoundtripMin":999999000,"RoundtripMinMS":999999,"Transmitted":3}}]}
    YYYY-MM-DDTHH:MM:SS DEBUG [vcf_dm,############,####] [c.v.e.s.v.v.SinglePortgroupValidator,dm-exec-27]  Ping with MTU 9000 failed, attempting ping with standard MTU
    YYYY-MM-DDTHH:MM:SS WARN  [vcf_dm,############,####] [c.v.v.c.f.p.a.i.ValidateEsxiHostNetworkConnectivityAction,dm-exec-10]  Network connectivity validation of ESXi hosts failed for ESX1.EXAMPLE.COM, ESX2.EXAMPLE.COM, ESX3.EXAMPLE.COM, Reference Token: ######

  • The vMotion network is then removed from the ESXi hosts:

    YYYY-MM-DDTHH:MM:SS DEBUG [vcf_dm,############,####] [c.v.e.s.c.h.n.EsxiHostNetworkingUtil,dm-exec-10]  Cleaning up ESXi Host ESX1.EXAMPLE.COM from VM Kernels vmk30 and vSwitches <DVS_SWITCH>
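
The ping results in the log excerpts above can be picked out of a copy of domainmanager.log with a short script. The following is a minimal Python sketch, not a supported tool. It assumes each ping entry sits on a single line in the real file and follows the "Pinged ... via VM Kernel ... with MTU ... from ESXi host ...: {JSON}" layout shown above. The function names parse_ping_result and failed_pings are hypothetical.

```python
import json
import re

# Assumed layout, taken from the log excerpt in this article:
# "Pinged <ip> via VM Kernel <vmk> with MTU <size> from ESXi host <host>: <json>"
PING_LINE = re.compile(
    r"Pinged (?P<target>\S+) via VM Kernel (?P<vmk>\S+) "
    r"with MTU (?P<size>\d+) from ESXi host (?P<host>\S+): (?P<payload>\{.*\})\s*$"
)


def parse_ping_result(line):
    """Return a dict describing one ping test, or None if the line does not match."""
    match = PING_LINE.search(line)
    if match is None:
        return None
    summary = json.loads(match.group("payload"))["DataObject"][0]["Summary"]
    return {
        "host": match.group("host"),
        "vmk": match.group("vmk"),
        "target": match.group("target"),
        "size": int(match.group("size")),
        "packet_lost_pct": summary["PacketLost"],
        "transmitted": summary["Transmitted"],
        "received": summary["Received"],
    }


def failed_pings(lines):
    """Yield the parsed result of every ping test that lost all of its packets."""
    for line in lines:
        result = parse_ping_result(line)
        if result is not None and result["packet_lost_pct"] == 100:
            yield result
```

For example, passing the lines of a local copy of the log to failed_pings lists which host, VMkernel adapter, and target address each failed test involved, which helps narrow the problem to specific switch ports or uplinks.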

Environment

VMware Cloud Foundation 9.0.x

Cause

  • During the 'Validate vMotion Network Connectivity' phase, SDDC Manager creates the vMotion VMkernel adapters on the ESXi hosts and runs a ping test between them with a packet size of 8972 bytes. This is the largest ICMP payload that fits in a 9000-byte MTU once the 20-byte IPv4 header and the 8-byte ICMP header are added. The 100% packet loss recorded in domainmanager.log indicates that the physical network is dropping these packets.

  • Common physical network misconfigurations that cause this include:

    • Jumbo Frames (MTU 9000) not enabled on the physical Top of Rack (ToR) switches for the vMotion VLAN.
    • The vMotion VLAN not properly allowed or trunked on the physical switch ports connected to the ESXi host uplinks.
    • A Layer 2 or Layer 3 connectivity issue preventing the ESXi hosts from reaching each other on the designated vMotion subnet.

  • Because the validation fails, SDDC Manager halts the deployment, rolls back the configuration (removing the vmk interfaces and vSwitches), and marks the task as failed.
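
The relationship between the 8972-byte test packet and the 9000-byte MTU can be shown with a short calculation. The following Python sketch is illustrative only. It assumes IPv4 without header options and that the test packets are not fragmented. The helper names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def packet_fits(payload, path_mtu):
    """True if a ping with this payload fits within the given path MTU."""
    return payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES <= path_mtu


print(max_icmp_payload(9000))   # 8972
print(max_icmp_payload(1500))   # 1472
print(packet_fits(8972, 9000))  # True
print(packet_fits(8972, 1500))  # False
```

If any device in the path between two hosts is left at the standard 1500-byte MTU, an 8972-byte test packet does not fit and is dropped when fragmentation is not allowed, which is consistent with the 100% packet loss seen in the log.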

Resolution

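Engage the internal network administration team to correct the physical network configuration between the hosts (jumbo frame MTU, VLAN trunking, and Layer 2 or Layer 3 reachability on the vMotion network), then retry the failed cluster creation task.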
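
Before retrying, the same jumbo frame test can be reproduced manually from an ESXi host with vmkping. The following Python sketch only assembles the command line and does not run anything. It assumes a VMkernel adapter is present on the vMotion VLAN at test time (SDDC Manager removes the ones it created after the failure, so a temporary adapter may be needed). The helper name build_vmkping, the adapter name, and the addresses are placeholders. Whether the netstack option is needed depends on which TCP/IP stack the adapter uses in the environment.

```python
import shlex


def build_vmkping(target_ip, vmk, payload_size=8972, netstack=None, count=3):
    """Build a vmkping command that sends unfragmented packets of the given size.

    -I  selects the outgoing VMkernel adapter
    -s  sets the ICMP payload size in bytes
    -d  sets the "do not fragment" bit (IPv4)
    -c  sets the number of packets to send
    -S  selects a TCP/IP netstack (only added when one is given)
    """
    args = ["vmkping", "-I", vmk, "-s", str(payload_size), "-d", "-c", str(count)]
    if netstack:
        args += ["-S", netstack]
    args.append(target_ip)
    return shlex.join(args)


# Placeholder values; substitute the real adapter and peer vMotion address.
print(build_vmkping("192.0.2.11", "vmk30"))
print(build_vmkping("192.0.2.11", "vmk30", payload_size=1472))
```

If the 1472-byte test succeeds while the 8972-byte test fails, the VLAN and basic reachability are likely in place and the jumbo frame configuration is the probable fault. If both fail, VLAN trunking or Layer 2 or Layer 3 connectivity should be checked first.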