Stretching a cluster fails with "Validate interzone host connectivity failed due to unknown error"

Article ID: 324069

Products

VMware Cloud Foundation

Issue/Introduction

  • This article provides a workaround for VMware Cloud Foundation deployments that encounter the error "Validate interzone host connectivity failed due to unknown error".  
  • Environments that do not allow routing between the Management network and the vSAN network across Availability Zones encounter this issue.


Environment

VMware Cloud Foundation 3.9.x

Cause

Up to vSphere 6.7u3, vSAN does not have a dedicated TCP/IP stack. Instead, it uses the Management TCP/IP stack to route traffic out of the ESXi host.

To force traffic to be routed over the vSAN interface, static routes must be added. Cloud Foundation adds these static routes during the Stretched Cluster workflow to ensure that vSAN traffic does not egress the Management interface, but it does not add them before the validation runs.

Because the static routes do not exist before the interzone host connectivity task, the ping test sends traffic out of the Management interface (vmk0). If the vSAN network is not reachable via the Management network, the validation fails with the error "Validate interzone host connectivity failed due to unknown error".
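
The routing decision described above can be illustrated with a short shell sketch. It is a simplified longest-prefix-match model, not the ESXi implementation. The helper name `egress_if` is hypothetical, the routing table is the AZ1 example from the Resolution section below, and 172.16.33.10 is an assumed address on the AZ2 vSAN network.

```shell
#!/bin/sh
# egress_if DEST: read `esxcfg-route -l` style output on stdin and print the
# interface of the most specific route that covers DEST.
# Simplified model for illustration only (hypothetical helper).
egress_if() {
    awk -v dest="$1" '
        # Convert a dotted-quad IPv4 address to a number.
        function ip2int(ip,   p) {
            split(ip, p, ".")
            return ((p[1] * 256 + p[2]) * 256 + p[3]) * 256 + p[4]
        }
        BEGIN { d = ip2int(dest); best = -1 }
        # Only rows whose last column is a VMkernel interface are routes.
        $NF ~ /^vmk[0-9]+$/ {
            net   = ($1 == "default") ? 0 : ip2int($1)
            mask  = ip2int($2)
            block = 4294967296 - mask      # number of addresses in the subnet
            if (d - (d % block) == net && mask > best) {
                best = mask
                ifc  = $NF
            }
        }
        END { print ifc }
    '
}

# AZ1 routing table before any static route is added.
routes_before='VMkernel Routes:
Network          Netmask          Gateway          Interface
172.16.11.0      255.255.255.0    Local Subnet     vmk0
172.16.13.0      255.255.255.0    Local Subnet     vmk2
default          0.0.0.0          172.16.11.253    vmk0'

# Only the default route covers the AZ2 vSAN address, so this prints vmk0.
printf '%s\n' "$routes_before" | egress_if 172.16.33.10
```

With a row for 172.16.33.0/24 via vmk2 appended to the table, the same call prints vmk2, which is the effect the static routes are meant to achieve.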

Resolution

This is a known issue for environments that do not allow routing between the Management Network and the vSAN Network. 

Before executing the Cloud Foundation Stretched Cluster workflow, static routes need to be added to all hosts in AZ1 and AZ2 that will participate in the stretched cluster.

Example:
The vSAN network in AZ2 is 172.16.33.0/24.  
  1. SSH to each ESXi host in AZ1 and view the routing table. In the example below, there is no route to the 172.16.33.0/24 network. Run the following command:
esxcfg-route -l

VMkernel Routes:
Network          Netmask          Gateway          Interface      
172.16.11.0      255.255.255.0    Local Subnet     vmk0           
172.16.13.0      255.255.255.0    Local Subnet     vmk2           
default          0.0.0.0          172.16.11.253    vmk0    
  2. Manually add the static route to each ESXi host in AZ1 using the command below:
esxcli network ip route ipv4 add -n 172.16.33.0/24 -g 172.16.13.253
  3. Verify that the static route was created successfully. Run the following command:
esxcfg-route -l

VMkernel Routes:
Network          Netmask          Gateway          Interface      
172.16.11.0      255.255.255.0    Local Subnet     vmk0           
172.16.13.0      255.255.255.0    Local Subnet     vmk2           
172.16.33.0      255.255.255.0    172.16.13.253    vmk2           
default          0.0.0.0          172.16.11.253    vmk0 

Note: Be sure to complete the process for each ESXi host in AZ1 that is participating in the stretched cluster.
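
When many hosts are involved, the verification step can be scripted. The sketch below assumes `awk` is available in the shell where it runs. The helper name `has_route` is hypothetical, and the sample table is the AZ1 verification output shown above.

```shell
#!/bin/sh
# has_route NETWORK GATEWAY: read `esxcfg-route -l` output on stdin and
# exit 0 only if a row lists NETWORK with GATEWAY as its gateway.
# Hypothetical helper for illustration.
has_route() {
    awk -v net="$1" -v gw="$2" '
        $1 == net && $3 == gw { found = 1 }
        END { exit !found }
    '
}

# Sample table copied from the AZ1 verification output.
routes_az1='VMkernel Routes:
Network          Netmask          Gateway          Interface
172.16.11.0      255.255.255.0    Local Subnet     vmk0
172.16.13.0      255.255.255.0    Local Subnet     vmk2
172.16.33.0      255.255.255.0    172.16.13.253    vmk2
default          0.0.0.0          172.16.11.253    vmk0'

if printf '%s\n' "$routes_az1" | has_route 172.16.33.0 172.16.13.253; then
    echo "route present"
else
    echo "route missing"
fi
```

On a host, the live table can be piped in instead of the sample text, for example `esxcfg-route -l | has_route 172.16.33.0 172.16.13.253`.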

The vSAN network in AZ1 is 172.16.13.0/24.
  1. SSH to each ESXi host in AZ2 and view the routing table. In the example below, there is no route to the 172.16.13.0/24 network. Run the following command:
esxcfg-route -l

VMkernel Routes:
Network          Netmask          Gateway          Interface      
172.16.31.0      255.255.255.0    Local Subnet     vmk0           
172.16.33.0      255.255.255.0    Local Subnet     vmk2           
default          0.0.0.0          172.16.31.253    vmk0    
  2. Manually add the static route to each ESXi host in AZ2 using the command below:
esxcli network ip route ipv4 add -n 172.16.13.0/24 -g 172.16.33.253
  3. Verify that the static route was created successfully. Run the following command:
esxcfg-route -l

VMkernel Routes:
Network          Netmask          Gateway          Interface      
172.16.31.0      255.255.255.0    Local Subnet     vmk0           
172.16.33.0      255.255.255.0    Local Subnet     vmk2           
172.16.13.0      255.255.255.0    172.16.33.253    vmk2           
default          0.0.0.0          172.16.31.253    vmk0 

Once the static routes have been added to every participating host in both AZ1 and AZ2, retry the Stretched Cluster workflow.
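
For clusters with many hosts, the per-host commands can be generated rather than typed by hand. The sketch below only prints the commands so they can be reviewed before anything is run. The helper name `print_route_cmds` and the host names are hypothetical, and root SSH access to the hosts is an assumption. In each case the gateway is the vSAN gateway of the host's own AZ, as in the verification output above.

```shell
#!/bin/sh
# print_route_cmds REMOTE_VSAN_CIDR LOCAL_VSAN_GATEWAY HOST...
# Print (do not execute) one ssh command per host that would add the
# static route. Hypothetical helper for illustration.
print_route_cmds() {
    net="$1"
    gw="$2"
    shift 2
    for host in "$@"; do
        printf 'ssh root@%s "esxcli network ip route ipv4 add -n %s -g %s"\n' \
            "$host" "$net" "$gw"
    done
}

# AZ1 hosts reach the AZ2 vSAN network through the AZ1 vSAN gateway.
print_route_cmds 172.16.33.0/24 172.16.13.253 \
    esx-az1-01.example.com esx-az1-02.example.com

# AZ2 hosts reach the AZ1 vSAN network through the AZ2 vSAN gateway.
print_route_cmds 172.16.13.0/24 172.16.33.253 \
    esx-az2-01.example.com esx-az2-02.example.com
```

Printing first keeps the change reviewable. After checking the output, the commands can be run one host at a time.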