vSphere HA Agent unreachable on vSAN Cluster due to FDM Firewall Restrictions

Article ID: 433417

Products

VMware vCenter Server

Issue/Introduction

  • In a vSAN-enabled cluster, vSphere HA fails to configure on the slave hosts even though master election completes successfully. The vSphere Client reports the following error: "HA agent unreachable".
  • Key observations:
    • All hosts are reachable via ICMP on the vSAN VMkernel adapters.
    • All vSAN objects are healthy, as validated with esxcli vsan debug object health summary get:

Health Status                                              Number Of Objects
---------------------------------------------------------  -----------------
remoteAccessible                                                           0
inaccessible                                                               0
reduced-availability-with-no-rebuild                                       0
reduced-availability-with-no-rebuild-delay-timer                           0
reducedavailabilitywithpolicypending                                       0
reducedavailabilitywithpolicypendingfailed                                 0
reduced-availability-with-active-rebuild                                   0
reducedavailabilitywithpausedrebuild                                       0
data-move                                                                  0
nonavailability-related-reconfig                                           0
nonavailabilityrelatedincompliancewithpolicypending                        0
nonavailabilityrelatedincompliancewithpolicypendingfailed                  0
nonavailability-related-incompliance                                       0
nonavailabilityrelatedincompliancewithpausedrebuild                        0
healthy                                                                  100

    • Network connectivity for Fault Domain Manager (FDM) on port 8182 fails between hosts despite basic network reachability.

nc -zv -s source_ip_address destination_ip 8182

nc: connect to destination_ip port 8182 (tcp) failed: Connection timed out

    • The /var/run/log/fdm.log file on the slave hosts reports the following errors:

YYYY:MM:DDTHH:MM:SS Er(163) Fdm[2630398]: [Originator@6876 sub=Vmomi opID=52edf9f3] Caught exception while sending activation result; <<52605254-c6bf-b477-2898-ce8a15b8998b, <TCP '127.0.0.1 : 9089'>, <TCP '127.0.0.1 : 27286'>>, fdmService, csi.FdmService.GetDebugManager, <csi.version.version1, official, 1.0>, <<io_obj p:0x000000f84d8cc440, h:21, <TCP '127.0.0.1 : 9089'>, <TCP '127.0.0.1 : 27286'>>, 52605254-c6bf-b477-2898-ce8a15b8998b>>, N5Vmomi5Fault11SystemError9ExceptionE(Fault cause: vmodl.fault.SystemError
YYYY:MM:DDTHH:MM:SS Er(163) Fdm[2630223]: --> )
YYYY:MM:DDTHH:MM:SS Er(163) Fdm[2630223]: --> [context]zKq7AVECAQAAAA8jcwETZmRtAICMf4EBZmRtAAAu7NoAiQ3ugCB0XgGAFnFeAQCFsu8A5/jvAOcF8IA/ploBAPgB3AAso+4Aab3uAHtO7gB+/e6ALgtsAYAQPGwBgNvYjAEBUngAbGlicHRocmVhZC5zby4wAAI/Ug9saWJjLnNvLjYA[/context]

YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630228]: [Originator@6876 sub=FdmDump opID=WorkQueue-6fd86672] BEGIN DUMP
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: --> Time=YYYY:MM:DDTHH:MM:SS
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: --> OpId=
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: --> Dump Reason=FailoverStart
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: -->
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: --> MODULE=FdmService
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: --> Cluster state: Master (3)
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: -->
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: --> Slave states (4):
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: -->      host-XX: FDMUnreachable
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: -->      host-XX: FDMUnreachable
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: -->      host-XX: FDMUnreachable
YYYY:MM:DDTHH:MM:SS In(166) Fdm[2630223]: -->      host-XX: FDMUnreachable
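
The two checks above (the port 8182 probe and the FDMUnreachable entries in fdm.log) can be repeated across several peers with a small helper script. The following is a minimal sketch, not a supported tool: the function names and the NC_BIN override are hypothetical, the addresses are placeholders, and it uses only the nc flags already shown above, so each failed probe waits for the normal connection timeout.

```shell
#!/bin/sh
# Sketch only: helper names and the NC_BIN override are hypothetical.
NC_BIN="${NC_BIN:-nc}"   # probe command; defaults to the nc used above
FDM_PORT=8182            # FDM port named in this article

# check_fdm_peers SOURCE_IP PEER_IP...
# Probes FDM_PORT on every peer from SOURCE_IP, prints one OK/FAIL line per
# peer, and returns the number of peers that could not be reached.
check_fdm_peers() {
    src="$1"
    shift
    failed=0
    for dst in "$@"; do
        if "$NC_BIN" -z -s "$src" "$dst" "$FDM_PORT" >/dev/null 2>&1; then
            echo "OK   $src -> $dst:$FDM_PORT"
        else
            echo "FAIL $src -> $dst:$FDM_PORT"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# list_unreachable_slaves
# Reads fdm.log text on stdin and prints each host ID that a state dump
# marks as FDMUnreachable, without duplicates.
list_unreachable_slaves() {
    awk '$NF == "FDMUnreachable" { id = $(NF - 1); sub(/:$/, "", id); print id }' | sort -u
}

# Example usage on an ESXi host (placeholder addresses):
#   check_fdm_peers 192.0.2.11 192.0.2.12 192.0.2.13
#   list_unreachable_slaves < /var/run/log/fdm.log
```

If every peer reports FAIL on port 8182 while ICMP over the vSAN VMkernel adapters succeeds, the pattern matches the symptoms described in this article.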

 

Environment

VCF 5.x 

VCF 9.x 

VVF 9.x 

vCenter 8.x, 7.x 

Cause

The ESXi firewall's FDM rule is configured to allow traffic only to the Management Network IP addresses, which blocks FDM communication over the vSAN network on port 8182.

In a vSAN-enabled cluster, HA traffic flows over the vSAN network rather than the management network.
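
To confirm this cause from the ESXi shell, the allowed-IP list of the fdm ruleset can be compared with the vSAN addresses of the peer hosts. The sketch below is illustrative: the function name fdm_rule_allows is hypothetical, it assumes the listing prints either the word All or a comma-separated list of addresses, and it matches exact IP addresses only (it does not evaluate subnets in CIDR notation).

```shell
#!/bin/sh
# Sketch only: fdm_rule_allows is a hypothetical helper name.

# fdm_rule_allows IP
# Reads the output of
#   esxcli network firewall ruleset allowedip list --ruleset-id fdm
# on stdin and succeeds when the rule is unrestricted ("All") or when IP
# appears literally in the allowed list. Subnet entries are not evaluated.
fdm_rule_allows() {
    ip="$1"
    listing=$(cat)
    # An unrestricted rule is assumed to be reported as the word "All".
    if printf '%s\n' "$listing" | grep -qw 'All'; then
        return 0
    fi
    # Split on commas and spaces, then look for an exact, literal match.
    printf '%s\n' "$listing" | tr ', ' '\n\n' | grep -qxF "$ip"
}

# Example usage on an ESXi host (placeholder address):
#   esxcli vsan network list     # shows the VMkernel adapter used for vSAN
#   esxcli network firewall ruleset allowedip list --ruleset-id fdm \
#       | fdm_rule_allows 192.0.2.12 || echo "192.0.2.12 is not allowed by fdm"
```

A peer's vSAN IP address that is missing from the list is consistent with the port 8182 timeout described above.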

 

Resolution

To resolve the issue, allow the vSAN IP addresses in the fdm firewall rule on every host in the cluster:

 

  1. Log in to the vSphere Client.

  2. Select an affected ESXi host in the inventory.

  3. Navigate to the Configure tab.

  4. Under System, click Firewall.

  5. Click Edit and locate the fdm (Fault Domain Manager) service.

  6. In the Allowed IP Addresses section, ensure that the vSAN IP addresses of all hosts in the cluster are added, or select Allow connections from any IP address.

    • Note: Allowing connections from any IP address lets any source reach the fdm service on the host. Evaluate the environment's security compliance requirements before enabling this option.
  7. Repeat these steps for all hosts in the cluster.
  8. Right-click the Cluster in the inventory and select vSphere HA > Reconfigure for vSphere HA.
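
For clusters with many hosts, the same allowed-IP change can be prepared as ESXi shell commands. This is a hedged sketch rather than the documented procedure: the function name print_fdm_allow_cmds is hypothetical, the addresses are placeholders, and whether the fdm ruleset can be edited with esxcli on a given ESXi release is an assumption to verify first. The helper only prints the commands for review; it does not run them.

```shell
#!/bin/sh
# Sketch only: print_fdm_allow_cmds is a hypothetical helper name.

# print_fdm_allow_cmds VSAN_IP...
# Prints (does not execute) one esxcli command per vSAN IP address that adds
# it to the fdm ruleset's allowed list, followed by a command that lists the
# resulting allowed addresses for verification.
print_fdm_allow_cmds() {
    for ip in "$@"; do
        echo "esxcli network firewall ruleset allowedip add --ruleset-id fdm --ip-address $ip"
    done
    echo "esxcli network firewall ruleset allowedip list --ruleset-id fdm"
}

# Example usage (placeholder addresses):
#   print_fdm_allow_cmds 192.0.2.11 192.0.2.12 192.0.2.13
```

After reviewing the printed commands and running them on each host, Reconfigure for vSphere HA is still required as in step 8.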

 

Additional Information

Refer to the TechDocs article "Using vSphere HA with vSAN" for information on vSphere HA in vSAN clusters.