vSphere HA stuck in "HA Agent Unreachable"


Article ID: 422293

Products

VMware vCenter Server

Issue/Introduction

Symptom:
-------------
1) vSphere HA on the host remains stuck in the "HA Agent Unreachable" state.
2) Uninstalling the vmware-fdm VIB fails with "Cannot open volume".

[[email protected]:~] esxcli software vib remove -n vmware-fdm
 [InstallationError]
 Failed to query file system stats: Errors:
 Error getting data for filesystem on '/vmfs/volumes/63######-########-####-########4f60': Cannot open volume: /vmfs/volumes/63######-########-####-########4f60, skipping.
 Error getting data for filesystem on '/vmfs/volumes/66######-########-####-########9670': Cannot open volume: /vmfs/volumes/66######-########-####-########9670, skipping.
      cause = Errors:
 Error getting data for filesystem on '/vmfs/volumes/63######-########-####-########4f60': Cannot open volume: /vmfs/volumes/63######-########-####-########4f60, skipping.
 Error getting data for filesystem on '/vmfs/volumes/66######-########-####-########9670': Cannot open volume: /vmfs/volumes/66######-########-####-########9670, skipping.
 Please refer to the log file for more details.
[[email protected]:~]

fdm.log:
--------
YYYY-MM-DD Er(163) Fdm[31077111]: [Originator@6876 sub=Cluster opID=WorkQueue-2c####35] Failed to open file: /vmfs/volumes/62######-########-####-########de3e/.vSphere-HA/FDM-78######-####-####-####-########-##x-####632-VM1/protectedlist
YYYY-MM-DD Er(163) Fdm[31077111]: [Originator@6876 sub=Cluster opID=WorkQueue-2c####35] open(/vmfs/volumes/62######-########-####-########de3e/.vSphere-HA/FDM-78######-####-####-####-########-##x-####632-VM1/protectedlist) failed: Device or resource busy
YYYY-MM-DD In(166) Fdm[31077111]: [Originator@6876 sub=Invt opID=WorkQueue-2c####35] Notify datastore (/vmfs/volumes/62######-########-####-########de3e) locally
YYYY-MM-DD Db(167) Fdm[31077111]: [Originator@6876 sub=Cluster opID=WorkQueue-2c####35] IO error at __localhost__; path: /vmfs/volumes/62######-########-####-########de3e (err: 16)
YYYY-MM-DD Wa(164) Fdm[31077111]: [Originator@6876 sub=VpxProfiler opID=WorkQueue-2c####35] WorkQueue [TotalTime] took 4039 ms
YYYY-MM-DD Er(163) Fdm[31076792]: [Originator@6876 sub=Cluster opID=WorkQueue-26####b9a] Failed to open file: /vmfs/volumes/63######-########-####-########xec0/.vSphere-HA/FDM-78######-####-####-####-########-##x-####632-VM1/protectedlist
YYYY-MM-DD Er(163) Fdm[31076792]: [Originator@6876 sub=Cluster opID=WorkQueue-26####b9a] open(/vmfs/volumes/63######-########-####-########xec0/.vSphere-HA/FDM-78######-####-####-####-########-##x-####632-VM1/protectedlist) failed: Device or resource busy
YYYY-MM-DD In(166) Fdm[31076792]: [Originator@6876 sub=Invt opID=WorkQueue-26####b9a] Notify datastore (/vmfs/volumes/63######-########-####-########xec0) locally
YYYY-MM-DD Db(167) Fdm[31076792]: [Originator@6876 sub=Cluster opID=WorkQueue-26####b9a] IO error at __localhost__; path: /vmfs/volumes/63######-########-####-########xec0 (err: 16)
YYYY-MM-DD Wa(164) Fdm[31076792]: [Originator@6876 sub=VpxProfiler opID=WorkQueue-26####b9a] WorkQueue [TotalTime] took 4040 ms
YYYY-MM-DD Db(167) Fdm[31076789]: [Originator@6876 sub=Cluster opID=clusterManager.cpp:980-38739276] Updating inventory manager with 6 datastores
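
The affected datastores can be picked out of a long fdm.log by looking for the "IO error ... (err: 16)" entries shown above; error 16 corresponds to "Device or resource busy". The following is a minimal sketch, not a VMware tool: the function name busy_datastores and the regular expression are assumptions derived only from the excerpt above, and the message format may differ between releases.

```python
import re

# Matches FDM entries such as:
#   IO error at __localhost__; path: /vmfs/volumes/<uuid> (err: 16)
# The pattern is inferred from the log excerpt in this article (assumption).
IO_ERROR_RE = re.compile(
    r"IO error at \S+ path: (/vmfs/volumes/[^/\s)]+) \(err: (\d+)\)"
)

EBUSY = 16  # "Device or resource busy", as reported in the excerpt


def busy_datastores(log_lines):
    """Return datastore paths reported with err 16, in first-seen order."""
    seen = []
    for line in log_lines:
        match = IO_ERROR_RE.search(line)
        if match and int(match.group(2)) == EBUSY and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Called on the lines of a copy of the log, for example busy_datastores(open("fdm.log")), it returns each affected /vmfs/volumes path once, which can then be compared with the datastores selected for HA heartbeating.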

[[email protected]:~] esxcfg-scsidevs -m
VmFileSystem: Slow refresh failed: Cannot open volume: /vmfs/volumes/63######-########-####-########4f60
VmFileSystem: Slow refresh failed: Cannot open volume: /vmfs/volumes/66######-########-####-########9670

[[email protected]:/vmfs/volumes] ls -l
ls: ./63######-########-####-########4f60: Read-only file system
ls: ./66######-########-####-########9670: Read-only file system
total 59904
drwxr-xr-x    1 root     root             MM DD HR:MIN 18######-########-####-########96a3
drwxr-xr-x    1 root     root             MM DD HR:MIN 5b######-########-####-########76a9
:
:
lrwxr-xr-x    1 root     root             MM DD HR:MIN Example-Datastore -> 5f######-########-####-########de3e
lrwxr-xr-x    1 root     root             MM DD HR:MIN OPManager-Datastore -> 65######-########-####-########cd50
lrwxr-xr-x    1 root     root             MM DD HR:MIN OSDATA-68######-########-####-########24e0 -> 68######-########-####-########24e0
lrwxr-xr-x    1 root     root             MM DD HR:MIN TEST1-DS -> 63######-########-####-########4f60     =============>>>>>>>>>>  Inaccessible device (broken symbolic link)
lrwxr-xr-x    1 root     root             MM DD HR:MIN TEST2-DS2 -> 66######-########-####-########9670    =============>>>>>>>>>>  Inaccessible device (broken symbolic link)
lrwxr-xr-x    1 root     root             MM DD HR:MIN TEST2-DS3 -> 5f######-########-####-########de3e
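
The same check can be scripted rather than read off the ls -l listing. The sketch below is illustrative only: the function name unreachable_links is hypothetical, and it assumes that a datastore label counts as inaccessible when following its symbolic link raises an error, which covers both a missing target and one that cannot be opened.

```python
import os


def unreachable_links(volumes_dir="/vmfs/volumes"):
    """Return (label, target) pairs for symlinks whose target cannot be stat'ed."""
    broken = []
    for name in sorted(os.listdir(volumes_dir)):
        path = os.path.join(volumes_dir, name)
        if not os.path.islink(path):
            continue  # UUID directories themselves are skipped
        try:
            os.stat(path)  # follows the link; raises OSError if the target is unusable
        except OSError:
            broken.append((name, os.readlink(path)))
    return broken
```

On a host in the state shown above, unreachable_links() would be expected to list labels such as TEST1-DS and TEST2-DS2 together with the volume UUIDs they point to.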

Environment

VMware vCenter Server 8.x
VMware vCenter 9.0.0

Cause

The datastores previously designated for HA heartbeating have become inaccessible. This typically occurs during infrastructure decommissioning or storage maintenance where datastores are unmounted or removed from the environment without being deselected in the HA configuration.

Resolution

1) Disable vSphere HA on the cluster.
2) Unmount the inaccessible datastore(s).
3) Rescan storage at the cluster level.
4) Re-enable vSphere HA on the cluster.