Powering on a VM fails with the following error:
Module MonitorLoop power on failed. Failed to start the virtual machine. Failed to power on VM. Could not power on virtual machine: Failure. Failed to create swap file /vmfs/volumes/vsan:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:.vswp: Failure
VMware vSAN 7.X
VMware vSAN 8.X
VMware vSAN 9.X
The /var/log/vmkernel.log file on the affected host contains numerous 'RPC to DOM op create returned' error messages, for example:
vmkwarning: cpu17: ##########:)WARNING: VSAN: VsanIoctlCtrlNodeCommon:3138: #######-####-####-####-#########: RPC to DOM op create returned: Socket is not connected.
These errors indicate an underlying network issue from this host's perspective: the host cannot communicate with the other nodes in the cluster.
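To gauge how often the host logs this error, the messages can be tallied from a copy of vmkernel.log. The following is a minimal sketch, not a supported tool: the function name `count_rpc_errors` is hypothetical, and the pattern is taken only from the sample message above.

```python
import re
import sys
from collections import Counter

# Pattern based on the sample message in this article; the text after
# "returned:" is treated as the failure reason (an assumption).
RPC_ERROR = re.compile(r"RPC to DOM op create returned: (?P<reason>.+?)\.?\s*$")


def count_rpc_errors(lines):
    """Return a Counter mapping each failure reason to its number of occurrences."""
    reasons = Counter()
    for line in lines:
        match = RPC_ERROR.search(line)
        if match:
            reasons[match.group("reason")] += 1
    return reasons


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_rpc_errors.py /path/to/copy/of/vmkernel.log
    with open(sys.argv[1], errors="replace") as log_file:
        for reason, count in count_rpc_errors(log_file).most_common():
            print(f"{count:6d}  {reason}")
```

A large count concentrated on one host, with few or no matches on the other hosts, is consistent with a problem local to that host's network path.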
Investigate the network on the host where VMs fail to power on or cannot be created. Check the following logs on that host.
/var/log/vmkernel.log
CMMDS: LeaderRemoveNodeFromMembership:####: ########-####-####-####-#######: Removing node #######-####-####-####-####### (vsanNodeType: data) from the cluster membership
CMMDS: CMMDSUtil_PrintArenaEntry:98: ########-####-####-####-#######: [1341898]: Adding a new Membership entry (######-#####-####-####-######) with 8 members:
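Repeated membership changes are easier to spot when extracted into a short list. The sketch below assumes the CMMDS message wording shown above. The function name `membership_events` and the event labels are hypothetical.

```python
import re

# Both patterns are derived from the sample CMMDS messages in this article.
REMOVED = re.compile(
    r"Removing node (?P<node>\S+) \(vsanNodeType: (?P<type>\w+)\) "
    r"from the cluster membership"
)
NEW_ENTRY = re.compile(
    r"Adding a new Membership entry \([^)]+\) with (?P<count>\d+) members"
)


def membership_events(lines):
    """Return a list of ("removed", node_uuid) and ("members", count) tuples in log order."""
    events = []
    for line in lines:
        removed = REMOVED.search(line)
        if removed:
            events.append(("removed", removed.group("node")))
        new_entry = NEW_ENTRY.search(line)
        if new_entry:
            events.append(("members", int(new_entry.group("count"))))
    return events
```

A member count that repeatedly drops and recovers suggests nodes are leaving and rejoining the cluster, which fits the network instability described here.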
In vobd.log, check for heartbeat timeout events. They indicate that heartbeats between the nodes are timing out because of network slowness.
[esx.problem.vmfs.heartbeat.recovered]:####: ######-####-#####-####-###### :####: #######-####-####-####-########
[vob.vmfs.heartbeat.timedout]:####: #######-####-####-####-#######: ####: ######-####-####-####-########
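A quick tally of these events shows how frequent the timeouts are. This sketch counts only the two event IDs that appear in the sample above. The function name `count_heartbeat_events` is hypothetical, and vobd.log may contain other related event IDs that are not covered here.

```python
from collections import Counter

# Event IDs copied from the sample vobd.log entries in this article.
EVENT_IDS = (
    "vob.vmfs.heartbeat.timedout",
    "esx.problem.vmfs.heartbeat.recovered",
)


def count_heartbeat_events(lines):
    """Return a Counter with the number of occurrences of each known event ID."""
    counts = Counter({event_id: 0 for event_id in EVENT_IDS})
    for line in lines:
        for event_id in EVENT_IDS:
            counts[event_id] += line.count(event_id)
    return counts
```

For example, `count_heartbeat_events(open("vobd.log", errors="replace"))` would return the totals for a local copy of the log.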
To check for "Wait for RDT" delays, which indicate network slowness because the vSAN protocol is waiting for updates from other nodes, look for messages like the following in the vsantracesUrgent log.
2025-03-10T00:16:45.439638 [3612038] [cpu22] [c826378b OWNER readWithBlkAttr5 VMDISK] DOMTraceOpTookTooLong:10304: {'op': 0x45bad8d63c80, 'objUuid': '######-####-####-####-########', 'offset-39': 138322706432, 'length-25': 65536, 'totalTimeMS': 10014, 'timeInThisPhaseMS': 10014, 'opPhase': 'Wait for RDT'
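Trace entries like the one above can be filtered for operations that stalled in the 'Wait for RDT' phase. The sketch below assumes the trace has already been decoded to text in the format shown. The function name `slow_rdt_ops` and the `min_total_ms` threshold parameter are hypothetical.

```python
import re

# Field patterns based on the sample DOMTraceOpTookTooLong entry in this article.
TOTAL_MS = re.compile(r"'totalTimeMS': (\d+)")
OP_PHASE = re.compile(r"'opPhase': '([^']+)'")


def slow_rdt_ops(lines, min_total_ms=0):
    """Return (timestamp, totalTimeMS) tuples for ops whose phase is 'Wait for RDT'."""
    results = []
    for line in lines:
        if "DOMTraceOpTookTooLong" not in line:
            continue
        phase = OP_PHASE.search(line)
        total = TOTAL_MS.search(line)
        if phase and total and phase.group(1) == "Wait for RDT":
            total_ms = int(total.group(1))
            if total_ms >= min_total_ms:
                # The first whitespace-separated token is the timestamp.
                results.append((line.split()[0], total_ms))
    return results
```

Applied to the sample entry, this yields the timestamp 2025-03-10T00:16:45.439638 with a total time of 10014 ms.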