daemon process vitd) ran out of available threadpool resources.
mmm dd hh:mm:ss vm_name kernel: [1761594.974483] connection5:0: detected conn error (1020)
mmm dd hh:mm:ss vm_name kernel: [1761594.974482] connection6:0: detected conn error (1020)
This is a known CMMDS issue in which a leader host that is rebooting or shutting down continues to transmit leader heartbeats even though it can no longer receive traffic.
Because the other cluster nodes still receive these outgoing heartbeats, they keep following the rebooting leader instead of failing over to the backup. This prevents a clean leadership transition and causes a full cluster partition that lasts until the leader’s networking stack has fully stopped.
Log Validation:
The issue can be confirmed through the following log observations:
(/var/run/log/vmkernel.log)
2025-08-02T01:42:55.501Z In(182) vmkernel: cpu73:2098937)CMMDS: LeaderBuildHeartbeatMessage:2120: 52cfc3d8-####-####-ebd1-#########: [318070950]:Current membership uuid 33c41967-####-####-23a9-######### has 14 members
2025-08-02T01:42:55.501Z In(182) vmkernel: cpu73:2098937)CMMDS: LeaderBuildHeartbeatMessage:2131: 52cfc3d8-####-####-ebd1-#########: [318070950]:Member[0]:6718c2b8-####-####-fbf3-#########(leader)
2025-08-02T01:42:55.501Z In(182) vmkernel: cpu73:2098937)CMMDS: LeaderBuildHeartbeatMessage:2126: 52cfc3d8-####-####-ebd1-#########: [318070950]:Member[1]:6719af12-####-####-ff6c-#########(backup)
(/var/run/log/vobd.log)
2025-08-02T02:26:35.617Z In(14) vobd[2098025]: [UserLevelCorrelator] 24423759940313us: [esx.audit.maintenancemode.entered] The host has entered maintenance mode.
(/var/run/log/vmksummary.log)
2025-08-02T02:26:36.568Z No(13) bootstop[245573871]: Host is rebooting
(/var/run/log/vmkernel.log)
2025-08-02T02:26:54.480Z In(182) vmkernel: cpu86:2099008)CMMDSNet: CMMDSNet_SetLeader:1315: 52cfc3d8-####-####-ebd1-#########: Updating leader node: old=6718c2b8-####-####-fbf3-######### new=none
2025-08-02T02:26:54.480Z In(182) vmkernel: cpu86:2099008)CMMDSNet: CMMDSNet_SetLeader:1315: 52cfc3d8-####-####-ebd1-#########: Updating leader node: old=none new=6719af12-####-####-ff6c-#########
2025-08-02T02:26:54.480Z In(182) vmkernel: cpu86:2099008)CMMDS: CMMDSLogStateTransition:1824: 52cfc3d8-####-####-ebd1-#########: Transitioning(6719af12-####-####-ff6c-#########) from Backup to Leader: (Reason: Backup is taking over the cluster leader)
(/var/run/log/vmkernel.log)
2025-08-02T02:26:59.486Z In(182) vmkernel: cpu50:2099008)CMMDS: CMMDSLogStateTransition:1824: 52cfc3d8-####-####-ebd1-#########: Transitioning(6719b151-####-####-1fc1-#########) from Agent to Discovery: (Reason: Failed to receive from node)
2025-08-02T02:27:00.064Z In(182) vmkernel: cpu57:2099008)CMMDS: CMMDSLogStateTransition:1824: 52cfc3d8-####-####-ebd1-#########: Transitioning(6719b151-####-####-1fc1-#########) from Discovery to Rejoin: (Reason: Found a leader node)
2025-08-02T02:27:00.329Z In(182) vmkernel: cpu57:2099008)CMMDS: CMMDSLogStateTransition:1824: 52cfc3d8-####-####-ebd1-#########: Transitioning(6719b151-####-####-1fc1-#########) from Rejoin to Discovery: (Reason: Failed to receive from node)
2025-08-02T02:27:01.814Z In(182) vmkernel: cpu57:2099008)CMMDS: CMMDSLogStateTransition:1824: 52cfc3d8-####-####-ebd1-#########: Transitioning(6719b151-####-####-1fc1-#########) from Discovery to Rejoin: (Reason: Found a leader node)
2025-08-02T02:27:03.610Z In(182) vmkernel: cpu57:2099008)CMMDS: CMMDSLogStateTransition:1824: 52cfc3d8-####-####-ebd1-#########: Transitioning(6719b151-####-####-1fc1-#########) from Rejoin to Agent: (Reason: The local node has finished rejoining)
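To pick these state transitions out of a large vmkernel.log, a short script can help. The following is a minimal sketch, assuming the log entries follow the layout shown in the excerpt above; the function name `find_transitions` and the regular expression are illustrative and not part of any VMware tooling.

```python
import re

# Pattern modeled on the CMMDSLogStateTransition entries shown above.
# It is an assumption that other builds log the same layout.
TRANSITION_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+\S+\s+vmkernel:\s+"
    r"cpu\d+:\d+\)CMMDS:\s+CMMDSLogStateTransition:\d+:\s+\S+:\s+"
    r"Transitioning\((?P<node>[^)]+)\)\s+from\s+(?P<old>\w+)\s+to\s+(?P<new>\w+):\s+"
    r"\(Reason:\s+(?P<reason>[^)]*)\)"
)


def find_transitions(text):
    """Return one dict per state transition found in the log text."""
    return [m.groupdict() for m in TRANSITION_RE.finditer(text)]


# Two entries copied from the excerpt above (UUIDs masked as in the source).
SAMPLE = (
    "2025-08-02T02:26:59.486Z In(182) vmkernel: cpu50:2099008)CMMDS: "
    "CMMDSLogStateTransition:1824: 52cfc3d8-####-####-ebd1-#########: "
    "Transitioning(6719b151-####-####-1fc1-#########) from Agent to Discovery: "
    "(Reason: Failed to receive from node)\n"
    "2025-08-02T02:27:03.610Z In(182) vmkernel: cpu57:2099008)CMMDS: "
    "CMMDSLogStateTransition:1824: 52cfc3d8-####-####-ebd1-#########: "
    "Transitioning(6719b151-####-####-1fc1-#########) from Rejoin to Agent: "
    "(Reason: The local node has finished rejoining)\n"
)

if __name__ == "__main__":
    for t in find_transitions(SAMPLE):
        print(t["ts"], t["node"], t["old"], "->", t["new"], "|", t["reason"])
```

A node that moves repeatedly between Discovery and Rejoin within a few seconds, as in the excerpt, is consistent with the partition described earlier.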
(/var/run/log/vsansystem.log)
2025-08-02T02:26:54.480Z In(166) vsansystem[2532172]: [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-beb4] Complete, nodeCount: 14, runtime info:(vim.vsan.host.VsanRuntimeInfo) {
2025-08-02T02:26:59.480Z In(166) vsansystem[2532157]: [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-bf0d] Complete, nodeCount: 13, runtime info:(vim.vsan.host.VsanRuntimeInfo) {
2025-08-02T02:26:59.489Z In(166) vsansystem[2532157]: [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-bf10] Complete, nodeCount: 10, runtime info:(vim.vsan.host.VsanRuntimeInfo) {
2025-08-02T02:26:59.495Z In(166) vsansystem[2532157]: [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-bf0d] Complete, nodeCount: 9, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
2025-08-02T02:26:59.504Z In(166) vsansystem[2532170]: [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-bf19] Complete, nodeCount: 9, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
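The falling nodeCount values in vsansystem.log are the clearest sign of the partition. The sketch below extracts them and reports each decrease. It assumes the entry layout shown above; `node_counts` and `membership_drops` are hypothetical helper names introduced only for this illustration.

```python
import re

# Pattern modeled on the CMMDSMembershipUpdate entries shown above (assumed layout).
NODECOUNT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+\S+\s+vsansystem\[\d+\]:\s+"
    r"\[[^\]]*opId=CMMDSMembershipUpdate-\w+\]\s+Complete,\s+nodeCount:\s*(?P<count>\d+)"
)


def node_counts(text):
    """Return (timestamp, nodeCount) pairs in the order they appear."""
    return [(m.group("ts"), int(m.group("count"))) for m in NODECOUNT_RE.finditer(text)]


def membership_drops(counts):
    """Return (timestamp, previous, current) for every decrease in nodeCount."""
    drops = []
    for (_, prev), (ts, cur) in zip(counts, counts[1:]):
        if cur < prev:
            drops.append((ts, prev, cur))
    return drops


# Entries abbreviated from the excerpt above; text after nodeCount is omitted.
SAMPLE_VSANSYSTEM = "\n".join(
    f"{ts} In(166) vsansystem[{pid}]: [vSAN@6876 sub=VsanSystemProvider "
    f"opId=CMMDSMembershipUpdate-{op}] Complete, nodeCount: {n}, runtime info:"
    for ts, pid, op, n in [
        ("2025-08-02T02:26:54.480Z", 2532172, "beb4", 14),
        ("2025-08-02T02:26:59.480Z", 2532157, "bf0d", 13),
        ("2025-08-02T02:26:59.489Z", 2532157, "bf10", 10),
        ("2025-08-02T02:26:59.495Z", 2532157, "bf0d", 9),
        ("2025-08-02T02:26:59.504Z", 2532170, "bf19", 9),
    ]
)

if __name__ == "__main__":
    for ts, prev, cur in membership_drops(node_counts(SAMPLE_VSANSYSTEM)):
        print(f"{ts}: membership dropped from {prev} to {cur}")
```

In the excerpt, the count falls from 14 to 9 within roughly five seconds of the backup taking over.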
(/var/run/log/vmkernel.log)
2025-08-02T02:26:59.486Z In(182) vmkernel: cpu59:2099008)CMMDS: CMMDSStateMachineReceiveLoop:1640: 52cfc3d8-####-####-ebd1-#########: Error receiving from 6718c2b8-####-####-fbf3-#########: Failure
2025-08-02T02:26:59.486Z In(182) vmkernel: cpu59:2099008)CMMDS: CMMDSStateDestroyNode:708: 52cfc3d8-####-####-ebd1-#########: Destroying node 6718c2b8-####-####-fbf3-#########: Failed to receive from node
2025-08-02T02:26:59.486Z In(182) vmkernel: cpu59:2099008)CMMDS: AgentDestroyNode:1660: 52cfc3d8-####-####-ebd1-#########: Lost leader node (6718c2b8-####-####-fbf3-#########), can't handle that and will transition to discovery
2025-08-02T02:26:59.486Z In(182) vmkernel: cpu59:2099008)CMMDSNet: CMMDSNet_SetLeader:1315: 52cfc3d8-####-####-ebd1-1c05 #########: Updating leader node: old=6718c2b8-####-####-fbf3-######### new=none
2025-08-02T02:26:59.486Z In(182) vmkernel: cpu50:2099008)CMMDS: CMMDSLogStateTransition:1824: 52cfc3d8-####-####-ebd1-#########: Transitioning(6719b151-####-####-1fc1-#########) from Agent to Discovery: (Reason: Failed to receive from node)
2025-08-02T02:26:59.486Z Wa(180) vmkwarning: cpu50:2099008)WARNING: RDT: RDTEndQueuedMessages:1390: assoc 0x43224e33e6c0 message 92888778 failure
2025-08-02T02:26:59.486Z Wa(180) vmkwarning: cpu50:2099008)WARNING: RDT: RDTEndQueuedMessages:1390: assoc 0x43224e33e6c0 message 92888779 failure
2025-08-02T02:26:59.486Z Wa(180) vmkwarning: cpu50:2099008)WARNING: RDT: RDTEndQueuedMessages:1390: assoc 0x43224e33e6c0 message 92888780 failure
2025-08-02T02:26:59.486Z Wa(180) vmkwarning: cpu50:2099008)WARNING: RDT: RDTEndQueuedMessages:1390: assoc 0x43224e33e6c0 message 92888781 failure
2025-08-02T02:27:00.329Z Wa(180) vmkwarning: cpu57:2099008)WARNING: RDT: RDTEndQueuedMessages:1390: assoc 0x43224eb755c0 message 1 failure
2025-08-02T02:27:00.329Z Wa(180) vmkwarning: cpu57:2099008)WARNING: RDT: RDTEndQueuedMessages:1390: assoc 0x43224eb755c0 message 2 failure
2025-08-02T02:27:00.329Z Wa(180) vmkwarning: cpu57:2099008)WARNING: RDT: RDTEndQueuedMessages:1390: assoc 0x43224eb755c0 message 3 failure
2025-08-02T02:27:00.329Z Wa(180) vmkwarning: cpu57:2099008)WARNING: RDT: RDTEndQueuedMessages:1390: assoc 0x43224eb755c0 message 2 failure
(/var/run/log/vmkernel.log)
2025-08-02T02:26:59.488Z In(182) vmkernel: cpu50:2099008)CMMDS: CMMDSClusterDestroyNodeImpl:262: Destroying node 6719af12-####-####-ff6c-######### from the cluster db. Last HB received from node - 24349239336243137
(/var/run/log/vmkernel.log)
2025-08-02T02:26:59.481Z In(182) vmkernel: cpu13:2099082)DOM: DOMOwner_SetLivenessState:10887: Object 16043e67-####-####-1a51-######### lost liveness [0x45bb80a3f840]
# vsish -e set /vmkModules/cmmds/forceTransition abdicateLeader
Remove the vSAN traffic tag from the vmknic prior to host reboot.
To untag vSAN traffic, run the following command:
# esxcli network ip interface tag remove -i vmkx -t vSAN
Re-apply the vSAN tag after the host has successfully rebooted.
To re-tag vSAN traffic after the upgrade and reboot, run the following command:
# esxcli network ip interface tag add -i vmkx -t vSAN
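When the untag and re-tag steps are scripted across several hosts, it can help to build the command line in one place so that the interface name is validated before anything runs. The following is a minimal sketch, not a supported tool: `vsan_tag_command` is a hypothetical helper, `vmk1` is only an example interface name, and the function just returns the argument list for the esxcli commands shown above without executing them.

```python
import re


def vsan_tag_command(action, vmknic):
    """Build the esxcli argument list that adds or removes the vSAN tag.

    Only the two actions used in this article are accepted.
    """
    if action not in ("add", "remove"):
        raise ValueError(f"unsupported action: {action!r}")
    # Assumption: VMkernel interfaces are named vmk<number>, e.g. vmk1.
    if not re.fullmatch(r"vmk\d+", vmknic):
        raise ValueError(f"not a VMkernel interface name: {vmknic!r}")
    return ["esxcli", "network", "ip", "interface", "tag", action,
            "-i", vmknic, "-t", "vSAN"]


if __name__ == "__main__":
    # Before the reboot: remove the tag. After the reboot: add it back.
    print(" ".join(vsan_tag_command("remove", "vmk1")))
    print(" ".join(vsan_tag_command("add", "vmk1")))
```

Returning an argument list rather than a single string keeps the helper usable with `subprocess.run` without shell quoting, should the caller choose to execute it on the host.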