VMs running on vSAN became hung and unresponsive during a network change on the host.
Namespace heartbeat loss reported on one vSAN node in the cluster.
VMware vSAN
In the vmkernel.log of the vSAN master node, one host can be seen leaving the cluster.
To identify the master node, see the KB article "vSAN How to find the CMMDS and Stats Master Cluster Master ESXi Host name".
ESXi: /var/log/vmkernel.log
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2099092)CMMDS: CMMDSUtil_PrintArenaEntry:98: #######-####-####-####-###########: [456968514]:Adding a new Membership entry (#######-####-####-####-###########) with 7 members:
####-##-##T##:##:##.###Z In(182) vmkernel: cpu3:2099092)CMMDS: CMMDSUtil_PrintArenaEntry:98: #######-####-####-####-###########: [457009585]:Adding a new Membership entry (#######-####-####-####-###########) with 8 members:
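On a busy master node, these membership entries can be far apart in the log. The following is a minimal Python sketch that pulls the member counts out of vmkernel.log text and flags any decrease. It assumes only the "Adding a new Membership entry (...) with N members" wording quoted above; the function names and the sample excerpt are hypothetical and used for illustration only.

```python
import re

# Pattern for the CMMDS membership message quoted above; captures the member count.
MEMBERSHIP_RE = re.compile(r"Adding a new Membership entry \([^)]*\) with (\d+) members")


def membership_counts(log_text):
    """Return every reported member count, in log order."""
    return [int(n) for n in MEMBERSHIP_RE.findall(log_text)]


def membership_drops(counts):
    """Return (before, after) pairs where the member count went down."""
    return [(a, b) for a, b in zip(counts, counts[1:]) if b < a]


# Hypothetical excerpt for illustration only (UUIDs shortened to "uuid").
SAMPLE_VMKERNEL = (
    "CMMDS: CMMDSUtil_PrintArenaEntry:98: uuid: [1]:Adding a new Membership entry (uuid) with 8 members:\n"
    "CMMDS: CMMDSUtil_PrintArenaEntry:98: uuid: [2]:Adding a new Membership entry (uuid) with 7 members:\n"
    "CMMDS: CMMDSUtil_PrintArenaEntry:98: uuid: [3]:Adding a new Membership entry (uuid) with 8 members:\n"
)

if __name__ == "__main__":
    counts = membership_counts(SAMPLE_VMKERNEL)
    print(counts)
    print(membership_drops(counts))
```

A drop in the count marks the point at which a host left the cluster; a later increase marks the point at which it rejoined.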
To confirm which host is no longer part of the cluster, check the vsansystem.log.
In the output below, vSAN-Node01.example.com is not listed as part of the cluster.
ESXi: /var/log/vsansystem.log
####-##-##T##:##:##.###Z In(166) vsansystem[2102038] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-e135] Membership update triggered (status: Success)
####-##-##T##:##:##.###Z In(166) vsansystem[2102018] [vSAN@6876 sub=Libs opId=vsan-PC-6349b72b8477d-T1-W367275-e134] info [ConfigStore:##########] [cs:#:#########]BeginTransaction invoked.
####-##-##T##:##:##.###Z In(166) vsansystem[2102018] [vSAN@6876 sub=Libs opId=vsan-PC-6349b72b8477d-T1-W367275-e134] info [ConfigStore::##########]] [cs:#:#########]Transaction started, level = 1
####-##-##T##:##:##.###Z Wa(164) vsansystem[2102014] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-e135] Unable to retrieve witness host info from CMMDS. NODE entry may not pop up completely if it is a stretched cluster
####-##-##T##:##:##.###Z In(166) vsansystem[2102018] [vSAN@6876 sub=Libs opId=vsan-PC-6349b72b8477d-T1-W367275-e134] info [ConfigStore::##########]] Checking for empty objects and arrays in comp vsan grp system key vsan object
####-##-##T##:##:##.###Z In(166) vsansystem[2102014] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-e135] Complete, nodeCount: 7, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> membershipList = (vim.vsan.host.MembershipInfo) [
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> (vim.vsan.host.MembershipInfo) {
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> nodeUuid = "#######-####-####-####-###########)",
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> hostname = "vSAN-Node05.example.com"
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> },
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> (vim.vsan.host.MembershipInfo) {
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> nodeUuid = "#######-####-####-####-###########)",
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> hostname = "vSAN-Node07.example.com"
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> },
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> (vim.vsan.host.MembershipInfo) {
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> nodeUuid = "#######-####-####-####-###########)",
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> hostname = "vSAN-Node06.example.com"
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> },
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> (vim.vsan.host.MembershipInfo) {
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> nodeUuid = "#######-####-####-####-###########)",
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> hostname = "vSAN-Node03.example.com"
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> },
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> (vim.vsan.host.MembershipInfo) {
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> nodeUuid = "#######-####-####-####-###########)",
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> hostname = "vSAN-Node02.example.com"
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> },
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> (vim.vsan.host.MembershipInfo) {
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> nodeUuid = "#######-####-####-####-###########)",
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> hostname = "vSAN-Node04.example.com"
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> },
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> (vim.vsan.host.MembershipInfo) {
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> nodeUuid = "#######-####-####-####-###########)",
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> hostname = "vSAN-Node08.example.com"
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> }
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> ],
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> diskIssues = <unset>,
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> accessGenNo = <unset>
####-##-##T##:##:##.###Z In(166) vsansystem[2101974] --> }
####-##-##T##:##:##.###Z In(166) vsansystem[2102038] [vSAN@6876 sub=Libs] clomdb-CdbHandleRemoveEntry: Removing #######-####-####-####-########### of type CdbObjectNode from CLOMDB.
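Spotting the absent host by eye is error-prone in larger clusters. As a rough sketch, the membership list can be compared against the hosts expected in the cluster. The expected host list, the function names, and the shortened sample below are assumptions for illustration; only the hostname = "..." field format is taken from the log excerpt above. Pass in a single membership update, since the log normally contains many.

```python
import re

# Matches the hostname field of each MembershipInfo entry in vsansystem.log.
HOSTNAME_RE = re.compile(r'hostname = "([^"]+)"')


def listed_hosts(log_text):
    """Return the set of hostnames present in the membership list text."""
    return set(HOSTNAME_RE.findall(log_text))


def missing_hosts(expected, log_text):
    """Return the expected hosts that the membership list does not mention."""
    return sorted(set(expected) - listed_hosts(log_text))


# Assumption: the cluster is expected to contain eight hosts, Node01 to Node08.
EXPECTED_HOSTS = ["vSAN-Node%02d.example.com" % i for i in range(1, 9)]

# Hypothetical, shortened membership list with the seven hosts seen above.
SAMPLE_MEMBERSHIP = "\n".join(
    '--> hostname = "vSAN-Node%02d.example.com"' % i for i in (5, 7, 6, 3, 2, 4, 8)
)

if __name__ == "__main__":
    print(missing_hosts(EXPECTED_HOSTS, SAMPLE_MEMBERSHIP))
```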
Confirm which VMkernel port is tagged for vSAN traffic on vSAN-Node01.example.com:
esxcli vsan network list
Interface:
   VmkNic Name: vmk2
   IP Protocol: IP
   Interface UUID: #######-####-####-####-###########
   Agent Group Multicast Address: ###.#.#.
   Agent Group IPv6 Multicast Address: ####::#:#:#
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: ###.#.#.#
   Master Group IPv6 Multicast Address: ####::#:#:#
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Data-in-Transit Encryption Key Exchange Port: 0
   Multicast TTL: 5
   Traffic Type: vsan
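When the same check has to be repeated across several hosts, the command output can be parsed rather than read manually. The sketch below assumes the output contains the "VmkNic Name" and "Traffic Type" fields shown above for every interface; the function name and the two-interface sample are hypothetical.

```python
import re

# Pairs each "VmkNic Name" with the next "Traffic Type" field that follows it.
# Assumption: every interface block in the output carries both fields.
NIC_RE = re.compile(r"VmkNic Name:\s*(\S+).*?Traffic Type:\s*(\S+)", re.S)


def vsan_vmknics(esxcli_output):
    """Return the vmknic names whose traffic type is 'vsan'."""
    return [name for name, ttype in NIC_RE.findall(esxcli_output) if ttype == "vsan"]


# Hypothetical output with two interfaces, trimmed to the relevant fields.
SAMPLE_NETWORK_LIST = (
    "Interface:\n"
    "   VmkNic Name: vmk2\n"
    "   IP Protocol: IP\n"
    "   Traffic Type: vsan\n"
    "Interface:\n"
    "   VmkNic Name: vmk3\n"
    "   IP Protocol: IP\n"
    "   Traffic Type: witness\n"
)

if __name__ == "__main__":
    print(vsan_vmknics(SAMPLE_NETWORK_LIST))
```

An empty result means no VMkernel port is tagged for vSAN traffic on that host, which matches the condition described in this article.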
During the network changes, the VMkernel port was inadvertently removed, as recorded in vobd.log:
ESXi: /var/log/vobd.log
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [vob.vsan.net.no.connectivity] vSAN is no longer using vmknic vmk2. There are no vSAN vmknics remaining.
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [esx.audit.vsan.net.vnic.deleted] vSAN vnic deleted
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [esx.problem.vsan.no.network.connectivity] vSAN doesn't have any network configuration. This can severly impact several objects in the vSAN datastore.
####-##-##T##:##:##.###Z In(14) vobd[2097813] An event (esx.problem.vsan.no.network.connectivity) could not be sent immediately to hostd; queueing for retry.
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [vob.vmfs.heartbeat.timedout] #######-####-####-####-########### #######-####-####-####-###########
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [esx.problem.vmfs.heartbeat.timedout] #######-####-####-####-########### #######-####-####-####-###########
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [vob.vmfs.heartbeat.timedout] #######-####-####-####-########### #######-####-####-####-###########
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [esx.problem.vmfs.heartbeat.timedout] #######-####-####-####-########### #######-####-####-####-###########
...
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [vob.vmfs.heartbeat.timedout] #######-####-####-####-########### #######-####-####-####-###########
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [esx.problem.vmfs.heartbeat.timedout] #######-####-####-####-########### #######-####-####-####-###########
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [vob.vmfs.heartbeat.timedout] #######-####-####-####-########### #######-####-####-####-###########
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [esx.problem.vmfs.heartbeat.timedout] #######-####-####-####-########### #######-####-####-####-###########
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [vob.vsan.net.created] vSAN is now using vmknic vmk4.
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [esx.clear.vsan.network.available] vSAN now has usable network configuration. Earlier reported connectivity problems, if any, can now be ignored as it is resolved.
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [esx.audit.vsan.net.vnic.added] vSAN vnic added
####-##-##T##:##:##.###Z In(14) vobd[2097813] Successfully sent event (esx.problem.vsan.no.network.connectivity) after 1 failure.
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [vob.vsan.net.created] vSAN is now using vmknic vmk2.
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [esx.audit.vsan.net.vnic.added] vSAN vnic added
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [vob.vsan.net.redundancy.reduced] vSAN is no longer using vmknic vmk4.
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [esx.problem.vsan.net.redundancy.lost] vSAN network configuration doesn't have any redundancy. This may be a problem if further network configuration is removed.
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vSANCorrelator] #############us: [esx.audit.vsan.net.vnic.deleted] vSAN vnic deleted
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume #######-####-####-####-########### (#######-####-####-####-###########): [Timeout] [HB state abcdef02 offset 3522560 gen 547 stampUS 5695735223737 uuid 67d93556-d75697c2-cd62-4c526217405c jrnl <FB 1298576> drv 14.81]
####-##-##T##:##:##.###Z In(14) vobd[2097813] [vmfsCorrelator] #############us: [esx.problem.vmfs.heartbeat.recovered] #######-####-####-####-########### #######-####-####-####-###########
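The vSAN network events are interleaved with VMFS heartbeat messages, which makes the sequence hard to follow. A minimal sketch for extracting only the vSAN events, in order, might look like the following. It relies solely on the bracketed event identifiers and the "vSAN is no longer using vmknic" wording visible in the excerpt above; the function names and the sample lines are hypothetical.

```python
import re

# Bracketed vSAN event identifiers as they appear in vobd.log, for example
# [vob.vsan.net.no.connectivity] or [esx.audit.vsan.net.vnic.deleted].
EVENT_RE = re.compile(r"\[(vob\.vsan\.net\.[\w.]+|esx\.(?:audit|problem|clear)\.vsan\.[\w.]+)\]")

# vmknic names reported as no longer used by vSAN.
REMOVED_RE = re.compile(r"vSAN is no longer using vmknic (vmk\d+)")


def vsan_network_events(log_text):
    """Return the vSAN event identifiers in the order they were logged."""
    return EVENT_RE.findall(log_text)


def removed_vmknics(log_text):
    """Return the vmknics that vSAN stopped using, in log order."""
    return REMOVED_RE.findall(log_text)


def lost_all_vsan_networking(log_text):
    """True if the log records that no vSAN vmknic remained at some point."""
    return "vob.vsan.net.no.connectivity" in vsan_network_events(log_text)


# Hypothetical, shortened excerpt for illustration only.
SAMPLE_VOBD = (
    "[vSANCorrelator] 1us: [vob.vsan.net.no.connectivity] vSAN is no longer using vmknic vmk2. "
    "There are no vSAN vmknics remaining.\n"
    "[vSANCorrelator] 2us: [esx.audit.vsan.net.vnic.deleted] vSAN vnic deleted\n"
    "[vmfsCorrelator] 3us: [vob.vmfs.heartbeat.timedout] uuid uuid\n"
    "[vSANCorrelator] 4us: [vob.vsan.net.created] vSAN is now using vmknic vmk2.\n"
)

if __name__ == "__main__":
    print(vsan_network_events(SAMPLE_VOBD))
    print(removed_vmknics(SAMPLE_VOBD))
    print(lost_all_vsan_networking(SAMPLE_VOBD))
```

Comparing the timestamps of these events with the time the VMs became unresponsive helps confirm that the vmknic removal preceded the hang.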
During the network changes, the VMkernel port tagged for vSAN traffic was inadvertently removed. With no vSAN vmknic remaining, the host lost vSAN network connectivity and the VMs on the host became hung.