Network partition among vSAN host(s).
The "vSAN cluster partition" alarm appears under Network in Skyline Health.
Impact/Risks:
***Important***
Run this command on all hosts before making any changes to the unicast agent list:
esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates
Once the unicast agent list has been fixed on all hosts, run the command below on all hosts to set IgnoreClusterMemberListUpdates back to its default value of 0:
esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates
IgnoreClusterMemberListUpdates is set to a value of 1 on one or more hosts in the cluster. To check the current value on a host, run:
esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates
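Checking the value by eye on every host is error-prone, so the check above can be wrapped in a small helper. This is a minimal sketch, not part of the original procedure: the function names are hypothetical, and it assumes the command prints a single line of the form "Value of IgnoreClusterMemberListUpdates is N".

```shell
# Sketch only. Assumes `esxcfg-advcfg -g` prints one line shaped like
#   Value of IgnoreClusterMemberListUpdates is 1
# Both function names below are hypothetical helpers.

# Print the last whitespace-separated field of the first line on stdin.
advcfg_value() {
    awk 'NR == 1 { print $NF }'
}

# Report whether member list updates are being ignored on this host.
ignore_flag_status() {
    value=$(advcfg_value)
    if [ "$value" = "1" ]; then
        echo "updates-ignored"
    else
        echo "updates-accepted"
    fi
}

# On an ESXi host this would be used as:
#   esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates | ignore_flag_status
```

A host that reports updates-ignored still has the option set to 1 and needs it reset to 0 once its unicast agent list is correct.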
1) Open an SSH session to every node in the vSAN cluster and run esxcli vsan cluster unicastagent list on each one to identify which hosts have an incomplete unicast agent list. Each host should list every other cluster member, but not itself.
[root@esxi-01:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name Cert Thumbprint SubClusterUuid
------------------------------------ --------- ---------------- ------------- ----- ---------- ----------------------------------------------------------- --------------
602572bd-2ef4-8f69-d8ce-############ 0 true 192.168.10.13 12321
60198995-b367-2922-8fbf-############ 0 true 192.168.10.14 12321
602583eb-233c-b69a-8291-############ 0 true 192.168.10.12 12321
[root@esxi-02:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name Cert Thumbprint SubClusterUuid
------------------------------------ --------- ---------------- ------------- ----- ---------- ----------------------------------------------------------- --------------
60257046-5d95-a750-7135-############ 0 true 192.168.10.11 12321
60198995-b367-2922-8fbf-############ 0 true 192.168.10.14 12321
602583eb-233c-b69a-8291-############ 0 true 192.168.10.12 12321
[root@esxi-03:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name Cert Thumbprint SubClusterUuid
------------------------------------ --------- ---------------- ------------- ----- ---------- ----------------------------------------------------------- --------------
602572bd-2ef4-8f69-d8ce-############ 0 true 192.168.10.13 12321
60257046-5d95-a750-7135-############ 0 true 192.168.10.11 12321
602583eb-233c-b69a-8291-############ 0 true 192.168.10.12 12321
[root@esxi-04:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name Cert Thumbprint SubClusterUuid
------------------------------------ --------- ---------------- ------------- ----- ---------- ----------------------------------------------------------- --------------
602572bd-2ef4-8f69-d8ce-############ 0 true 192.168.10.13 12321
60198995-b367-2922-8fbf-############ 0 true 192.168.10.14 12321
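The comparison in step 1 amounts to counting rows: in an N-host cluster each host should show N-1 entries. The sketch below counts the data rows in the list output and flags a short list. The function names are hypothetical, and it assumes that every data row starts with a UUID-like token (hex digits and dashes, or the # masking used in this article) and that the cluster has no extra nodes such as a witness that would change the expected count.

```shell
# Sketch only; function names are hypothetical.

# Count data rows in `esxcli vsan cluster unicastagent list` output on stdin.
# Header and separator lines are skipped because they do not start with a
# UUID-like token.
count_unicast_entries() {
    awk '$1 ~ /^[0-9a-fA-F#]+(-[0-9a-fA-F#]+)+$/ { n++ } END { print n + 0 }'
}

# $1 = total number of hosts in the cluster; list output on stdin.
check_unicast_list() {
    expected=$(( $1 - 1 ))
    actual=$(count_unicast_entries)
    if [ "$actual" -eq "$expected" ]; then
        echo "complete ($actual/$expected)"
    else
        echo "incomplete ($actual/$expected)"
    fi
}

# On an ESXi host in a 4-node cluster this would be used as:
#   esxcli vsan cluster unicastagent list | check_unicast_list 4
```

With the esxi-04 output shown above, the helper would report two of the three expected entries, matching the missing esxi-01 row.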
2) Once you have identified which hosts have an incomplete or invalid unicast agent list, find the UUID and vSAN IP address of each missing or invalid host:
"In this case esxi-04 is missing 1 host (esxi-01)"
Go to the missing host and get the UUID:
[root@esxi-01:~] cmmds-tool whoami
60257046-5d95-a750-7135-############
Find the vSAN vmk IP address:
"Here vmk3 is used for vSAN"
[root@esxi-01:~] esxcfg-vmknic -l
Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
-------------------output truncated------------------------
vmk2 vmotion IPv6 fe80::250:56ff:fe6e:1418 64 00:50:56:6e:14:18 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk3 vsan IPv4 192.168.10.11 255.255.255.0 192.168.10.255 00:50:56:6e:6b:df 1500 65535 true STATIC defaultTcpipStack
vmk3 vsan IPv6 fe80::250:56ff:fe6e:6bdf 64 00:50:56:6e:6b:df 1500 65535 true STATIC, PREFERRED defaultTcpipStack
3) Add the entry to the unicast agent list:
Syntax: esxcli vsan cluster unicastagent add -t node -u <Host_UUID> -U true -a <Host_VSAN_IP> -p 12321
[root@esxi-04:~] esxcli vsan cluster unicastagent add -t node -u 60257046-5d95-a750-7135-############ -U true -a 192.168.10.11 -p 12321
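Before and after adding an entry, it can help to confirm whether a specific UUID and IP pair is present in a host's list. The sketch below does that against the list output. The function name is hypothetical, and it assumes the column order shown in the outputs above, with the node UUID in the first column and the IP address in the fourth.

```shell
# Sketch only; has_unicast_entry is a hypothetical helper.
# Succeeds (exit 0) if the list output on stdin contains a row whose first
# column equals UUID $1 and whose fourth column equals IP $2.
has_unicast_entry() {
    awk -v uuid="$1" -v ip="$2" '
        $1 == uuid && $4 == ip { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# On an ESXi host this would be used as:
#   esxcli vsan cluster unicastagent list | has_unicast_entry <Host_UUID> <Host_VSAN_IP> \
#       && echo "entry present" || echo "entry missing"
```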
4) Verify that the cluster is complete:
[root@esxi-04:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2021-03-30T14:21:55Z
Local Node UUID: 602583eb-233c-b69a-8291-############
Local Node Type: NORMAL
Local Node State: AGENT
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 60257046-5d95-a750-7135-############
Sub-Cluster Backup UUID: 60198995-b367-2922-8fbf-############
Sub-Cluster UUID: 52cd69c8-e409-363f-bd75-############
Sub-Cluster Membership Entry Revision: 5
Sub-Cluster Member Count: 4
Sub-Cluster Member UUIDs: 60257046-5d95-a750-7135-############, 60198995-b367-2922-8fbf-############, 602572bd-2ef4-8f69-d8ce-############, 602583eb-233c-b69a-8291-############
Sub-Cluster Member HostNames: esxi-01, esxi-03, esxi-02, esxi-04
Sub-Cluster Membership UUID: f5a46060-4df7-160b-4fdc-############
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: 81f5b3c2-fe55-4a00-9eb5-############ 20 2021-03-30T14:21:15.294
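The key field in the output above is Sub-Cluster Member Count, which should equal the number of hosts in the cluster. As a minimal sketch, the check can be scripted as follows. The function names are hypothetical, and the expected host count is something you supply.

```shell
# Sketch only; function names are hypothetical.

# Print the numeric value of "Sub-Cluster Member Count" from
# `esxcli vsan cluster get` output on stdin.
member_count() {
    awk -F': *' '/Sub-Cluster Member Count/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# $1 = expected number of hosts. Succeeds only if the reported count matches.
cluster_is_complete() {
    actual=$(member_count)
    [ "$actual" = "$1" ]
}

# On an ESXi host in a 4-node cluster this would be used as:
#   esxcli vsan cluster get | cluster_is_complete 4 && echo "all members present"
```

Running the same check on every host is worthwhile, since a partition can leave different hosts reporting different member counts.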
Alternative methods for fixing the unicast agent list
# Example: host 9 has lost its entries
# Ignore member list updates on every ESXi host:
esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates
# Get the entries from one or more other hosts:
esxcli vsan cluster unicastagent list
# example output:
------------------------------------ --------- ---------------- ---------- ----- ----------
58c7ebe0-e608-9fd4-0ccc-############ 0 true 10.20.3.6 12321
552555f9-cc64-7d88-2b3d-############ 0 true 10.20.3.7 12321
552558c2-ba81-5960-7a38-############ 0 true 10.20.3.8 12321
55255365-dadf-992f-1f7d-############ 0 true 10.20.3.9 12321
# On host 9, add the missing entries:
esxcli vsan cluster unicastagent add -i vmk2 -t node -U 1 -a 10.20.3.6 -u 58c7ebe0-e608-9fd4-0ccc-############
esxcli vsan cluster unicastagent add -i vmk2 -t node -U 1 -a 10.20.3.7 -u 552555f9-cc64-7d88-2b3d-############
esxcli vsan cluster unicastagent add -i vmk2 -t node -U 1 -a 10.20.3.8 -u 552558c2-ba81-5960-7a38-############
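Typing one add command per row invites copy-and-paste mistakes, so the commands above can be generated from the list output instead. This is a minimal sketch under stated assumptions: the function name is hypothetical, the input uses the column order shown above (UUID, IsWitness, Supports Unicast, IP address), witness rows are skipped because they would need a different node type, and the row for the host's own IP is skipped. It only prints the commands, so they can be reviewed before being run.

```shell
# Sketch only; gen_unicast_add_cmds is a hypothetical helper.
# stdin: `esxcli vsan cluster unicastagent list` output from another host
# $1   : this host's own vSAN IP (its row is skipped)
# $2   : this host's vSAN vmknic name, for example vmk2
gen_unicast_add_cmds() {
    awk -v self="$1" -v vmk="$2" '
        # Keep data rows only: UUID-like first column, not a witness, not self.
        $1 ~ /^[0-9a-fA-F#]+(-[0-9a-fA-F#]+)+$/ && $2 == "0" && $4 != self {
            printf "esxcli vsan cluster unicastagent add -i %s -t node -U 1 -a %s -u %s\n", vmk, $4, $1
        }
    '
}

# Usage, with the other host's list output saved to a file (hypothetical path):
#   gen_unicast_add_cmds 10.20.3.9 vmk2 < /tmp/unicast-list.txt
```

Because a host never lists itself, the host the list was taken from will not appear in the generated commands, and its entry still has to be added separately.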
Run the script below on the host that is missing from another host's unicast agent list. It prints the command to run on the host whose list lacks the entry.
NODE_UUID=$(esxcli vsan cluster get | grep -E "Local Node UUID" | awk '{print $4}')
VSAN_VMK=$(esxcli vsan network list | grep VmkNic | awk '{print $3}')
NODE_IP=$(esxcli network ip interface ipv4 get -i "$VSAN_VMK" | grep vmk | awk '{print $2}')
echo "esxcli vsan cluster unicastagent add -t node -u $NODE_UUID -U true -a $NODE_IP -p 12321 -i $VSAN_VMK"