Network partition caused by an invalid/incomplete unicast agent list on vSAN host(s)

Article ID: 317830

Products

VMware vSAN

Issue/Introduction

One or more hosts in the vSAN cluster are network partitioned from the other hosts.

Skyline Health reports a vSAN cluster partition network alarm.

Impact/Risks:
***Important***

Before making any changes to the unicast agent list, run this command on all hosts:
esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates

Once the unicast agent list has been fixed on all hosts, run the following command on all hosts to set IgnoreClusterMemberListUpdates back to its default value of 0:
esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates
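
Because the setting has to be changed on every host, it can help to generate the per-host commands first and review them. The following is a minimal sketch that only prints the SSH commands and executes nothing on the hosts. The function name print_ignore_cmds, the host names, and the use of root SSH access are assumptions for illustration.

```shell
# Print (do not execute) the command that sets IgnoreClusterMemberListUpdates
# on each host. Usage: print_ignore_cmds <0|1> <host> [<host> ...]
print_ignore_cmds() {
    value="$1"
    shift
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    for host in "$@"; do
        # Assumption: SSH is enabled on the host and root login is permitted.
        echo "ssh root@$host 'esxcfg-advcfg -s $value /VSAN/IgnoreClusterMemberListUpdates'"
    done
}

# Example with the host names used in this article:
print_ignore_cmds 1 esxi-01 esxi-02 esxi-03 esxi-04
```

Calling the function again with 0 as the first argument prints the commands that restore the default once the lists have been repaired.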

 



Environment

VMware vSAN 6.x
VMware vSAN 7.x
VMware vSAN 8.x

Cause

The unicast agent list on one or more hosts is invalid or incomplete, so those hosts cannot communicate with the other vSAN hosts in the cluster.

In the following output from a four-host cluster, Sub-Cluster Member Count is 3, which shows that one host is missing from the cluster:
[root@esxi-04:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2021-03-30T13:40:44Z
   Local Node UUID: 602583eb-233c-b69a-8291-############
   Local Node Type: NORMAL
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 602583eb-233c-b69a-8291-############
   Sub-Cluster Backup UUID: 602572bd-2ef4-8f69-d8ce-############
   Sub-Cluster UUID: 52cd69c8-e409-363f-bd75-############
   Sub-Cluster Membership Entry Revision: 4
   Sub-Cluster Member Count: 3
   Sub-Cluster Member UUIDs: 602572bd-2ef4-8f69-d8ce-############, 602583eb-233c-b69a-8291-############, 60198995-b367-2922-8fbf-############
   Sub-Cluster Member HostNames: esxi-02, esxi-04, esxi-03
   Sub-Cluster Membership UUID: f4266360-e165-0b0b-############
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: 81f5b3c2-fe55-4a00-9eb5-############ 20 2021-03-30T13:26:12.0
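
The member count check can be sketched as a small shell filter. This is an illustrative helper, not part of any VMware tooling: the function names member_count and check_member_count are hypothetical, and the expected cluster size has to be supplied by the operator.

```shell
# Read `esxcli vsan cluster get` output on stdin and print the value of
# the "Sub-Cluster Member Count" line.
member_count() {
    awk '/Sub-Cluster Member Count:/ {print $NF; exit}'
}

# Compare the reported count with the number of hosts expected in the cluster.
# Usage on a host: esxcli vsan cluster get | check_member_count 4
check_member_count() {
    expected="$1"
    actual="$(member_count)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $actual of $expected hosts are members"
    else
        echo "PARTITIONED: $actual of $expected hosts are members"
        return 1
    fi
}
```

A non-zero exit status signals that the local host sees fewer (or more) members than expected.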

The unicast agent list on each host must contain every node in the vSAN cluster except the host itself. In this four-node cluster, each list must therefore have three entries. The list on esxi-04 below has only two, which confirms that one host is missing:

[root@esxi-04:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address      Port  Iface Name  Cert Thumbprint                                              SubClusterUuid
------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
602572bd-2ef4-8f69-d8ce-############          0              true  192.168.10.13  12321              
60198995-b367-2922-8fbf-############          0              true  192.168.10.14  12321              
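
The "cluster size minus one" rule can be checked with a short sketch like the one below. It assumes a standard cluster without a witness host, and it assumes that every data row starts with a node UUID; the function names are hypothetical.

```shell
# Read `esxcli vsan cluster unicastagent list` output on stdin and count
# the data rows. Header and separator lines do not start with a hex digit,
# so they are skipped. The '#' in the pattern covers the masked UUIDs used
# in this article.
count_unicast_entries() {
    awk '/^[0-9a-fA-F]+-[0-9a-fA-F#-]+[ \t]/ {n++} END {print n + 0}'
}

# A host should list every other node, so expect (cluster size - 1) rows.
# Usage on a host: esxcli vsan cluster unicastagent list | check_unicast_entries 4
check_unicast_entries() {
    cluster_size="$1"
    expected=$((cluster_size - 1))
    actual="$(count_unicast_entries)"
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual entries"
    else
        echo "MISMATCH: $actual entries, expected $expected"
        return 1
    fi
}
```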

One possible cause is that IgnoreClusterMemberListUpdates is set to 1 on one or more hosts in the cluster.

A value of 1 tells the host to ignore any unicast agent list updates coming from vCenter.
A value of 0, the default, tells the host to accept those updates.

To check the current setting, run the following command:
esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates
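
When checking several hosts, the result can be turned into a readable message with a sketch such as the following. It assumes that the command prints a line of the form "Value of <option> is <number>"; verify this against the output on your build. The function name describe_ignore_setting is hypothetical.

```shell
# Read the output of
#   esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates
# on stdin and explain the value.
# Assumption: the output looks like "Value of <option> is <number>".
describe_ignore_setting() {
    value="$(awk '/^Value of / {print $NF; exit}')"
    case "$value" in
        0) echo "0: host accepts unicast agent list updates from vCenter (default)" ;;
        1) echo "1: host ignores unicast agent list updates from vCenter" ;;
        *) echo "unrecognized output" >&2; return 1 ;;
    esac
}
```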

Resolution

1) Open an SSH session to each node in the vSAN cluster and run esxcli vsan cluster unicastagent list to identify which hosts have an incomplete unicast agent list. In the output below, esxi-01, esxi-02 and esxi-03 each list three entries, while esxi-04 lists only two.

[root@esxi-01:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address      Port  Iface Name  Cert Thumbprint                                              SubClusterUuid
------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
602572bd-2ef4-8f69-d8ce-############          0              true  192.168.10.13  12321              
60198995-b367-2922-8fbf-############          0              true  192.168.10.14  12321              
602583eb-233c-b69a-8291-############          0              true  192.168.10.12  12321              


[root@esxi-02:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address      Port  Iface Name  Cert Thumbprint                                              SubClusterUuid
------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
60257046-5d95-a750-7135-############          0              true  192.168.10.11  12321              
60198995-b367-2922-8fbf-############          0              true  192.168.10.14  12321              
602583eb-233c-b69a-8291-############          0              true  192.168.10.12  12321              

[root@esxi-03:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address      Port  Iface Name  Cert Thumbprint                                              SubClusterUuid
------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
602572bd-2ef4-8f69-d8ce-############          0              true  192.168.10.13  12321              
60257046-5d95-a750-7135-############          0              true  192.168.10.11  12321              
602583eb-233c-b69a-8291-############          0              true  192.168.10.12  12321              

[root@esxi-04:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address      Port  Iface Name  Cert Thumbprint                                              SubClusterUuid
------------------------------------  ---------  ----------------  -------------  -----  ----------  -----------------------------------------------------------  --------------
602572bd-2ef4-8f69-d8ce-############          0              true  192.168.10.13  12321              
60198995-b367-2922-8fbf-############          0              true  192.168.10.14  12321              
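
Comparing the lists by eye becomes error-prone on larger clusters. The sketch below reports which node UUIDs are absent from one host's list, given the full set of cluster node UUIDs collected by the operator. The function name missing_nodes and its argument order are hypothetical.

```shell
# Read one host's `esxcli vsan cluster unicastagent list` output on stdin
# and print every expected node UUID that is not listed.
# Usage: ... | missing_nodes <local_uuid> <uuid> [<uuid> ...]
# The local host's own UUID is skipped because a host never lists itself.
missing_nodes() {
    local_uuid="$1"
    shift
    # First column of each data row is the node UUID.
    listed="$(awk '/^[0-9a-fA-F]+-/ {print $1}')"
    for uuid in "$@"; do
        if [ "$uuid" = "$local_uuid" ]; then
            continue
        fi
        # -x -F: match the whole line as a fixed string, not as a pattern.
        if ! printf '%s\n' "$listed" | grep -q -x -F -e "$uuid"; then
            echo "$uuid"
        fi
    done
    return 0
}
```

Run against the esxi-04 list above with all four node UUIDs as arguments, this would print only the UUID of the host that esxi-04 has lost.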


2) Once you have identified which hosts have an incomplete or invalid unicast agent list, find the UUID and vSAN IP address of each missing or invalid host.
In this example, the list on esxi-04 is missing one host (esxi-01).

Go to the missing host and get the UUID:
[root@esxi-01:~] cmmds-tool whoami
60257046-5d95-a750-7135-############


Find the IP address of the vSAN VMkernel interface (vmk3 in this example):
[root@esxi-01:~] esxcfg-vmknic -l
Interface  Port Group/DVPort/Opaque Network        IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type                NetStack 
-------------------output truncated------------------------
vmk2       vmotion                                 IPv6      fe80::250:56ff:fe6e:1418                64                              00:50:56:6e:14:18 1500    65535     true    STATIC, PREFERRED   defaultTcpipStack
vmk3       vsan                                    IPv4      192.168.10.11                           255.255.255.0   192.168.10.255  00:50:56:6e:6b:df 1500    65535     true    STATIC              defaultTcpipStack
vmk3       vsan                                    IPv6      fe80::250:56ff:fe6e:6bdf                64                              00:50:56:6e:6b:df 1500    65535     true    STATIC, PREFERRED   defaultTcpipStack
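
Picking the IPv4 address out of the wide esxcfg-vmknic table can be sketched with awk. The helper name vmk_ipv4 is hypothetical; it looks for the "IPv4" marker in the row rather than a fixed column, so a port group name that contains spaces does not shift the result.

```shell
# Read `esxcfg-vmknic -l` output on stdin and print the IPv4 address of
# the given VMkernel interface.
# Usage on a host: esxcfg-vmknic -l | vmk_ipv4 vmk3
vmk_ipv4() {
    awk -v vmk="$1" '$1 == vmk {
        # The address is the field that follows the "IPv4" family marker.
        for (i = 2; i < NF; i++)
            if ($i == "IPv4") { print $(i + 1); exit }
    }'
}
```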



3) Add the entry to the unicast agent list:

Syntax: esxcli vsan cluster unicastagent add -t node -u <Host_UUID> -U true -a <Host_VSAN_IP> -p 12321

[root@esxi-04:~] esxcli vsan cluster unicastagent add -t node -u 60257046-5d95-a750-7135-############ -U true -a 192.168.10.11 -p 12321
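
A mistyped IP address in this command produces another invalid entry, so it can be worth validating the inputs before running it. The following sketch only builds and prints the command from the syntax line above; the function names is_ipv4 and build_unicast_add are hypothetical, and only IPv4 addresses are handled.

```shell
# Return success only for a dotted-quad IPv4 address with octets 0..255.
is_ipv4() {
    printf '%s\n' "$1" | awk -F. '
        NF != 4 { bad = 1 }
        { for (i = 1; i <= NF; i++)
              if ($i !~ /^[0-9]+$/ || $i + 0 > 255) bad = 1 }
        END { exit (bad ? 1 : 0) }'
}

# Print the add command for one node; nothing is executed.
# Usage: build_unicast_add <node_uuid> <vsan_ip>
build_unicast_add() {
    uuid="$1"
    ip="$2"
    if [ -z "$uuid" ]; then
        echo "missing node UUID" >&2
        return 1
    fi
    if ! is_ipv4 "$ip"; then
        echo "invalid IPv4 address: $ip" >&2
        return 1
    fi
    echo "esxcli vsan cluster unicastagent add -t node -u $uuid -U true -a $ip -p 12321"
}
```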

4) Verify that the cluster is complete. Sub-Cluster Member Count should now be 4 and all four host names should be listed:

[root@esxi-04:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2021-03-30T14:21:55Z
   Local Node UUID: 602583eb-233c-b69a-8291-############
   Local Node Type: NORMAL
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 60257046-5d95-a750-7135-############
   Sub-Cluster Backup UUID: 60198995-b367-2922-8fbf-############
   Sub-Cluster UUID: 52cd69c8-e409-363f-bd75-############
   Sub-Cluster Membership Entry Revision: 5
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 60257046-5d95-a750-7135-############, 60198995-b367-2922-8fbf-############, 602572bd-2ef4-8f69-d8ce-############, 602583eb-233c-b69a-8291-############
   Sub-Cluster Member HostNames: esxi-01, esxi-03, esxi-02, esxi-04
   Sub-Cluster Membership UUID: f5a46060-4df7-160b-4fdc-############
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: 81f5b3c2-fe55-4a00-9eb5-############ 20 2021-03-30T14:21:15.294
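
To confirm that a specific host has rejoined, the "Sub-Cluster Member HostNames" line can be tested directly. This is a minimal sketch; the function name has_member is hypothetical.

```shell
# Read `esxcli vsan cluster get` output on stdin and succeed only if the
# given host name appears in "Sub-Cluster Member HostNames".
# Usage on a host: esxcli vsan cluster get | has_member esxi-01
has_member() {
    awk -v host="$1" '
        /Sub-Cluster Member HostNames:/ {
            # Drop the label, then split the comma-separated host names.
            sub(/^.*HostNames:[ \t]*/, "")
            n = split($0, names, /,[ \t]*/)
            for (i = 1; i <= n; i++) {
                gsub(/[ \t\r]+$/, "", names[i])
                if (names[i] == host) found = 1
            }
        }
        END { exit (found ? 0 : 1) }'
}
```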



Workaround:
Alternative methods for fixing the unicast agent list
# Example: host 9 has lost its unicast agent entries.
# Ignore member list updates on every ESXi host:
esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates
# Get the entries from one or more other hosts:
esxcli vsan cluster unicastagent list
# Example output:
NodeUuid                              IsWitness  Supports Unicast  IP Address   Port  Iface Name
------------------------------------  ---------  ----------------  ----------  -----  ----------
58c7ebe0-e608-9fd4-0ccc-############          0              true  10.20.3.6   12321
552555f9-cc64-7d88-2b3d-############          0              true  10.20.3.7   12321
552558c2-ba81-5960-7a38-############          0              true  10.20.3.8   12321
55255365-dadf-992f-1f7d-############          0              true  10.20.3.9   12321
# On host 9, add the missing entries (every node except host 9 itself):
esxcli vsan cluster unicastagent add -i vmk2 -t node -U 1 -a 10.20.3.6 -u 58c7ebe0-e608-9fd4-0ccc-############
esxcli vsan cluster unicastagent add -i vmk2 -t node -U 1 -a 10.20.3.7 -u 552555f9-cc64-7d88-2b3d-############
esxcli vsan cluster unicastagent add -i vmk2 -t node -U 1 -a 10.20.3.8 -u 552558c2-ba81-5960-7a38-############
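
Typing one add command per node does not scale well, so the commands can be generated from the list taken on a healthy host. The sketch below is illustrative only: the function name gen_add_cmds is hypothetical, it assumes the column order shown in the example output, and it skips witness rows because those are not added with -t node. Review the printed commands, and drop any entry the target host already has, before running them.

```shell
# Read `esxcli vsan cluster unicastagent list` output from a healthy host
# on stdin and print one add command per data node, skipping the target
# host's own vSAN IP address.
# Usage: ... | gen_add_cmds <target_host_vsan_ip> <target_host_vsan_vmk>
gen_add_cmds() {
    awk -v self="$1" -v vmk="$2" '
        # Columns: $1 = NodeUuid, $2 = IsWitness, $4 = IP Address
        /^[0-9a-fA-F]+-/ && $2 == 0 && $4 != self {
            printf "esxcli vsan cluster unicastagent add -i %s -t node -U 1 -a %s -u %s\n", vmk, $4, $1
        }'
}
```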

Run the following script on the host that is missing from the other hosts' unicast agent lists. It prints the command to run on each host that lacks the entry:

NODE_UUID=$(esxcli vsan cluster get | grep -E "Local Node UUID" | awk '{print $4}')
VSAN_VMK=$(esxcli vsan network list | grep VmkNic | awk '{print $3}')
NODE_IP=$(esxcli network ip interface ipv4 get -i $VSAN_VMK | grep vmk | awk '{print $2}')
echo "esxcli vsan cluster unicastagent add -t node -u $NODE_UUID -U true -a $NODE_IP -p 12321 -i $VSAN_VMK"