vSAN Health Check fails with message "Invalid Unicast List" and Rebooting vCenter Results in Incorrect Unicast Agent List Causing a Cluster Partition

Article ID: 326460

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:
The alert "Invalid unicast list" is triggered in the vSAN health check even though the cluster is formed.
If vCenter is rebooted, it alters the unicast agent list of the data nodes.

Cause

This is caused by vCenter holding on to one or more old witness IP addresses. When vCenter is rebooted, it pushes the old IP address to all of the nodes in the cluster, resulting in a partition.
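
To see which witness address a data node currently holds, the unicast agent list can be inspected from the ESXi shell. The following is a minimal sketch, not an official procedure. The helper name witness_ips is hypothetical, and the sketch assumes the output of "esxcli vsan cluster unicastagent list" has the columns NodeUuid, IsWitness, Supports Unicast, IP Address in that order, with IsWitness shown as 0 or 1. Verify the column layout on your ESXi build before relying on it.

```shell
#!/bin/sh
# Print the IP address of every entry flagged as a witness in the output of
# `esxcli vsan cluster unicastagent list`, read from stdin.
# Assumed column order: NodeUuid, IsWitness, Supports Unicast, IP Address, ...
witness_ips() {
    awk '$2 == "1" { print $4 }'
}

# Only call esxcli where it exists, that is, on an ESXi host.
if command -v esxcli >/dev/null 2>&1; then
    esxcli vsan cluster unicastagent list | witness_ips
fi
```

If the address printed on a data node differs from the witness's current vSAN/witness traffic IP, the node is holding a stale entry.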

Resolution

This can be fixed by reconfiguring the stretched cluster. Note that objects will be in reduced availability during the process.
  1. Disable IgnoreClusterMemberListUpdates on all data nodes
    • The command to do that is "esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates"
  2. Put the witness into maintenance mode
  3. Delete its disk groups
  4. Go to fault domains and disable the stretched cluster
  5. SSH to the witness and make sure it is not part of a cluster
    • Use "esxcli vsan cluster get" to check if it is part of the cluster
    • Use "esxcli vsan cluster leave" to leave the cluster if it is still part of one
  6. Go to fault domains and re-configure the stretched cluster
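
The membership check in step 5 can be sketched as a small script run on the witness. This is an illustration under assumptions: the helper name is_cluster_member is hypothetical, and the script assumes that a host still in a cluster prints a line containing "Enabled: true" in the output of "esxcli vsan cluster get". It only reports the state and leaves the decision to run "esxcli vsan cluster leave" to the operator.

```shell
#!/bin/sh
# Succeeds (exit status 0) when stdin, the output of `esxcli vsan cluster get`,
# reports an active cluster membership.
# Assumption: a member host prints a line containing "Enabled: true".
is_cluster_member() {
    grep -q 'Enabled: true'
}

# Only call esxcli where it exists, that is, on an ESXi host.
if command -v esxcli >/dev/null 2>&1; then
    if esxcli vsan cluster get 2>&1 | is_cluster_member; then
        echo "Witness is still in a vSAN cluster; run: esxcli vsan cluster leave"
    else
        echo "Witness is not part of a vSAN cluster"
    fi
fi
```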


Additional Information

Impact/Risks:
If the witness previously used a different IP for vSAN/witness traffic, rebooting vCenter causes it to push the old witness IP to the unicast agent list of the data nodes.

If the resolution cannot be implemented, enable IgnoreClusterMemberListUpdates on the data nodes after the cluster is formed to prevent vCenter from overwriting the unicast list:
  • esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates
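
Before and after changing the setting, its current value can be read back on each data node. The sketch below is a hedged example: the helper name advcfg_value is hypothetical, and it assumes the get option of the ESXi advanced-configuration tool (esxcfg-advcfg -g) prints a line of the form "Value of IgnoreClusterMemberListUpdates is 1", so that the last field is the value.

```shell
#!/bin/sh
# Extract the value from `esxcfg-advcfg -g` output read from stdin.
# Assumption: the output looks like
#   "Value of IgnoreClusterMemberListUpdates is 1"
advcfg_value() {
    awk '{ print $NF }'
}

# Only call the tool where it exists, that is, on an ESXi host.
if command -v esxcfg-advcfg >/dev/null 2>&1; then
    value=$(esxcfg-advcfg -g /VSAN/IgnoreClusterMemberListUpdates | advcfg_value)
    echo "IgnoreClusterMemberListUpdates=$value"
fi
```

A value of 1 means the node ignores member list updates from vCenter; 0 means vCenter can update the unicast agent list again.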