vSAN cluster partition and duplicate witness appliance entries in the vSphere Skyline Health UI, following an abrupt reboot of the data nodes

Article ID: 431167

Products

VMware vSAN

Issue/Introduction

After an abrupt reboot of vSAN data nodes, the vSAN cluster experiences a partition, and the witness appliance appears disconnected or is duplicated in the vSphere Skyline Health UI.


The vmkernel.log on the affected data node displays the following error during boot:


2026-02-24T04:32:11.629Z In(182) vmkernel: cpu46:2097655)RDT:RDT_GetNicAddressInfo:2282: Failed to get any IP address for vmk0
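To check a host for this message without reading the whole log, the line can be matched with grep. The following is a minimal sketch, not a VMware-provided tool. The function name check_rdt_failure is hypothetical, and /var/log/vmkernel.log is assumed to be the location of the log that covers the boot in question.

```shell
#!/bin/sh
# Sketch: look for the RDT address-lookup failure for one VMkernel interface.
# The function name and the log path in the usage example are assumptions.

# check_rdt_failure LOGFILE VMK
# Prints matching lines; returns 0 if the failure is present, 1 otherwise.
check_rdt_failure() {
    logfile="$1"
    vmk="$2"
    # Anchor on the interface name at end of line so that vmk1 does not match vmk10.
    grep "RDT_GetNicAddressInfo.*Failed to get any IP address for ${vmk}[[:space:]]*\$" "$logfile"
}

# Example usage on an ESXi host (vmk0 is this article's example interface):
# check_rdt_failure /var/log/vmkernel.log vmk0 && echo "RDT could not resolve vmk0 at boot"
```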

Additionally, checking active network connections (esxcli network ip connection list or localcli network ip connection list) reveals no active listener on TCP port 2233 for the witness VMK interface IP address.

# localcli network ip connection list | grep 2233 | grep LISTEN
Proto  Recv Q  Send Q  Local Address          Foreign Address       State        World ID  CC Algo  World Name
-----  ------  ------  ---------------------  --------------------  -----------  --------  -------  ----------
tcp         0       0  1##.2#.##.###:2233     0.0.0.0:0             LISTEN        2097655  newreno  ---> vSAN VMK IP address
tcp         0       0  [::1]:2233             [::]:0                LISTEN        2097655  newreno  ---> expected witness VMK IP address (missing)
tcp         0       0  127.0.0.1:2233         0.0.0.0:0             LISTEN        2097655  newreno
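The same check can be scripted. The sketch below reads the connection list on standard input and reports whether a TCP socket in LISTEN state is bound to a given IP address on port 2233. It assumes the column order shown above (Local Address in the fourth column, State in the sixth). The function name has_rdt_listener and the example address 192.0.2.10 are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: check for an RDT listener on <IP>:2233.
# Reads "localcli network ip connection list" (or esxcli) output on stdin.
# Assumes column 4 is Local Address and column 6 is State, as shown above.

# has_rdt_listener IP
# Returns 0 if a tcp LISTEN socket bound to IP:2233 is present, 1 otherwise.
has_rdt_listener() {
    awk -v addr="$1:2233" '
        $1 == "tcp" && $4 == addr && $6 == "LISTEN" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example usage on an ESXi host (192.0.2.10 stands in for the witness VMK IP):
# localcli network ip connection list | has_rdt_listener 192.0.2.10 \
#     || echo "No RDT listener on the witness VMK IP address"
```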

Environment

VMware vSAN 8.0U3

Cause

The vSAN Reliable Datagram Transport (RDT) service attempts to initialize before the VMkernel adapter designated for witness traffic (e.g., vmk0) has acquired its IP address. Consequently, RDT does not create a listener on TCP port 2233 for that interface's IP address. Without this listener, the witness appliance receives heartbeats but cannot establish a complete connection back to the leader node, resulting in a cluster partition.

Resolution

To resolve this issue, retag vSAN witness traffic on the witness VMkernel interface of both the leader and backup data nodes. This forces the vSAN network services to re-evaluate the interface, allowing RDT to bind to TCP port 2233 now that the interface has an active IP address.
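Because retagging relies on the witness interface already holding an IP address, it may help to confirm that before making changes. This is a minimal sketch under assumptions: that esxcli network ip interface ipv4 get -i vmk0 prints a data row starting with the interface name followed by its IPv4 address, and that an unassigned address appears as empty or 0.0.0.0. The function name vmk_has_ipv4 is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that a VMkernel interface currently holds an IPv4 address.
# Reads "esxcli network ip interface ipv4 get -i <vmk>" output on stdin.
# Assumed layout: column 1 = interface name, column 2 = IPv4 address.

# vmk_has_ipv4 VMK
# Returns 0 if VMK has a non-empty address other than 0.0.0.0, 1 otherwise.
vmk_has_ipv4() {
    awk -v name="$1" '
        $1 == name && $2 != "" && $2 != "0.0.0.0" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example usage on an ESXi host:
# esxcli network ip interface ipv4 get -i vmk0 | vmk_has_ipv4 vmk0 \
#     || echo "vmk0 has no IPv4 address yet"
```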

Option 1: Retag via vSphere Client UI

  1. Log in to the vSphere Client and navigate to the affected ESXi data node.

  2. Go to Configure > Networking > VMkernel adapters.

  3. Select the VMkernel adapter designated for witness traffic (e.g., vmk0) and click Edit.

  4. In the port properties, uncheck the vSAN witness service tag and click OK.

  5. Select the same VMkernel adapter and click Edit again.

  6. Re-check the vSAN witness service tag and click OK.

  7. Repeat these steps for the remaining data nodes.

Option 2: Retag via ESXi CLI

  1. SSH into the affected ESXi data node as root.

  2. Remove the witness tag from the appropriate interface (replace vmk0 with the correct identifier for your environment if different):

       esxcli vsan network ipv4 remove -i vmk0

  3. Re-add the witness tag to the interface:

       esxcli vsan network ipv4 add -i vmk0 -T witness

  4. Verify the interface is correctly tagged and active for vSAN:

       esxcli vsan network list

  5. Repeat these steps for the remaining data nodes.
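The CLI steps above can be wrapped in a small shell function and run on one data node at a time. This is a sketch under stated assumptions: the function names retag_witness and is_witness_tagged and the ESXCLI override variable are hypothetical, and the checker assumes that esxcli vsan network list prints a "VmkNic Name:" line followed later by a "Traffic Type:" line for each interface.

```shell
#!/bin/sh
# Sketch: retag witness traffic on one VMkernel interface using the esxcli
# commands from the steps above. Run on one data node at a time.
# ESXCLI is a hypothetical override (e.g. ESXCLI=echo for a dry run).
ESXCLI="${ESXCLI:-esxcli}"

# retag_witness VMK: untag, retag as witness, then print the vSAN network config.
retag_witness() {
    vmk="$1"
    # Remove the interface from the vSAN network configuration (drops the tag).
    $ESXCLI vsan network ipv4 remove -i "$vmk" || return 1
    # Add it back with the witness traffic type.
    $ESXCLI vsan network ipv4 add -i "$vmk" -T witness || return 1
    # List the vSAN network interfaces so the result can be checked.
    $ESXCLI vsan network list
}

# is_witness_tagged VMK: reads "esxcli vsan network list" output on stdin and
# returns 0 if VMK is listed with "Traffic Type: witness" (assumed labels).
is_witness_tagged() {
    awk -v name="$1" '
        $1 == "VmkNic" && $2 == "Name:" { cur = $3 }
        $1 == "Traffic" && $2 == "Type:" && cur == name && $3 == "witness" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example usage (vmk0 is this article's example interface):
# retag_witness vmk0
# esxcli vsan network list | is_witness_tagged vmk0 && echo "vmk0 is tagged for witness traffic"
```

Setting ESXCLI=echo prints the three commands instead of running them, which can serve as a dry run before changing a production host.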