vSAN Stretched Cluster objects show Reduced Availability after vCenter Certificate Renewal

Article ID: 440757

Products

VMware vSAN

Issue/Introduction

Following a vCenter certificate renewal or a global state refresh, you observe the following symptoms in your vSAN stretched cluster:

  • vSAN objects are in the "Reduced availability with no rebuild" state.
  • vSAN Skyline Health reports "Witness host already in cluster" or "Witness host not found" errors.
  • The vSphere Client UI blocks any modifications or remediation attempts to the stretched cluster configuration.
  • ESXi host logs (/var/run/log/vmkernel.log) show rapid unicast agent changes:
    • CMMDSNet_RemoveUnicastHost ... Remove unicast witness node:####.####.####.2
    • CMMDSNetAddUnicastHostInt ... Add unicast witness node: ####.####.####.3
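
To confirm the flapping pattern described above, the witness add and remove events can be tallied per address. The following is a minimal sketch, not a supported tool: it matches only the two message fragments quoted in the log excerpts ("Remove unicast witness node:" and "Add unicast witness node:"), and everything else about the log line layout is an assumption. The function names are hypothetical.

```python
import re
from collections import Counter

# Only these two message fragments come from the log excerpts above;
# the rest of each log line is not assumed to have any particular layout.
WITNESS_EVENT = re.compile(r"(Add|Remove) unicast witness node:\s*(\S+)")


def summarize_witness_flapping(lines):
    """Count Add/Remove witness events per address found in log lines."""
    counts = Counter()
    for line in lines:
        match = WITNESS_EVENT.search(line)
        if match:
            action, address = match.groups()
            counts[(address, action)] += 1
    return counts


def witness_addresses(counts):
    """Return the sorted list of witness addresses seen in any event."""
    return sorted({address for address, _ in counts})
```

If `witness_addresses` returns more than one address for a cluster that should have a single witness, that is consistent with the symptom described in this article. The lines can be read from a copy of /var/run/log/vmkernel.log, for example with `open(path).readlines()`.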

Environment

vCenter 8.x
ESXi 8.x
vSAN 8.x 
vSAN Stretched Cluster

Cause

This issue occurs when two separate vSAN witness appliances share the same internal host UUID. This typically happens if a witness appliance was cloned or deployed from a virtual machine template instead of being deployed fresh from the OVF.

During a vCenter certificate renewal, a host reconnection and cluster state rediscovery are forced. If two distinct appliances share the same UUID, vCenter may incorrectly merge both IP addresses into the active witness role metadata. This creates a logic loop in the unicast agents on the ESXi data hosts as they attempt to route to the same UUID via conflicting IP addresses (e.g., ####.####.####.2 and ####.####.####.3). This conflict breaks communication between the data nodes and the witness.
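
The conflict described above comes down to one UUID being claimed by more than one address. As an illustration only, the sketch below groups a hand-collected inventory of (address, UUID) pairs and reports any UUID that appears under multiple addresses. The inventory format and the function name are assumptions, and how the UUIDs are gathered is left to the operator.

```python
from collections import defaultdict


def find_duplicate_uuids(inventory):
    """Return {uuid: sorted addresses} for UUIDs claimed by several addresses.

    `inventory` is an iterable of (address, uuid) pairs collected by hand
    (hypothetical format). UUIDs are compared case-insensitively.
    """
    by_uuid = defaultdict(set)
    for address, uuid in inventory:
        by_uuid[uuid.strip().lower()].add(address)
    return {
        uuid: sorted(addresses)
        for uuid, addresses in by_uuid.items()
        if len(addresses) > 1
    }
```

An empty result means no two witness appliances in the inventory share a UUID. Any entry in the result identifies appliances that should be reviewed as possible clones.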

Resolution

To resolve this issue and restore cluster stability, you must override the corrupted configuration using the command line.

Step 1: Remove Fault Domains

In the vSphere Client, temporarily remove the ESXi data hosts from their designated stretched cluster fault domains to isolate them from the conflicting configuration.

Step 2: Purge Corrupted Witness Data

  1. Log into the Ruby vSphere Console (RVC).
  2. Navigate to the affected cluster path:
    cd /localhost/####-DC-####-####/computers/####-cl-####-####
  3. Execute the witness removal command. The trailing . refers to the current RVC path, that is, the cluster selected in step 2:

WARNING: This command cannot be undone. Removing the witness component breaks stretched cluster site quorum and will impact data availability if the primary and secondary sites are not fully synchronized. Verify every parameter before running the command. If you are uncertain about the potential impact, consult a technical lead.

    vsan.stretchedcluster.remove_witness .

  4. Ensure the command returns Task result: success.

Step 3: Rebuild the Stretched Cluster

Reconfigure the stretched cluster and fault domains through the vSphere Client UI, explicitly pointing to the correct, original witness IP address (e.g., ####.####.####.2).

Step 4: Preventative Measures

  • Replace the Duplicate Witness: Place the appliance with the duplicate UUID into maintenance mode (using Ensure Data Availability), power it off, and deploy a completely fresh witness appliance using the official OVF template.
  • Network Isolation: Ensure different vSAN clusters and their respective witness communication paths are isolated on separate subnets or VLANs to prevent identity broadcasting conflicts.
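
The subnet part of the isolation recommendation can be sanity-checked before deployment. The sketch below is a rough illustration using Python's standard ipaddress module. It takes a hypothetical mapping of cluster names to witness network CIDRs and lists the pairs that overlap. It does not examine VLAN configuration, and the cluster names and networks are placeholders.

```python
import ipaddress
from itertools import combinations


def overlapping_witness_networks(networks):
    """Return pairs of cluster names whose witness networks overlap.

    `networks` maps a cluster name to a CIDR string (hypothetical input).
    """
    parsed = {
        name: ipaddress.ip_network(cidr) for name, cidr in networks.items()
    }
    return [
        (first, second)
        for first, second in combinations(sorted(parsed), 2)
        if parsed[first].overlaps(parsed[second])
    ]
```

An empty list indicates that the listed witness networks are disjoint at the IP level. Any returned pair points to clusters whose witness traffic shares address space.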

Additional Information