vSAN Cluster Partition After Manual ESXi Service Restarts Following Certificate Renewal

Article ID: 434152

Products

VMware vSAN

Issue/Introduction

In vSAN 8.0.x environments, a cluster partition may occur after renewing or changing an ESXi host certificate. This issue is triggered when administrators attempt to apply the new certificate by manually restarting individual host services (e.g., hostd, vpxa) rather than performing a full host reboot.

Symptoms include:

  • The ESXi host is shown as "Isolated" or "Partitioned" in vSAN Skyline Health.

  • Loss of network connectivity between specific nodes in the vSAN cluster.

  • SSL trust errors in the vSAN management layer.

Environment

Product: VMware vSAN

Version: 8.0.x

Component: Certificate Management / Cluster Membership

Cause

Manually restarting individual services like hostd or vpxa after a certificate change is insufficient for vSAN. This action causes the vSAN service thumbprint to become inconsistent or invalid across the cluster nodes. Because the new thumbprint is not correctly propagated and synchronized to the vSAN kernel modules on all hosts, the security handshake fails, preventing the host from joining the vSAN cluster.

Log Evidence (shell.log): The following entries show that services were restarted manually instead of the host being rebooted:

<REDACTED_TIMESTAMP> In(14) shell[11005002]: [root]: /sbin/auto-backup.sh
<REDACTED_TIMESTAMP> In(14) shell[11005002]: [root]: /etc/init.d/hostd restart
<REDACTED_TIMESTAMP> In(14) shell[11005002]: [root]: /etc/init.d/vpxa restart
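When a copy of shell.log is large, a short script can help locate entries like the ones above. The following is a minimal sketch, assuming the log lines follow the format shown in this article; the function name find_manual_restarts and the regular expression are illustrative and not part of any VMware tooling.

```python
import re

# Matches shell.log entries where an init script was invoked with "restart",
# e.g. "... shell[11005002]: [root]: /etc/init.d/hostd restart".
# Assumption: the line layout matches the excerpt shown in this article.
RESTART_RE = re.compile(
    r"\[(?P<user>[^\]]+)\]:\s+/etc/init\.d/(?P<service>\S+)\s+restart\b"
)


def find_manual_restarts(lines):
    """Return a (user, service) tuple for each manual service restart found."""
    hits = []
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            hits.append((match.group("user"), match.group("service")))
    return hits


# Sample input copied from the log excerpt above.
sample = [
    "<REDACTED_TIMESTAMP> In(14) shell[11005002]: [root]: /sbin/auto-backup.sh",
    "<REDACTED_TIMESTAMP> In(14) shell[11005002]: [root]: /etc/init.d/hostd restart",
    "<REDACTED_TIMESTAMP> In(14) shell[11005002]: [root]: /etc/init.d/vpxa restart",
]

for user, service in find_manual_restarts(sample):
    print(f"{service} restarted manually by {user}")
```

To scan a real file, pass an open file object instead of the sample list. The sketch only reports restarts; whether they followed a certificate change still has to be confirmed from the surrounding entries and timestamps.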

SSL Exception Signature: Evidence of the thumbprint mismatch can be found in the vSAN management logs:

--> reason = "SSL Exception: Verification parameters:
--> PeerThumbprint: <REDACTED_SECRETS>:F9
--> ExpectedThumbprint: <REDACTED_SECRETS>:C3
--> ExpectedPeerName: localhost.localdomain
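The mismatch in this signature can also be checked programmatically. The sketch below assumes the exception text uses the "Name: value" layout shown above; the thumbprint values in the sample are hypothetical, shortened placeholders, and the helper names are illustrative.

```python
import re

# Extracts the three verification parameters from the exception text.
# Assumption: each parameter appears as "Name: value" on its own line.
FIELD_RE = re.compile(
    r"(PeerThumbprint|ExpectedThumbprint|ExpectedPeerName):\s*(\S+)"
)


def parse_ssl_exception(text):
    """Collect the verification parameters from an SSL exception excerpt."""
    return {key: value for key, value in FIELD_RE.findall(text)}


def thumbprints_match(fields):
    """Compare thumbprints case-insensitively; return None if one is missing."""
    peer = fields.get("PeerThumbprint")
    expected = fields.get("ExpectedThumbprint")
    if peer is None or expected is None:
        return None
    return peer.upper() == expected.upper()


# Hypothetical, shortened thumbprints for illustration only.
sample = """\
--> reason = "SSL Exception: Verification parameters:
--> PeerThumbprint: AA:BB:CC:F9
--> ExpectedThumbprint: AA:BB:CC:C3
--> ExpectedPeerName: localhost.localdomain
"""

fields = parse_ssl_exception(sample)
if thumbprints_match(fields) is False:
    print("Thumbprint mismatch for peer", fields.get("ExpectedPeerName"))
```

A result of False corresponds to the condition described in this article: the thumbprint presented by the peer differs from the one the cluster expects.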

 

 

Resolution

To resolve the partition and ensure the new certificate is properly recognized by the vSAN cluster:

  1. Place the affected ESXi host into Maintenance Mode using the "No data migration" data evacuation option, because this node is already partitioned.

  2. Perform a full Restart (Reboot) of the ESXi host.

  3. Once the host is back online, verify that it has successfully rejoined the vSAN cluster and that the "vSAN Cluster Partition" alarm has cleared.

  4. Exit Maintenance Mode.
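The verification in step 3 can be supported by checking the member count the host reports after the reboot. The sketch below is an assumption-laden illustration: it expects the "Key: Value" text that `esxcli vsan cluster get` typically prints, and the field name "Sub-Cluster Member Count" as well as the sample output should be confirmed against the output of your own build. The function names and the expected member count are hypothetical.

```python
def parse_cluster_info(output):
    """Parse 'Key: Value' lines into a dict, skipping lines without a colon."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def has_rejoined(info, expected_members):
    """Return True when the host reports the expected number of members."""
    # Assumed field name; verify it against real output from the host.
    count = info.get("Sub-Cluster Member Count", "")
    return count.isdigit() and int(count) == expected_members


# Illustrative output only, not captured from a real host.
sample_output = """\
Cluster Information
   Enabled: true
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Member Count: 4
   Maintenance Mode State: ON
"""

info = parse_cluster_info(sample_output)
# 4 is a hypothetical cluster size; use the real number of hosts.
print("Rejoined:", has_rejoined(info, expected_members=4))
```

A host that is still partitioned would be expected to report fewer members than the cluster actually contains, so this check complements, rather than replaces, confirming that the alarm has cleared.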

Note: Always perform a full host reboot after certificate renewal in vSAN environments to ensure the thumbprint is correctly updated in the cluster registry.

Additional Information

https://knowledge.broadcom.com/external/article?articleNumber=317244