Virtual machine becomes inaccessible after the vSAN network outage

Article ID: 402397

Updated On:

Products

VMware vSAN

Issue/Introduction

 Symptoms

  • Virtual machines may become inaccessible after recovery from an unplanned network outage, such as an inter-site link outage.
  • vSAN objects become inaccessible.

    Run the following command to list the inaccessible objects:

    # esxcli vsan debug object list --health=inaccessible

    Object UUID: a26fa467-e005-####-####-############
    Version: 15
    Health: inaccessible - Object is initializing or creating.(APD)
    Owner: esxi01
    Used: 21846.21 GB
    hostFailuresToTolerate: 1
    subFailuresToTolerate: 1
    CSN: 3054
    SCSN: 3181
    spbmProfileName: vsanRaid5-Stretched(Cluster01)
    locality: None
    Path: N/A
    Group UUID: a775a367-844d-####-####-############
    Directory Name: N/A
  • In the vCenter UI, Cluster > Monitor > vSAN > Resyncing Objects may show that resync is not progressing.
  • The ESXi host may fail to enter maintenance mode with either the Ensure Accessibility or the Full data evacuation option because of the inaccessible object.
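
The command output shown in the symptoms can be saved to a file and summarized offline. The following is a minimal Python sketch, assuming the output keeps the "Key: value" layout of the sample and that each record begins with an "Object UUID:" line. The function name parse_inaccessible_objects is hypothetical and not part of any VMware tooling.

```python
def parse_inaccessible_objects(text):
    """Split `esxcli vsan debug object list` output into one dict per object.

    Assumption: every record starts with an "Object UUID:" line and the
    remaining fields are "Key: value" pairs, as in the sample output.
    """
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines and anything that is not a field
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Object UUID":
            current = {}
            objects.append(current)
        if current is not None:
            # Keep the first value seen for a key so that any later
            # repeated key within the same record does not overwrite it.
            current.setdefault(key, value)
    return objects


# Shortened copy of the sample output from this article.
SAMPLE = """\
Object UUID: a26fa467-e005-####-####-############
Version: 15
Health: inaccessible - Object is initializing or creating.(APD)
Owner: esxi01
Used: 21846.21 GB
"""

for obj in parse_inaccessible_objects(SAMPLE):
    print(obj["Object UUID"], obj["Owner"], obj["Health"], sep=" | ")
```

Grouping the records by Owner or by spbmProfileName is a quick way to see whether the inaccessible objects cluster on one host or one storage policy.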

Environment

VMware vSAN (All Versions)

Cause

 

  • This is a rare scenario in which the affected object enters an APD (All Paths Down) state because the available resync scheduler slots are exhausted, leaving resynchronization tasks stuck and unable to progress.

  • The environment shows frequent small and large ping test failures, indicating an unstable network or inter-switch link (ISL). This instability can trigger repeated object rebuilds, increasing the risk of resync contention.

During this issue, entries like the following appear in the /var/log/vsansystem log on the ESXi host. The nodeCount value dropping from 13 to 9 and then returning to 13 indicates network instability as hosts leave and rejoin the cluster.

vsansystem[2102272] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-9426] Complete, nodeCount: 13, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
.
.
info vsansystem[2102264] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSNodeUpdate-973e] Complete, nodeCount: 9, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
.
.
info vsansystem[2102132] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSNodeUpdate-a1c8] Complete, nodeCount: 13, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
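
To spot these membership swings in a longer log, the nodeCount values can be extracted and compared in sequence. This is a minimal sketch, assuming the log lines carry a "nodeCount: N" field as in the excerpt above. The helper name node_count_changes is hypothetical.

```python
import re

# Matches the "nodeCount: N" field seen in the vsansystem log excerpt.
NODE_COUNT = re.compile(r"nodeCount: (\d+)")


def node_count_changes(lines):
    """Return (previous, current) pairs wherever nodeCount changes."""
    changes = []
    previous = None
    for line in lines:
        match = NODE_COUNT.search(line)
        if not match:
            continue
        count = int(match.group(1))
        if previous is not None and count != previous:
            changes.append((previous, count))
        previous = count
    return changes


# Lines shortened from the excerpt in this article.
LOG = [
    "vsansystem[2102272] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-9426] Complete, nodeCount: 13,",
    "info vsansystem[2102264] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSNodeUpdate-973e] Complete, nodeCount: 9,",
    "info vsansystem[2102132] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSNodeUpdate-a1c8] Complete, nodeCount: 13,",
]

print(node_count_changes(LOG))  # prints [(13, 9), (9, 13)]
```

Many such changes in a short time window suggest the hosts are repeatedly leaving and rejoining the cluster rather than experiencing a single outage.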

In the vmkernel log, the cluster membership drops to a single member at the time of the incident, which again indicates network isolation.


vmkernel: cpu77:2099751)CMMDS: LeaderBuildHeartbeatMessage:2430: XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: [40131563]:Current membership uuid  XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX has 1 members
vmkernel: cpu77:2099751)CMMDS: CMMDSUtil_PrintArenaEntry:98: XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: [40131569]:Adding a new Membership entry (XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) with 1 members:
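
The membership size can be pulled from these CMMDS messages in the same way. This is a minimal sketch, assuming the messages end in "has N members" or "with N members" as in the two lines above. The helper name membership_sizes is hypothetical, and treating a size of 1 as isolation follows the interpretation given in this article.

```python
import re

# Matches the member count in the two CMMDS message forms shown above.
MEMBERS = re.compile(r"CMMDS: .*?(?:has|with) (\d+) members")


def membership_sizes(lines):
    """Return the member count reported by each matching CMMDS line."""
    sizes = []
    for line in lines:
        match = MEMBERS.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


# Lines copied from the excerpt in this article.
LOG = [
    "vmkernel: cpu77:2099751)CMMDS: LeaderBuildHeartbeatMessage:2430: XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: [40131563]:Current membership uuid  XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX has 1 members",
    "vmkernel: cpu77:2099751)CMMDS: CMMDSUtil_PrintArenaEntry:98: XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: [40131569]:Adding a new Membership entry (XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) with 1 members:",
]

sizes = membership_sizes(LOG)
print(sizes)  # prints [1, 1]
# A host that sees only itself is treated as isolated from the cluster.
print(any(size == 1 for size in sizes))  # prints True
```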

 

Resolution

 

  • This issue has been addressed in vSAN 8.0 P05 and later releases.

  • Also review the network configuration and stability to minimize rebuild triggers and resync slot exhaustion.

 

Additional Information