Virtual machine becomes inaccessible after the vSAN network outage

Article ID: 402397

Products

VMware vSAN

Issue/Introduction

Symptoms

Virtual machines may become inaccessible after the cluster recovers from an unplanned network outage, such as an inter-site link outage.

The affected vSAN objects are reported as inaccessible.

Run the following command to list the inaccessible objects:

# esxcli vsan debug object list --health=inaccessible

Object UUID: a26fa467-e005-####-####-############
Version: 15
Health: inaccessible - Object is initializing or creating.(APD)
Owner: esxi01
Used: 21846.21 GB
hostFailuresToTolerate: 1
subFailuresToTolerate: 1
CSN: 3054
SCSN: 3181
spbmProfileName: vsanRaid5-Stretched(Cluster01)
locality: None
Path: N/A
Group UUID: a775a367-844d-####-####-############
Directory Name: N/A
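
When many objects are affected, the output can be long. The following is a minimal sketch for condensing it, assuming the output format shown above (one "Object UUID:" line followed by an "Owner:" line per object). The function name summarize_inaccessible is hypothetical and is not part of esxcli. It reads the command output from standard input and prints one line per object, followed by a total.

```shell
# Hypothetical helper (not part of esxcli): summarize the output of
# "esxcli vsan debug object list --health=inaccessible" read from stdin.
# Assumes each object block has an "Object UUID:" line followed by "Owner:".
summarize_inaccessible() {
    awk '
        $1 == "Object" && $2 == "UUID:" { uuid = $3; count++ }
        $1 == "Owner:"                  { print uuid, "owner=" $2 }
        END                             { print "total=" (count + 0) }
    '
}

# Usage on an ESXi host:
#   esxcli vsan debug object list --health=inaccessible | summarize_inaccessible
```

Matching on fields rather than on column positions keeps the sketch independent of how far the output is indented.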

In the vCenter UI, Cluster > Monitor > vSAN > Resyncing Objects may show that the resync is not progressing.
An ESXi host may fail to enter maintenance mode with either the Ensure Accessibility or the Full data evacuation option because of the inaccessible object.

Environment

VMware vSAN (All Versions)

Cause

This is a rare scenario in which the affected object enters an APD (All Paths Down) state because the available resync scheduler slots are exhausted, leaving resynchronization tasks stuck and unable to progress.
The environment shows frequent failures of the small and large ping tests, which indicate an unstable network or inter-switch link (ISL). This network instability can lead to repeated object rebuilds, increasing the risk of resync contention.

While the issue is occurring, entries like the following appear in /var/log/vsansystem.log on the ESXi host. The changing nodeCount value indicates network instability, with hosts leaving and rejoining the cluster.

vsansystem[2102272] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSMembershipUpdate-9426] Complete, nodeCount: 13, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
.
.
info vsansystem[2102264] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSNodeUpdate-973e] Complete, nodeCount: 9, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
.
.
info vsansystem[2102132] [vSAN@6876 sub=VsanSystemProvider opId=CMMDSNodeUpdate-a1c8] Complete, nodeCount: 13, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
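
In the excerpt above, nodeCount drops from 13 to 9 and later returns to 13. The following is a minimal sketch for pulling those transitions out of a longer log, assuming the "nodeCount: N," entry format shown above. The function name node_count_changes is hypothetical.

```shell
# Hypothetical helper: print every change in nodeCount found in vsansystem
# log entries read from stdin. Assumes the "nodeCount: N," format shown above.
node_count_changes() {
    # Keep only the numeric nodeCount value from lines that contain one.
    sed -n 's/.*nodeCount: \([0-9][0-9]*\).*/\1/p' |
    awk '
        NR > 1 && $1 != prev { print prev " -> " $1 }
        { prev = $1 }
    '
}

# Usage on an ESXi host:
#   grep CMMDS /var/log/vsansystem.log | node_count_changes
```

Consecutive entries with the same count produce no output, so a stable cluster prints nothing.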

The vmkernel log shows membership drops at the time of the incident, which again indicate network isolation:

vmkernel: cpu77:2099751)CMMDS: LeaderBuildHeartbeatMessage:2430: XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: [40131563]:Current membership uuid XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX has 1 members
vmkernel: cpu77:2099751)CMMDS: CMMDSUtil_PrintArenaEntry:98: XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: [40131569]:Adding a new Membership entry (XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) with 1 members:
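
To see how often the host's view of the cluster shrank, the member counts can be tallied from these CMMDS messages. The following is a minimal sketch, assuming the "Current membership uuid ... has N members" wording shown above. The function name membership_sizes is hypothetical.

```shell
# Hypothetical helper: count how many "Current membership" messages were
# logged for each membership size. Reads vmkernel log lines from stdin and
# assumes the "has N members" wording shown above.
membership_sizes() {
    awk '
        /Current membership uuid/ {
            for (i = 1; i < NF; i++)
                if ($i == "has" && $(i + 2) == "members") seen[$(i + 1)]++
        }
        END { for (n in seen) print n " members: " seen[n] " message(s)" }
    ' | sort -n
}

# Usage on an ESXi host:
#   grep CMMDS /var/log/vmkernel.log | membership_sizes
```

A large number of messages reporting a single member suggests the host was repeatedly isolated from the rest of the cluster.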

Resolution

This issue has been addressed in vSAN 8.0 P05 and later releases.
It is also recommended to review the network configuration and stability to minimize rebuild triggers and resync slot exhaustion.

Additional Information

Reference: vSAN Skyline Health - Data Health - vSAN Object Health - inaccessible objects
