vSAN Data Protection Snapshot Task Failures Due to Cluster Health Status

Article ID: 393764

Products

VMware vSAN, VMware Cloud Foundation

Issue/Introduction

Symptoms

vSAN Data Protection (VDP) snapshot tasks may fail to run or may appear to be skipped. Observed symptoms include:

  • Task Failures: Manual snapshot attempts return the error "Take snapshot task failed. Check the vSAN data protection UI logs for details."
  • Skipped Schedules: Configured VDP snapshot schedules fail to trigger despite valid settings.
  • Health Indicators: The vSAN cluster health status displays a Yellow (Warning) or Red (Critical) state in the Skyline Health dashboard.
  • Log Entries: The vSAN Snapshot Service appliance logs confirm that the service blocks the task when the health pre-check fails:
    File: /var/log/vmware/snapservice/snap-service.log
    {"level":"info","timestamp":"2025-##-##T##:##:##.###Z","C":"snapshots/prechecks.go:56","message":"cluster Health is RED","opID":"########","cluster":"domain-c######"}
    File: /var/log/vmware/snapservice/snap-service-ui.log
    {"level":"error","timestamp":"2025-##-##T##:##:##.###Z","C":"controller/LogsImpl.go:48","message":"[\"[vSAN SM ERROR]: \",\"Take snapshot task failed. Check the vSAN data protection UI logs for details.\"]","error":"UI error"........
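To confirm that the health pre-check, rather than some other fault, is what blocks the tasks, the snap-service.log entries can be filtered for the pre-check message. The following Python sketch assumes that each log line is a single JSON object shaped like the sample entry above (fields `timestamp`, `C`, `message`, `opID`, `cluster`) and that the message follows the `cluster Health is <STATE>` pattern. The function names are hypothetical and are not part of any VMware tool.

```python
import json
import re
from typing import Iterable, Iterator

# Assumption: pre-check messages follow the pattern of the sample entry,
# e.g. "cluster Health is RED". Other wordings are not matched.
HEALTH_RE = re.compile(r"cluster Health is (\w+)")


def find_precheck_health(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one summary dict per pre-check health message in the lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON log record
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed record
        match = HEALTH_RE.search(str(record.get("message", "")))
        if match:
            yield {
                "timestamp": record.get("timestamp"),
                "cluster": record.get("cluster"),
                "health": match.group(1),
                "opID": record.get("opID"),
                "source": record.get("C"),
            }


def scan_log(path: str) -> list:
    """Read a copy of snap-service.log and return all pre-check health hits."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(find_precheck_health(fh))
```

For example, after copying the log off the appliance, `for hit in scan_log("snap-service.log"): print(hit)` lists each pre-check health message with its timestamp, cluster ID, and operation ID.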

Environment

VMware vSAN 8.0.x
vSAN Data Protection
VMware Cloud Foundation (VCF) 9.x

Cause

The vSAN Data Protection service is designed to prioritize cluster stability and data integrity. It performs a mandatory pre-check of the vSAN Skyline Health status before any snapshot operation. If the cluster reports critical health issues, the service blocks snapshots to prevent additional stress on a degraded environment.
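The gating behavior described above can be modeled as a simple check that runs before any snapshot work starts. This is a hypothetical Python sketch, not the service's code. The `ClusterHealth` enum and the function names are invented for illustration. Treating only Red as blocking is an assumption based on the `cluster Health is RED` log entry. The article also lists Yellow as an observed state, so the real threshold may differ.

```python
from enum import Enum


class ClusterHealth(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Assumption: only Red blocks, matching the "cluster Health is RED"
# pre-check entry. The real threshold is internal to the service.
DEFAULT_BLOCKING = frozenset({ClusterHealth.RED})


def precheck_passes(health: ClusterHealth,
                    blocking: frozenset = DEFAULT_BLOCKING) -> bool:
    """Return True if a snapshot task may proceed under this model."""
    return health not in blocking


def run_snapshot_task(cluster: str, health: ClusterHealth) -> str:
    """Gate a (hypothetical) snapshot task on the cluster health pre-check."""
    if not precheck_passes(health):
        # The task is refused before any snapshot work is attempted.
        return f"blocked: cluster Health is {health.name} ({cluster})"
    return f"snapshot started ({cluster})"
```

Passing a wider `blocking` set, such as both Yellow and Red, models a stricter gate. The sketch also illustrates why the Workaround below lets tasks through: silencing an alarm changes the health state that the gate sees, not the gate itself.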

Common triggers for these failures include:

  • Hardware Failures: Physical disk failures, predictive failure alerts, or storage congestion (see KB 326969).
  • Configuration Alerts: vSAN HCL DB up-to-date alerts (see KB 327011).
  • Software Anomalies: Known issues such as File Services critical alerts observed on ESXi 8.0 U3f (Build 24784735).
  • Connectivity: Network communication issues between the SnapService appliance and vCenter.

Resolution

To resume VDP snapshots, resolve the underlying health issues and return the cluster to a Green state.

  1. Navigate to Cluster > Monitor > vSAN > Skyline Health in the vSphere Client.
  2. Identify the unhealthy tests marked with red or yellow icons.
  3. Review the Troubleshooting documentation provided within the UI for the failing tests.
  4. Hardware Remediation: Replace failed or degraded drives identified by the health checks.
  5. Software Remediation: Upgrade ESXi hosts to a version containing the fix for the reported issue. For File Services issues on build 24784735, the cluster should be upgraded to ESXi 8.0 U3h or later.

Workaround

If the health alert is a confirmed false positive, or reflects a pending software update that does not impact data integrity, silencing the alarm allows the VDP pre-check to proceed:

  1. Select the specific alarm causing the health degradation (e.g., "vSAN File Service Health") in the Skyline Health view.
  2. Select Silence Alert.
  3. Verify the overall cluster health status no longer displays as Red.
  4. Scheduled VDP tasks will resume during the subsequent cycle.

Note: Silencing critical alarms for extended periods is not recommended, as it may mask risks to data availability.

[Screenshot: based on newer versions of vCenter and ESXi]

Additional Information

The following is a selection of available KB articles related to the vSAN Health Check:
 
vSAN Health Check Information
vSAN Health Service - Cluster Health - Advanced vSAN configuration in sync
vSAN Health Service - Cluster Health - vSAN daemon liveness check
 
vSAN Health Service - Limits Health - Current Cluster Situation
vSAN Health Service - Limits Health - After one additional host failure
 
vSAN Health Service - vSAN HCL Health - Controller is VMware certified for ESXi release
vSAN Health Service - vSAN HCL Health - Controller Driver
vSAN Health Service - vSAN HCL Health - vSAN HCL DB up-to-date
vSAN Health Service - vSAN HCL Health - SCSI Controller on vSAN HCL
 
vSAN Health Service - Physical Disk Health - Overall Disk Health
vSAN Health Service - Physical Disk Health - Disk Capacity
vSAN Health Service - Physical Disk Health - Component Metadata Health
vSAN Health Service - Physical Disk Health - Congestion
vSAN Health Service - Physical Disk Health - Memory pools

vSAN Health Service - Network Health - Hosts disconnected from vCenter Server
vSAN Health Service - Network Health - Unexpected vSAN cluster members
vSAN Health Service - Network Health - vSAN Cluster Partition
vSAN Health Service - Network Health - Hosts with vSAN disabled
vSAN Health Service - Network Health - All hosts have a vSAN vmknic configured
vSAN Health Service - Network Health - Hosts small ping test (connectivity check) and Hosts large ping test (MTU check)
vSAN Health Service - Network Health - Hosts with connectivity issues