Disk consolidation needed warnings appear on multiple VMs following storage array capacity saturation

Article ID: 434690

Updated On:

Products

VMware vCenter Server

Issue/Introduction

  • Multiple virtual machines show a "Virtual machine disk consolidation is needed" warning in the vSphere Client Summary tab simultaneously, following a period of storage disruption.
  • No snapshots are visible in the vCenter Snapshot Manager for the affected VMs.
  • The affected VMs are backed by a shared Fibre Channel or iSCSI datastore.
  • File operations on the affected VMFS datastore hang or return device-busy errors. During an SSH session on the host, you may see output similar to:
    rm: can't remove 'test.txt': Device or resource busy
    
  • In /var/log/vmkernel.log, you see SCSI errors on the affected LUN similar to:
    ScsiDeviceIO: 4633: Cmd(0x...) to NMP device "naa.xxxxxxxxxxxxxxxxxxxx" failed H:0x7 D:0x28 P:0x0
    
  • In /var/log/vmkwarning.log, you see volume eviction and registration key errors similar to:
    WARNING: HBX: 2723: Failed to cleanup registration key on volume xxxxxxxx-xxxxxxxx-xxxx-xxxxxxxxxxxx: Failure
    WARNING: Vol3: 4678: Error closing the volume: . Eviction fails: Failure
    
  • Running esxcli storage core device stats get -d <naa_id> against the affected LUN shows a disproportionately high failed command count compared to other LUNs in the environment.
  • When browsing the VM directory on the datastore via SSH, snapshot delta files are present with names ending in -000000.vmdk or similar, despite no snapshots appearing in the Snapshot Manager.
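
To compare LUNs quickly, the vmkernel log symptom above can be summarized per device. The following is a minimal sketch, assuming the log lines follow the ScsiDeviceIO format shown above; the function name count_task_set_full is hypothetical and not part of ESXi.

```shell
# Count D:0x28 failures per NMP device in a vmkernel log.
# $1: path to a log file (for example /var/log/vmkernel.log)
# Output: one "<count> <device>" line per device, highest count first.
count_task_set_full() {
    awk '/D:0x28/ && match($0, /NMP device "[^"]*"/) {
             # Strip the leading: NMP device "  and the trailing quote.
             dev = substr($0, RSTART + 12, RLENGTH - 13)
             n[dev]++
         }
         END { for (d in n) print n[d], d }' "$1" | sort -rn
}
```

Usage on a host: count_task_set_full /var/log/vmkernel.log. A LUN that dominates this list is a candidate for the esxcli storage core device stats get -d <naa_id> check described above.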

Additional symptoms reported:

  • "VMs show disk consolidation warning but we don't have snapshots"
  • "Disk consolidation needed after datastore migration"
  • "VMs in hung state"
  • "Consolidation needed on all VMs on one datastore"

Environment

  • VMware vSphere ESXi 7.x or newer with Fibre Channel or iSCSI block storage
  • VMs backed by a VMFS datastore on an external storage array
  • A third-party image-based backup application is in use (for example, Commvault, Veeam, Cohesity, or similar solutions using the VDDK API)

Cause

This occurs when both of the following conditions are met:

  • The backing storage array reached near-full capacity (typically above 95% utilization)
  • A third-party backup application was actively managing snapshots on the affected VMs via the VMware VDDK API at the time of the storage event

When a storage array runs out of resources to service SCSI commands, it returns a TASK_SET_FULL status (device status 0x28) to the ESXi host. This indicates the array's command queue is exhausted and it cannot accept further I/O. As a result, ESXi begins throttling I/O to the affected LUN, file lock operations begin to fail, and volumes may be forcibly evicted from the VMFS stack.
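
As an illustration of reading the D: field, the sketch below maps a few standard SCSI status codes to their names and decodes a log line. The function names are hypothetical, and only a handful of common status codes are covered; anything else is reported as UNKNOWN.

```shell
# Map a SCSI device status code (the D: field) to its standard name.
scsi_status_name() {
    case "$1" in
        0x0|0x00) echo "GOOD" ;;
        0x2|0x02) echo "CHECK CONDITION" ;;
        0x8|0x08) echo "BUSY" ;;
        0x18)     echo "RESERVATION CONFLICT" ;;
        0x28)     echo "TASK SET FULL" ;;
        *)        echo "UNKNOWN" ;;
    esac
}

# Extract the D: value from a vmkernel SCSI error line and decode it.
# $1: a single log line containing "H:0x.. D:0x.. P:0x.."
decode_device_status() {
    code=$(printf '%s\n' "$1" | sed -n 's/.* D:\(0x[0-9a-fA-F]*\).*/\1/p')
    scsi_status_name "$code"
}
```

For the sample line in the Issue section, decode_device_status reports TASK SET FULL, which matches the queue-exhaustion condition described here.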

Third-party backup applications that use the VDDK API create snapshot delta disks on VMs during backup jobs. These snapshots are hidden from the vCenter Snapshot Manager by design, because the backup application manages them directly. Under normal conditions, the backup application creates the snapshot, backs up the data, and then commits and removes the snapshot when the job completes.

When storage I/O fails mid-operation, the backup application may be unable to complete the snapshot removal and commit process. The snapshot delta files are left on the datastore in an unconsolidated state. ESXi detects these residual delta files and raises the "disk consolidation needed" warning on each affected VM. The underlying storage problem is the cause; the consolidation warning is a downstream symptom.

The -000000 snapshot naming pattern seen on delta files in some environments is a naming convention used by certain backup applications and does not indicate a different cause. It reflects how that application numbers its API-managed snapshots and is unrelated to any snapshot visible in vCenter.
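
To see which residual snapshot descriptors exist for a VM, the directory listing can be filtered by the six-digit suffix. This is a sketch only: list_snapshot_disks is a hypothetical helper, and it assumes descriptors end in a dash, six digits, and .vmdk, which excludes the -delta, -sesparse, -flat, and -ctk companion files.

```shell
# List snapshot descriptor files in a VM directory.
# $1: VM directory, for example /vmfs/volumes/<datastore>/<vm_name>
list_snapshot_disks() {
    ls "$1" | grep -E -- '-[0-9]{6}\.vmdk$'
}
```

An empty result suggests no residual snapshot descriptors remain in that directory; any names printed are candidates for the consolidation steps in the Resolution section.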

Resolution

Step 1: Resolve the storage capacity issue first

Before attempting consolidation, verify that the underlying storage capacity issue has been addressed. Attempting consolidation while the array is still resource-constrained will likely fail.

  1. Contact your storage vendor or array administrator and confirm that capacity utilization on the array has been brought back to a safe operational level (typically below 80% used).
  2. Confirm that TASK_SET_FULL errors (D:0x28) are no longer appearing in /var/log/vmkernel.log on the affected hosts.
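
One way to check the second item is to find the most recent D:0x28 entry and compare its timestamp with the time the array capacity was restored. The sketch below assumes each log line begins with a timestamp followed by a space; last_task_set_full is a hypothetical helper name.

```shell
# Print the timestamp of the most recent D:0x28 entry in a log file.
# Prints nothing when the file contains no such entry.
# $1: path to a log file (for example /var/log/vmkernel.log)
last_task_set_full() {
    grep 'D:0x28' "$1" | tail -n 1 | cut -d' ' -f1
}
```

Note that this inspects only the file it is given. Older entries may have rotated into archived logs, so an empty result on the current log does not by itself show when the errors stopped.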

Step 2: Clear transient host state

After confirming storage capacity is restored, perform a rolling reboot of the ESXi hosts in the affected cluster to clear any residual I/O state from the storage disruption event.

  1. Using vCenter, place one ESXi host at a time into maintenance mode so that vMotion can migrate running VMs off the host.
  2. Reboot the host.
  3. Remove the host from maintenance mode and verify VMs migrate back cleanly.
  4. Repeat for each host in the cluster.
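
If the reboot is issued from an SSH session rather than from vCenter, a guard like the following can help avoid rebooting a host that is not yet in maintenance mode. This is a sketch under assumptions: maintenance_enabled is a hypothetical helper, the reason text is illustrative, and VM evacuation is still expected to be done through vCenter as described above.

```shell
# Succeed only when the host reports maintenance mode as enabled.
# $1: output of `esxcli system maintenanceMode get` (Enabled or Disabled)
maintenance_enabled() {
    [ "$1" = "Enabled" ]
}

# Intended use on the host, after VMs have been migrated off via vCenter:
#   mode=$(esxcli system maintenanceMode get)
#   if maintenance_enabled "$mode"; then
#       esxcli system shutdown reboot --reason "Clear I/O state after storage event"
#   else
#       echo "Host is not in maintenance mode; not rebooting." >&2
#   fi
```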

Step 3: Attempt disk consolidation

  1. In the vSphere Client, right-click an affected VM.
  2. Select Snapshots > Consolidate.
  3. Monitor the task in the Recent Tasks pane until it completes.
  4. Repeat for each affected VM.

If consolidation fails or the Consolidate option is grayed out:

  1. Verify that no backup jobs are currently running against the affected VM. If a job is active, allow it to complete or cancel it before retrying.
  2. Verify that no backup proxy VM has a disk from the affected VM attached. If it does, detach the disk from the proxy before retrying consolidation. For steps, see the article "Snapshot consolidation fails due to locks held by third-party backup software".
  3. If consolidation still fails, clone the VM to force consolidation:
    • In the vSphere Client, right-click the affected VM and select Clone > Clone to Virtual Machine.
    • Complete the clone wizard, placing the clone on a datastore with sufficient free space.
    • Power on the cloned VM and verify it functions correctly.
    • Once confirmed stable, the original VM can be decommissioned.
    • For a single-disk alternative using the CLI, see the article "Consolidating/Committing snapshots in VMware ESXi" for the vmkfstools -i method.
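
For orientation only, the vmkfstools -i method mentioned above clones a disk into a new standalone disk. The sketch below just prints the command for review rather than running it. It assumes the source is the newest snapshot descriptor in the chain, that thin provisioning is wanted, and that the VM is powered off; print_clone_cmd and the paths are hypothetical, and the referenced article remains the authority on the exact procedure.

```shell
# Print (do not run) a vmkfstools clone command so it can be reviewed first.
# $1: source descriptor, for example the newest snapshot descriptor of the disk
# $2: destination descriptor on a datastore with sufficient free space
print_clone_cmd() {
    printf 'vmkfstools -i "%s" -d thin "%s"\n' "$1" "$2"
}
```

Example: print_clone_cmd /vmfs/volumes/<datastore>/<vm_name>/<vm_name>-000001.vmdk /vmfs/volumes/<target_datastore>/<vm_name>/<vm_name>-clone.vmdk prints the command line to run once the paths have been verified.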

Additional Information