High Stun time observed in snapshot consolidation



Article ID: 320319


Updated On:

Products

VMware Cloud on AWS

Issue/Introduction

Known Issue


Symptoms:
  • During a snapshot creation or removal operation, virtual machines residing on NFS storage become unresponsive. The ESXi host may also become unresponsive at the same time.
  • When removing snapshots after backing up a virtual machine residing on an NFS datastore using a backup application, the virtual machine becomes unresponsive for approximately 30 seconds.
  • Removing snapshots after backing up a virtual machine residing on an NFS datastore takes a long time to complete.
  • This issue occurs when the target virtual machine disk was hot-added for the backup/restore operation.

 

 


Cause

When multiple hosts open a VMDK in read-only (RO) mode, the lock file is marked as shared. After the other hosts release their locks, the lock file still remains marked as shared. When the only remaining host, host1, then tries to acquire an exclusive (EXCL) lock on the same VMDK, it must first check the liveness of the lock, and it takes 30 seconds to break and acquire the lock.
During consolidation, the VM tries to upgrade its RO lock on the parent VMDK to an EXCL lock, so it incurs 30 seconds of stun time.
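
The lock sequence described above can be illustrated with a small toy model. This is only a sketch for reasoning about the behavior, not ESXi code: the class `LockFile`, the function `consolidation_stun`, and the host names other than host1 are hypothetical, and the model assumes that an upgrade on a lock file never marked as shared incurs no liveness wait.

```python
from dataclasses import dataclass, field

# Wait described in the Cause section, in seconds.
LIVENESS_CHECK_SECONDS = 30


@dataclass
class LockFile:
    """Toy model of the lock file for one VMDK (not the real on-disk format)."""
    holders: set = field(default_factory=set)
    shared: bool = False

    def open_read_only(self, host: str) -> None:
        self.holders.add(host)
        # Once more than one host holds the RO lock, it is marked as shared.
        if len(self.holders) > 1:
            self.shared = True

    def close(self, host: str) -> None:
        self.holders.discard(host)
        # The shared flag is deliberately NOT cleared here; this models
        # the behavior described in the Cause section.

    def upgrade_to_exclusive(self, host: str) -> int:
        """Return the modeled stun time in seconds for an RO -> EXCL upgrade."""
        if self.holders != {host}:
            raise RuntimeError("another host still holds the lock")
        if self.shared:
            # A stale shared marking forces the liveness check.
            self.shared = False
            return LIVENESS_CHECK_SECONDS
        # Assumption of this sketch: no wait when the lock was never shared.
        return 0


def consolidation_stun(other_hosts: list) -> int:
    """Model host1 consolidating after other hosts opened and closed the disk."""
    lock = LockFile()
    lock.open_read_only("host1")
    for host in other_hosts:      # e.g. a backup proxy that hot-added the disk
        lock.open_read_only(host)
    for host in other_hosts:      # the backup finishes and the disk is released
        lock.close(host)
    return lock.upgrade_to_exclusive("host1")


if __name__ == "__main__":
    print(consolidation_stun(["host2"]))  # disk was shared during the backup
    print(consolidation_stun([]))         # disk was never opened by another host
```

In this model, the stun appears only when a second host opened the disk at some point, which matches the symptom that the issue occurs when the disk was hot-added for a backup or restore operation.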

Resolution

This issue is resolved in VMware Cloud on AWS 1.24 and ESXi 8.0.2 P03.


Workaround:

None

Additional Information

Impact/Risks:

The virtual machine cannot be accessed during snapshot consolidation.