Critical VM is down impacting production environment and resync is stalled

Article ID: 412242

Updated On:

Products

VMware vSAN

Issue/Introduction

  • After ESXi host patching, multiple VMs in the vCenter environment experience problems, including:
  • VMs do not power on or are inaccessible.
  • Attempts to power on VMs result in timeouts or "operation not allowed in current state" errors.
  • Error messages report disk descriptor file mismatches (parent/child IDs).
  • vSAN objects are absent or stale.
  • VM folders in the datastore are inaccessible.
  • vCenter and hostd logs show VM power-on failures and disk descriptor file mismatches.
  • Patching was performed manually via SSH, with hosts put into maintenance mode.
  • The current resync is stalled.

Cause

The host failed to enter maintenance mode because a resync was in progress, and the host was then rebooted.

hostd.log

####-##-##T##:##:##.####Z info hostd[########] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 8052 : The host has failed entering maintenance mode.
####-##-##T##:##:##.####Z info hostd[60473869] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 8390 : Host is rebooting.

vsananalyticsevents.log

####-##-##T##:##:##.####Z info vsananalyticsevents {"eventType": "RESYNC_IN_PROGRESS", "resourceId": "########-####-####-####-############", "eventTs": ##########.######, "eventLocation": [{"location": "CLUSTER", "entityId": "########-####-####-####-############", "childEntitiesLocation": [{"location": "HOST", "entityId": "########-####-####-####-############", "childEntitiesLocation": []}, {"location": "HOST", "entityId": "########-####-####-####-############", "childEntitiesLocation": []}, {"location": "HOST", "entityId": "########-####-####-####-############", "childEntitiesLocation": [.......

This caused the resync to stall and created stale components.
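
To confirm that a host went through the same sequence, its log can be searched for the two hostd events shown above: a failed maintenance mode entry followed later by a reboot. The following is a minimal sketch, not an official tool. The function name check_mm_then_reboot and the log path in the usage comment are assumptions; adjust them for your environment.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): succeeds if the given log file
# contains a "failed entering maintenance mode" event followed, on the same
# or a later line, by a "Host is rebooting" event.
check_mm_then_reboot() {
    log_file="$1"
    awk '
        # Remember that a failed maintenance-mode entry was seen.
        /The host has failed entering maintenance mode/ { failed = 1 }
        # A reboot event only counts if it comes after that failure.
        failed && /Host is rebooting/ { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$log_file"
}

# Example usage (the log path is an assumption):
# check_mm_then_reboot /var/log/hostd.log && echo "pattern found"
```

A zero exit status only indicates that the two messages appear in that order in the file. It does not prove that the reboot caused the stalled resync.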

Resolution

  • Run the following command on all hosts:

     vsish -e set /vmkModules/vsan/dom/ownerAbdicateAll

  • Check whether the resync progresses. If it does, allow it to finish.
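
Because the command has to be run on every host, a small wrapper can reduce the chance of missing one. The sketch below assumes SSH is enabled on the hosts and that you log in as root. The function name run_on_hosts, the "dry" mode, and the host names in the usage comment are hypothetical. The command string is the one given above.

```shell
#!/bin/sh
# Command from the resolution step above, unchanged.
ABDICATE_CMD='vsish -e set /vmkModules/vsan/dom/ownerAbdicateAll'

# Hypothetical helper: the first argument is the mode, the rest are host names.
#   mode "dry" -> only print the ssh command for each host
#   any other  -> run the command on each host over SSH
run_on_hosts() {
    mode="$1"
    shift
    for host in "$@"; do
        if [ "$mode" = "dry" ]; then
            echo "ssh root@${host} \"${ABDICATE_CMD}\""
        else
            ssh "root@${host}" "$ABDICATE_CMD"
        fi
    done
}

# Example usage with placeholder host names:
# run_on_hosts dry esx01.example.com esx02.example.com
# run_on_hosts run esx01.example.com esx02.example.com
```

Running in "dry" mode first lets you review the host list before anything changes. Resync progress can then be checked in the vSphere Client or, on vSAN releases that include it, with `esxcli vsan debug resync summary get` on a host.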