The hostd service intermittently becomes unresponsive

Article ID: 318488

Products

VMware vSphere ESXi

Issue/Introduction

This article describes how to avoid an ESXi host entering a not-responding state.

Symptoms:
  • ESXi hosts show as not responding in vCenter.
  • The hostd service intermittently becomes unresponsive.
  • In vmkernel.log, you see alerts such as:
   YYYY-MM-DDTHH:MM:SS.Z cpu2:2179102)ALERT: hostd detected to be non-responsive
  • You may observe the following entries preceding the hostd detected to be non-responsive alert:
   YYYY-MM-DDTHH:MM:SS.Z cpu7:2098766 opID=b3d57165)FS3J: 3146: Aborting txn (0x430aa50d2890) callerID: 0xc1d00002 due to failure pre-committing: Optimistic lock acquired by another host.  
   YYYY-MM-DDTHH:MM:SS.Z cpu7:2097782)DVFilter: 6054: Checking disconnected filters for timeouts
   YYYY-MM-DDTHH:MM:SS.Z cpu6:2202506)DLX: 4330: vol 'datastore', lock at 188628992: [Req mode 1] Checking liveness:
   YYYY-MM-DDTHH:MM:SS.Z cpu6:2202506)[type 10c00002 offset 188628992 v 4266, hb offset 3346432 gen 7313, mode 1, owner 5fa01ff9-a25c7506-a069-00108682abde mtime 544318 num 0 gblnum 0 gblgen 0 gblbrk 0]


Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

Environment

VMware vSphere ESXi 7.0.0
VMware vSphere ESXi 6.7

Cause

In rare cases, a race condition between multiple threads, where one thread attempts to create a file in a directory while another thread removes that same directory, might cause a deadlock that makes the hostd service fail. Such a deadlock might affect other services as well, but the race condition window is small, and the issue is not frequent.
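
As a conceptual illustration only, and not hostd's actual implementation, the following Python sketch shows how two threads that take the same pair of locks in opposite order, one on a file-create path and one on a directory-remove path, can block each other indefinitely when the timing window is hit. The lock names and sleep are purely illustrative.

  import threading
  import time

  dir_lock = threading.Lock()    # guards the directory's contents
  entry_lock = threading.Lock()  # guards an individual file entry

  def create_file_in_dir():
      with entry_lock:           # create path takes the entry lock first...
          time.sleep(0.2)        # widen the race window for the demonstration
          with dir_lock:         # ...then the directory lock
              pass               # the new file would be created here

  def remove_dir():
      with dir_lock:             # remove path takes the directory lock first...
          time.sleep(0.2)
          with entry_lock:       # ...then the entry lock, in the opposite order
              pass               # the directory would be removed here

  t1 = threading.Thread(target=create_file_in_dir)
  t2 = threading.Thread(target=remove_dir)
  t1.daemon = True
  t2.daemon = True
  t1.start()
  t2.start()
  t1.join(2)
  t2.join(2)
  if t1.is_alive() or t2.is_alive():
      # Each thread is waiting on the lock the other holds, so neither can
      # make progress: the deadlock that leaves the service unresponsive.
      print("Deadlock: the create-file and remove-directory threads are blocked")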

Resolution


This issue is resolved in:
  • VMware ESXi 6.7 Patch 04 (build number 17167734) - Patch Release ESXi670-202011002
  • VMware ESXi 7.0 Update 1c (build number 17325551) - Patch Release Update 1c


Workaround:
Use either of the following workarounds:
Identify whether there are any stale dvPort files. For example, from vCenter, the vDS may contain only 100 ports, but there may be 200 dvPort files under the .dvsData/<DVS UUID>/ directory (see the sketch after these steps).
If there are stale dvPort files, unregister the virtual machines residing on that datastore from vCenter, then delete the .dvsData folder. After about 5 minutes, the dvPort files are regenerated.
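
As a rough illustration of the check above, the following Python sketch counts the dvPort files under each DVS UUID directory in a datastore's .dvsData folder. The /vmfs/volumes/datastore path is a hypothetical placeholder, and the resulting counts are meant to be compared manually with the number of ports the vDS reports in vCenter.

  import os

  DATASTORE_PATH = "/vmfs/volumes/datastore"   # hypothetical datastore mount point
  dvs_data = os.path.join(DATASTORE_PATH, ".dvsData")

  if not os.path.isdir(dvs_data):
      print("No .dvsData directory found under %s" % DATASTORE_PATH)
  else:
      for dvs_uuid in sorted(os.listdir(dvs_data)):
          uuid_dir = os.path.join(dvs_data, dvs_uuid)
          if not os.path.isdir(uuid_dir):
              continue
          # Each regular file here corresponds to one dvPort; compare this
          # count with the port count the vDS reports in vCenter.
          port_files = [f for f in os.listdir(uuid_dir)
                        if os.path.isfile(os.path.join(uuid_dir, f))]
          print("DVS %s: %d dvPort files" % (dvs_uuid, len(port_files)))

If a count is noticeably higher than the number of ports shown for the vDS in vCenter, the extra files are candidates for the stale dvPort files described above.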

Or

Move the ESXi hosts to a different vDS.

Additional Information

For more information refer to https://knowledge.broadcom.com/external/article?legacyId=1018394

Refer also to the vDS config location and HA blog.

Impact/Risks:
The service is restored only after a restart of the ESXi host.