ESXi host becomes unresponsive if the base disk for the sesparse snapshot is not 4K aligned

Article ID: 344840

Products

VMware vSphere ESXi

Issue/Introduction

  • The ESXi host becomes unresponsive.
  • The hostd service crashes randomly and backup jobs fail.
  • You see log entries similar to the following:
vobd.log:
YYYY-MM-DDTHH:MM:SS.430Z: [UserWorldCorrelator] 111331270026us: [vob.uw.core.dumped] /bin/hostd(34539) /var/core/hostd-worker-zdump.000
YYYY-MM-DDTHH:MM:SS.430Z: [UserWorldCorrelator] 111331270591us: [esx.problem.hostd.core.dumped] /bin/hostd crashed (1 time(s) so far) and a core file may have been created at /var/core/hostd-worker-zdump.000. This may have caused connections to the host to be dropped.
YYYY-MM-DDTHH:MM:SS.430Z: An event (esx.problem.hostd.core.dumped) could not be sent immediately to hostd; queueing for retry.
YYYY-MM-DDTHH:MM:SS.002Z: Successfully sent event (esx.problem.hostd.core.dumped) after 1 failure.


vmkernel.log:
YYYY-MM-DDTHH:MM:SS.110Z cpu54:267514)WARNING: CBT: 2060: Unsupported ioctl 43
YYYY-MM-DDTHH:MM:SS.484Z cpu54:267514)WARNING: CBT: 2060: Unsupported ioctl 43
YYYY-MM-DDTHH:MM:SS.820Z cpu36:267518)WARNING: CBT: 2060: Unsupported ioctl 43
YYYY-MM-DDTHH:MM:SS.264Z cpu42:269732)VSCSI: 6448: handle 8234(vscsi0:5):Destroying Device for world 267514 (pendCom 0)
YYYY-MM-DDTHH:MM:SS.271Z cpu42:269732)CBT: 1561: Disconnecting the cbt device 2c8db7-cbt with filehandle 2919863
YYYY-MM-DDTHH:MM:SS.336Z cpu42:269732)CBT: 2235: Created device 2e8db7-cbt for cbt driver with filehandle 3050935
YYYY-MM-DDTHH:MM:SS.336Z cpu42:269732)WARNING: CBT: 2060: Unsupported ioctl 60
YYYY-MM-DDTHH:MM:SS.336Z cpu42:269732)WARNING: CBT: 2060: Unsupported ioctl 59
YYYY-MM-DDTHH:MM:SS.517Z cpu42:269732)CBT: 1561: Disconnecting the cbt device 2e8db7-cbt with filehandle 3050935
YYYY-MM-DDTHH:MM:SS.533Z cpu42:269732)VSCSI: 6448: handle 8233(vscsi0:4):Destroying Device for world 267514 (pendCom 0)
YYYY-MM-DDTHH:MM:SS.541Z cpu42:269732)CBT: 1561: Disconnecting the cbt device 2c8db2-cbt with filehandle 2919858
YYYY-MM-DDTHH:MM:SS.812Z cpu42:269732)FDS: 567: Enabling IO coalescing on driver 'deltadisks' device '27551033-<VMname>-000002-sesparse.vmdk'
YYYY-MM-DDTHH:MM:SS.354Z cpu34:318056 opID=49547957)FDS: 567: Enabling IO coalescing on driver 'deltadisks' device '60295fdf-vmname-000002-sesparse.vmdk'
YYYY-MM-DDTHH:MM:SS.961Z cpu34:318056 opID=49547957)User: 2888: wantCoreDump : hostd-worker -enabled : 1
YYYY-MM-DDTHH:MM:SS.868Z cpu34:318056 opID=49547957)UserDump: 1820: Dumping cartel 34539 (from world 318056) to file /var/core/hostd-worker-zdump.000 ...
YYYY-MM-DDTHH:MM:SS.350Z cpu46:342296)ALERT: hostd detected to be non-responsive
YYYY-MM-DDTHH:MM:SS.397Z cpu30:343120)ALERT: hostd detected to be non-responsive
YYYY-MM-DDTHH:MM:SS.430Z cpu48:318056 opID=49547957)UserDump: 1944: Userworld coredump complete.
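
To compare a host's logs against the sample entries above, the matching can be scripted. The following is a minimal Python sketch, assuming the log lines have already been read into a list of strings. The patterns are copied from the sample lines in this article, and the names SIGNATURES and find_signatures are hypothetical. A match only indicates that the log resembles the samples; it does not confirm the root cause.

```python
import re

# Patterns copied from the sample log lines in this article (illustrative only).
SIGNATURES = {
    "hostd_core_dump": re.compile(r"esx\.problem\.hostd\.core\.dumped"),
    "hostd_unresponsive": re.compile(r"hostd detected to be non-responsive"),
    "sesparse_disk": re.compile(r"driver 'deltadisks' device '[^']*-sesparse\.vmdk'"),
}


def find_signatures(lines):
    """Return a dict mapping each signature name to the log lines that match it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Shortened copies of lines from the vmkernel sample above.
    sample = [
        "cpu42:269732)FDS: 567: Enabling IO coalescing on driver 'deltadisks' "
        "device '27551033-<VMname>-000002-sesparse.vmdk'",
        "cpu46:342296)ALERT: hostd detected to be non-responsive",
        "cpu42:269732)WARNING: CBT: 2060: Unsupported ioctl 43",
    ]
    for name, matched in find_signatures(sample).items():
        print(name, len(matched))
```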

Environment

VMware vSphere ESXi 5.5
VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7

Resolution

VMware is aware of this issue.

As a workaround, specify the disk size in GB or TB so that it is 4K aligned (in other words, the size in bytes is evenly divisible by 4096).

This issue occurs only when the size of the base disk for the sesparse snapshot is not 4K aligned.
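
The reason a size given in GB or TB is 4K aligned can be shown with a short calculation. This is a minimal Python sketch, assuming binary units (1 GB = 1024^3 bytes, 1 TB = 1024^4 bytes). The names UNIT_BYTES, size_in_bytes and is_4k_aligned are hypothetical, and the unaligned size is an invented value used only for contrast.

```python
# Assumption: binary units, so 1 GB = 1024**3 bytes and 1 TB = 1024**4 bytes.
UNIT_BYTES = {"GB": 1024**3, "TB": 1024**4}


def size_in_bytes(value: int, unit: str) -> int:
    """Convert a whole number of GB or TB to bytes."""
    return value * UNIT_BYTES[unit]


def is_4k_aligned(size_bytes: int) -> bool:
    """Return True if the size is an exact multiple of 4096 bytes."""
    return size_bytes % 4096 == 0


if __name__ == "__main__":
    forty_gb = size_in_bytes(40, "GB")
    print(forty_gb, is_4k_aligned(forty_gb))

    # Hypothetical size that ends 512 bytes past a 4K boundary.
    odd_size = forty_gb + 512
    print(odd_size, is_4k_aligned(odd_size))
```

Both unit factors are themselves multiples of 4096, so any whole number of GB or TB passes the check under this assumption.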

Additional Information

For Windows virtual machines, you can use the "wmic" utility to check for 4K misalignment.

Use the Starting Offset values in the wmic output to check whether the disks are aligned:

  • For example, Disk #0: 1048576 / 4096 = 256. The result is a whole number, so the disk is aligned.
  • A disk is misaligned only if dividing its Starting Offset by 4096 gives a fractional result.
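
The division described in the list above can be written as a small check. This is a sketch in Python, assuming the Starting Offset values have already been read from the wmic output as byte counts. The helper name check_offset is hypothetical, and the second offset (1049088) is an invented value that shows the misaligned case.

```python
BLOCK_SIZE = 4096  # 4K boundary, in bytes


def check_offset(starting_offset: int):
    """Return (quotient, remainder, aligned) for a byte offset divided by 4096."""
    quotient, remainder = divmod(starting_offset, BLOCK_SIZE)
    return quotient, remainder, remainder == 0


if __name__ == "__main__":
    print(check_offset(1048576))  # (256, 0, True): the Disk #0 example from the list
    print(check_offset(1049088))  # (256, 512, False): 512 bytes past a 4K boundary
```

A nonzero remainder is the integer equivalent of the "decimal point" in the manual calculation.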

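Similarly, for a Linux guest OS, run the command fdisk -lu and apply the same calculation, dividing the start offset by 4096. Because fdisk -lu reports start positions in sectors, first multiply each start value by the sector size (typically 512 bytes) to get the offset in bytes.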
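
The Linux check can be sketched the same way. This Python example assumes the start values are given in 512-byte sectors; confirm the sector size reported in the fdisk output before relying on that default. The function name start_sector_is_4k_aligned is hypothetical, and the start sectors 2048 and 63 are illustrative values, not taken from this article.

```python
def start_sector_is_4k_aligned(start_sector: int, sector_size: int = 512) -> bool:
    """Convert a start sector to a byte offset and test it against 4096."""
    byte_offset = start_sector * sector_size
    return byte_offset % 4096 == 0


if __name__ == "__main__":
    # 2048 sectors * 512 bytes = 1048576 bytes, and 1048576 / 4096 = 256
    print(start_sector_is_4k_aligned(2048))
    # 63 sectors * 512 bytes = 32256 bytes, and 32256 / 4096 = 7.875
    print(start_sector_is_4k_aligned(63))
```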