vSAN -- Host Reboot/Starting up vSAN -- Takes long time

Article ID: 326951

Products

VMware vSAN

Issue/Introduction

Symptoms:
  • A Host that is part of a vSAN Cluster appears to stop responding during its reboot.
  • On the DCUI screen, you see a message similar to:

    VSAN Initializing SSD: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX Please wait...


Environment

VMware vSAN (All Versions)

Cause

This is expected behavior when rebooting a Host which is part of a vSAN Cluster. 
At this stage of the reboot, the vSAN ESXi Host processes all log entries to generate all required metadata tables.
The time taken to complete this task will depend on the load and the number of log blocks in the write buffer before the host was rebooted.

It is important to understand that this task can take some time to complete.
In the worst case, even under normal load and healthy conditions, a large amount of data in the write buffer
can mean the task takes up to several hours (per disk group) to complete.
The time required depends on the state of the Cluster, the Cache Tier (= SSD) and Capacity devices in use, the network and compute components, and a few other variables.

Note: 
While the host is at this stage of the reboot, avoid rebooting it again, as a further reboot will not speed up the process.
If a further reboot does happen at this stage, there is no risk to the data on the host's disks.

Resolution

Verify that the Host is making progress by checking its current log messages:

1.) Access the Host's live logging via the DCUI Console
Press Alt + F12 in the DCUI Console to open the live vmkernel log.
This lets you check what the Host is currently doing.
If Alt + F12 does not work, enable serial-line logging for the Host before resorting to a hard reboot (if one is needed).

Additional Information if needed:
How to access DCUI/Console of ESXi using ALT+F Keys (343841)
Video by VMware: Overview of DCUI
Enabling serial-line logging for ESXi (311033)


2.) Check the live logging to verify that the Host is making progress

The Host is making progress (and does not need to be rebooted again)
if you see messages similar to the following:
2017-02-02T04:53:38.287Z cpu13:33631)LSOMCommon: SSDLOGLogEnumProgress:1168: Estimated time for recovering 1542499 log blks is 185229 ms device: mpx.vmhba1:C0:T1:L0:2
2017-02-02T04:54:43.366Z cpu34:33253)PLOG: PLOG_Recover:760: Doing plog recovery on SSD mpx.vmhba1:C0:T1:L0:2
2017-02-02T04:57:18.154Z cpu28:33633)LSOMCommon: SSDLOGEnumLog:1311: PLOG: Total Time: 154787953 us, Read Time: 114793757 us, Process Time: 81268242 us, numReads: 999141
2017-02-02T04:57:18.521Z cpu36:33255)PLOG: PLOGRecDisp:701: PLOG recovery complete ########-####-####-####-########f076:Processed 5742172 entries, Took 155155 ms

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
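
If the vmkernel output has been captured to a file (for example via serial-line logging), the recovery messages can be summarized with a short script. The following Python sketch is only an illustration: the function name summarize_recovery and both regular expressions are assumptions derived solely from the example messages above, and real log lines may differ between releases. It converts the millisecond values into seconds so the estimate and the completion time are easier to compare.

```python
import re

# Patterns inferred from the sample log excerpts in this article only;
# they are assumptions, not a documented vmkernel log format.
ESTIMATE_RE = re.compile(
    r"SSDLOGLogEnumProgress:\d+: Estimated time for recovering "
    r"(\d+) log blks is (\d+) ms device: (\S+)"
)
COMPLETE_RE = re.compile(
    r"PLOG recovery complete (\S+?):Processed (\d+) entries, Took (\d+) ms"
)


def summarize_recovery(lines):
    """Return a list of (kind, identifier, count, seconds) tuples.

    kind is "estimate" (log-block recovery estimate for a device) or
    "complete" (PLOG recovery finished for a disk UUID).
    """
    events = []
    for line in lines:
        match = ESTIMATE_RE.search(line)
        if match:
            blocks, millis, device = match.groups()
            events.append(("estimate", device, int(blocks), int(millis) / 1000.0))
            continue
        match = COMPLETE_RE.search(line)
        if match:
            uuid, entries, millis = match.groups()
            events.append(("complete", uuid, int(entries), int(millis) / 1000.0))
    return events


if __name__ == "__main__":
    sample = [
        "2017-02-02T04:53:38.287Z cpu13:33631)LSOMCommon: SSDLOGLogEnumProgress:1168: "
        "Estimated time for recovering 1542499 log blks is 185229 ms device: mpx.vmhba1:C0:T1:L0:2",
        "2017-02-02T04:57:18.521Z cpu36:33255)PLOG: PLOGRecDisp:701: PLOG recovery complete "
        "########-####-####-####-########f076:Processed 5742172 entries, Took 155155 ms",
    ]
    for kind, ident, count, seconds in summarize_recovery(sample):
        print(f"{kind}: {ident} count={count} time={seconds:.1f} s")
```

With the example lines above, the estimate of 185229 ms corresponds to roughly 3 minutes, and the reported completion time of 155155 ms to about 2.6 minutes. An empty result from such a script would not by itself prove a disk problem; it only means none of the assumed patterns matched.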

The Host might not be making progress if you do not see messages similar to those shown above.
This could indicate an underlying Disk Problem.
In such a case, you could consider initiating a hard reboot to attempt to resolve the Disk Problem.



Additional Information

Long boot times may also occur if you are experiencing issues with component metadata health. 
For more information, see vSAN Health Service – Physical Disk Health – Component Metadata Health (327060).