vSAN Skyline Health reports "Host Incompatible Due To Use Of Newer On-Disk Format By Other Hosts In The Cluster, Upgrade Esxi Software Of This Host"


Article ID: 326529


Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:
vSAN Skyline Health reports the following alert:
Host Incompatible Due To Use Of Newer On-Disk Format By Other Hosts In The Cluster, Upgrade Esxi Software Of This Host

The vCenter log /var/log/vsan-health/vmware-vsan-health-summary-result.log reports:
(Host-32188, HostIncompatibleDueToUseOfNewerOn-DiskFormatByOtherHostsInTheCluster, UpgradeEsxiSoftwareOfThisHost),
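
To see which hosts this check has flagged, the summary log can be filtered for the check name. The following is a minimal sketch, assuming the log entries follow the "(Host-<id>, <check>, <action>)" shape shown in the example above. The function name list_flagged_hosts is hypothetical and is not part of any VMware tooling.

```shell
# Hypothetical helper: print the unique host IDs flagged by this health check.
# Reads log text on stdin. Assumes entries look like the example above:
#   (Host-32188, HostIncompatibleDueToUseOfNewerOn-DiskFormatByOtherHostsInTheCluster, ...)
list_flagged_hosts() {
    grep -o '(Host-[0-9]*, HostIncompatibleDueToUseOfNewerOn-DiskFormat' \
        | sed 's/^(\(Host-[0-9]*\),.*/\1/' \
        | sort -u
}

# Usage on vCenter (log path taken from this article):
# list_flagged_hosts < /var/log/vsan-health/vmware-vsan-health-summary-result.log
```

Each host ID is printed once, even if the check fired repeatedly for the same host.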

Cause

This issue has more than one possible cause:
  • A host with an older on-disk format was introduced into the cluster, or a disk group was recreated with an older format version.
Once the vSAN on-disk format has been upgraded cluster-wide, it can be incompatible with earlier ESXi versions and with disk groups that use an earlier format. For more information on the vSAN on-disk format, see Understanding vSAN on-disk format versions and compatibility.
 
  • Controller issues result in a false alert for this health check.
Timeouts or aborts at the HBA can cause vCenter vSAN health to raise this alert. To look for aborts, search vmkernel.log on the host:
grep WARNING /var/log/vmkernel.log | grep Abort | less
2020-05-16T03:01:06.164Z cpu7:2097756)WARNING: lsi_mr3: mfi_TaskMgmt:694: Abort not supported on C2:T0:L0 for SMID 29
2020-05-16T03:01:45.017Z cpu5:3291114)WARNING: lsi_mr3: mfi_TaskMgmt:694: Abort not supported on C2:T0:L0 for SMID 5


Example from vmkernel.log:
2020-05-16T02:50:41.016Z cpu26:2098082)DVFilter: 5963: Checking disconnected filters for timeouts
2020-05-16T03:00:12.352Z cpu24:2100138)lsi_mr3: MR_PopulateDrvRaidMap:342: ldCount 11
2020-05-16T03:00:12.352Z cpu24:2100138)lsi_mr3: MR_PopulateDrvRaidMap:343: Max. VD supported 256
2020-05-16T03:00:12.353Z cpu29:2097989)lsi_mr3: mfiReadMaxEvents:274: Event:From SeqNum 19478 to 19479. Count 2
2020-05-16T03:00:12.353Z cpu29:2097989)lsi_mr3: megasas_hotplug_work:370: event code: 0x27.
2020-05-16T03:00:12.353Z cpu29:2097989)lsi_mr3: megasas_hotplug_work:370: event code: 0x42.
2020-05-16T03:00:41.014Z cpu26:2098082)DVFilter: 5963: Checking disconnected filters for timeouts
2020-05-16T03:01:06.164Z cpu7:2097756)lsi_mr3: mfi_TaskMgmt:665: Processing taskMgmt abort for device: vmhba3:C2:T0:L0
2020-05-16T03:01:06.164Z cpu7:2097756)lsi_mr3: mfi_TaskMgmt:684: ABORT request for SN 1004909 Wld 2097217
2020-05-16T03:01:06.164Z cpu7:2097756)WARNING: lsi_mr3: mfi_TaskMgmt:694: Abort not supported on C2:T0:L0 for SMID 29
2020-05-16T03:01:07.163Z cpu50:2098357)lsi_mr3: mfi_TaskMgmt:665: Processing taskMgmt virt reset for device: vmhba3:C2:T0:L0
2020-05-16T03:01:07.163Z cpu50:2098357)lsi_mr3: mfi_TaskMgmt:670: Virtual Reset request from Wld 2097217
2020-05-16T03:01:09.057Z cpu4:2100154)HBX: 3040: 'bfifdrib084.tdbfg.com-local': HB at offset 4063232 - Waiting for timed out HB:
2020-05-16T03:01:09.057Z cpu4:2100154)  [HB state abcdef02 offset 4063232 gen 7 stampUS 811817301223 uuid 5eb2f481-9ee0cb11-40ce-0cc47aa4151c jrnl <FB 11> drv 24.82 lockImpl 3 ip 10.166.26.34]
2020-05-16T03:01:10.215Z cpu26:2097756)lsi_mr3: fusionWaitForOutstanding:3735: megasas: waiting for 4 commands to complete
2020-05-16T03:01:15.214Z cpu26:2097756)lsi_mr3: fusionWaitForOutstanding:3735: megasas: waiting for 4 commands to complete
2020-05-16T03:01:19.062Z cpu4:2100154)HBX: 3040: 'bfifdrib084.tdbfg.com-local': HB at offset 4063232 - Waiting for timed out HB:
2020-05-16T03:01:19.062Z cpu4:2100154)  [HB state abcdef02 offset 4063232 gen 7 stampUS 811817301223 uuid 5eb2f481-9ee0cb11-40ce-0cc47aa4151c jrnl <FB 11> drv 24.82 lockImpl 3 ip 10.166.26.34]
2020-05-16T03:01:20.176Z cpu1:2102118 opID=872b82ce)World: 11943: VC opID sps-Main-416806-330-26-9-01a7 maps to vmkernel opID 872b82ce
2020-05-16T03:01:20.176Z cpu1:2102118 opID=872b82ce)HBX: 3040: 'bfifdrib084.tdbfg.com-local': HB at offset 4063232 - Waiting for timed out HB:
2020-05-16T03:01:20.176Z cpu1:2102118 opID=872b82ce)  [HB state abcdef02 offset 4063232 gen 7 stampUS 811817301223 uuid 5eb2f481-9ee0cb11-40ce-0cc47aa4151c jrnl <FB 11> drv 24.82 lockImpl 3 ip 10.166.26.34]
2020-05-16T03:01:20.214Z cpu7:2097756)lsi_mr3: fusionWaitForOutstanding:3735: megasas: waiting for 4 commands to complete
2020-05-16T03:01:25.214Z cpu7:2097756)lsi_mr3: fusionWaitForOutstanding:3735: megasas: waiting for 4 commands to complete
2020-05-16T03:01:27.163Z cpu50:2098357)lsi_mr3: mfi_VirtReset:640: VIRT_RESET 0x4306e81642c0 timedout. Proceed to fusion reset


Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
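
To judge how often aborts occur and which device they affect, the warnings can be counted per device path. The following is a minimal sketch, assuming the lsi_mr3 message format shown in the excerpt above ("Abort not supported on <path>"). Other drivers log different messages, so the pattern would need adjusting. The function name count_aborts_by_device is hypothetical.

```shell
# Hypothetical helper: count lsi_mr3 "Abort not supported" warnings per device path.
# Reads vmkernel log text on stdin. Assumes the message format shown in the excerpt.
count_aborts_by_device() {
    awk '/WARNING: lsi_mr3: .*Abort not supported on / {
             # The device path (for example C2:T0:L0) is the word after "on".
             for (i = 1; i < NF; i++)
                 if ($i == "on") { count[$(i + 1)]++; break }
         }
         END { for (dev in count) print count[dev], dev }' | sort -k2
}

# Usage on the ESXi host:
# count_aborts_by_device < /var/log/vmkernel.log
```

Each output line is a count followed by the device path. A path that keeps accumulating aborts is a candidate for the I/O path investigation described in the Resolution section.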

Resolution



Workaround:
Make sure all hosts run the same ESXi build and use the same vSAN on-disk format version. To check the format version, run the following command, which lists all disks and counts the on-disk format versions in use:
esxcli vsan debug disk list | grep Version | sort | uniq -c
     16    Version: 10
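
The same check can be wrapped in a small function that reports a mismatch explicitly. The following is a minimal sketch, assuming each disk in the command output has a "Version: <n>" line as in the sample above, and that the command is run on each host in turn. The function name check_format_versions is hypothetical.

```shell
# Hypothetical helper: report whether all listed vSAN disks share one on-disk format version.
# Reads `esxcli vsan debug disk list` output on stdin.
check_format_versions() {
    cfv_versions=$(grep 'Version:' | awk '{ print $NF }' | sort -n | uniq)
    cfv_count=$(printf '%s\n' "$cfv_versions" | awk 'NF { n++ } END { print n + 0 }')
    if [ "$cfv_count" -eq 0 ]; then
        echo "UNKNOWN: no Version lines found in input"
        return 2
    elif [ "$cfv_count" -eq 1 ]; then
        echo "OK: all disks use on-disk format version $cfv_versions"
    else
        # Unquoted on purpose so the versions are joined on one line.
        echo "MISMATCH: versions found: $(echo $cfv_versions)"
        return 1
    fi
}

# Usage on the ESXi host:
# esxcli vsan debug disk list | check_format_versions
```

The non-zero return code on a mismatch makes the function usable in a loop over hosts, for example over SSH.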


If an older version is found, check whether the on-disk format was limited in the past by reviewing the locations outlined in How to format vSAN Disk Groups with a legacy format version. If the vSAN data remains healthy without the host, change the setting to the latest version supported by the host's ESXi version and recreate the disk group according to the VMware documentation:
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-02EA5E68-72B1-409B-B00F-09BD648E2215.html 

Verify that the drivers and firmware on the host are listed on the vSAN HCL and are current. If not, update the drivers and firmware.
If the driver and firmware match the vSAN HCL and are current, identify the component in the I/O path (drive, controller, or backplane) that causes the aborts.

Power cycle the host.

If none of the above actions resolves the issue, replace the controller.

Additional Information