This KB provides guidance on recovering from one or multiple vFAT partition issues during an ESXi upgrade.
Hardware precheck of the profile <ProfileName> failed with errors: <VFAT_CORRUPTION ERROR: A problem with one or more vFAT bootbank partitions was detected. Please refer to KB 91136 and run dosfsck on bootbank partitions.
The lifecycle log (/var/run/log/lifecycle.log) shows entries similar to the following:
yyyy-mm-ddThh:mm:ssZ In(14) lifecycle[pid]: runcommand:199 runcommand called with: args = ['/bin/dosfsck', '-V', '-n', '/dev/disks/naa.<id>:<partition>'], outfile = None, returnoutput = True, timeout = 10.
An error occurred while backing up VFAT partition files before re-partitioning: Failed to calculate size for temporary Ramdisk: <error>.
An error occurred while backing up VFAT partition files before re-partitioning: Failed to copy files to Ramdisk: <error>.
Corrupted vFAT partitions may cause upgrades from ESXi 6.5 and 6.7 to versions up to ESXi 7.0 Update 3k or ESXi 8.0c to exhibit the following symptoms.
YYYY-MM-DDTHH:MM:SS SystemStorage t10.ATA_____<ID>___________________________________<ID>: upgrading partition layout...
Traceback (most recent call last):
  File "/bin/initSystemStorage", line 1354, in <module>
    storage.setupSystemPartitions()
  File "/bin/initSystemStorage", line 659, in setupSystemPartitions
    self.upgradePartitionTable(bootDisk)
  File "/bin/initSystemStorage", line 413, in upgradePartitionTable
    upgradeBackup()
  File "/lib64/python3.8/site-packages/systemStorage/upgradeUtils.py", line 307, in upgradeBackup
  File "/lib64/python3.8/site-packages/systemStorage/upgradeUtils.py", line 201, in calculateDirMiBSize
  File "/lib64/python3.8/genericpath.py", line 50, in getsize
FileNotFoundError: [Errno 2] No such file or directory: '/vmfs/volumes/########-########-####-########/log/\x03\x05\x03\x01yd\x1fy.\######\#####'
YYYY-MM-DDTHH:MM:SS.523Z Plugin system-storage failed Invoking method start (rc=1)
Cause of the dirty bit: ESXi does not use the dirty bit, so the bit was set by another operating system. A set dirty bit indicates that the partition was mounted and then not cleanly unmounted.
The cause of the other vFAT failures is currently under investigation.
To resolve the issue, follow the steps below to repair the faulty vFAT partitions by using dosfsck.
Identify all vFAT partitions:
Each ESXi host has 4 or 5 vFAT partitions on ESXi 6.5 and ESXi 6.7: 2 Bootbanks, Scratch, and Locker.
# esxcli storage filesystem list
Mount Point                                                 Volume Name  UUID  Mounted  Type    Size          Free
----------------------------------------------------------  -----------  ----  -------  ------  ------------  ------------
/vmfs/volumes/########-########-####-############           datastore1         true     VMFS-6  129385889792  127599116288
/vmfs/volumes/                                                                 true     vfat    299712512     108437504
/vmfs/volumes/####-############-########                                       true     vfat    261853184     88797184
/vmfs/volumes/########-####-################-####-########                     true     vfat    4293591040    4079943680
/vmfs/volumes/                                                                 true     vfat    261853184     261849088
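As a convenience, the vFAT rows can be filtered out of the listing rather than picked by eye. The following is a minimal sketch, not part of the original procedure: `vfat_mount_points` is a hypothetical helper name, and it assumes `awk` is available in the shell in use and that the Type column contains the literal string `vfat`, as in the listing above.

```shell
# Hypothetical helper (not an ESXi command): read the output of
# `esxcli storage filesystem list` on stdin and print the mount point
# (first column) of every row that has a field equal to "vfat".
# Scanning all fields avoids depending on whether the Volume Name
# column is empty for a given row.
vfat_mount_points() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i == "vfat") { print $1; break }
        }
    }'
}

# Intended use on a host:
#   esxcli storage filesystem list | vfat_mount_points
```

Each printed mount point can then be passed to `vmkfstools -P` as described in the next step.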
Use each mount point to identify the disk and partition:
# vmkfstools -P /vmfs/volumes/<UUID>
vfat-0.04 (Raw Major Version: 0) file system spanning 1 partitions.
File system label (if any):
Mode: private
Capacity 299712512 (36586 file blocks * 8192), 108437504 (13237 blocks) avail, max supported file size 0
Disk Block Size: 512/0/0
UUID:
Partitions spanned (on "disks"):
    mpx.vmhba0:C0:T0:L0:8
Is Native Snapshot Capable: NO
The disk and partition id is mpx.vmhba0:C0:T0:L0:8.
Note: The "mpx" IDs shown here are examples only; on your host, the IDs might start with "naa." instead.
Repeat this step for all vFAT partitions. When finished, you will have a list like this:
mpx.vmhba0:C0:T0:L0:2 (scratch)
mpx.vmhba0:C0:T0:L0:5 (bootbank 1)
mpx.vmhba0:C0:T0:L0:6 (bootbank 2)
mpx.vmhba0:C0:T0:L0:8 (locker)
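Building this list by hand means reading the "Partitions spanned" block of every `vmkfstools -P` output. The sketch below shows one way to extract that block automatically. It is illustrative only: `spanned_partitions` is a hypothetical helper name, and it assumes `awk` is available and that the device IDs appear as indented lines directly under the "Partitions spanned" header, as in the sample output above.

```shell
# Hypothetical helper (not an ESXi command): read `vmkfstools -P`
# output on stdin and print the indented device lines that follow
# the "Partitions spanned" header, with leading whitespace removed.
spanned_partitions() {
    awk '
        /^Partitions spanned/      { in_list = 1; next }
        in_list && /^[ \t]+[^ \t]/ { sub(/^[ \t]+/, ""); print; next }
                                   { in_list = 0 }
    '
}

# Intended use on a host, once per vFAT mount point:
#   vmkfstools -P /vmfs/volumes/<UUID> | spanned_partitions
```

The helper only reads text, so it can be tried safely against saved command output before being used on a host.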
Note: This step is only required for upgrades from 6.5 and 6.7.
To avoid interference between the following steps and any daemon writing to the disk, check for open file handles and close them.
# kill $(cat /var/run/crond.pid)
# /usr/lib/vmware/vmsyslog/bin/shutdown.sh
Check for further daemons that have open file handles on the scratch partition and stop them:
# lsof |grep scratch
1001391762  vmfstracegd  FILE  4  /scratch/vmfstraces/vmfsGlobalTrace.trace.0.gz
# /etc/init.d/vmfstraced stop
watchdog-vmfstracegd: Terminating watchdog process with PID 1001391748
vmfstracegd stopped
[root@localhost:~] lsof |grep scratch
-- note: -####-######## lsof |grep -####-########
1001391489  rhttpproxy  FILE  18  /vmfs/volumes/########-####-#########-1.pcap
1001391489  rhttpproxy  FILE  19  /vmfs/volumes/########-####-#########-1.pcap
# /etc/init.d/rhttpproxy stop
# lsof | grep var/run/log
2101088  python  FILE  5  /var/run/log/vsandevicemonitord.log
# /etc/init.d/vsandevicemonitord stop
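When several processes hold files open, it can help to reduce the `lsof` output to a list of unique process names. The following is a rough sketch under stated assumptions: `open_file_holders` is a hypothetical helper name, `awk` and `sort` are assumed to be available, and the process name is assumed to be the second column, as in the sample output above.

```shell
# Hypothetical helper (not an ESXi command): read `lsof` output on
# stdin and print the unique process names (second column) of every
# line that contains the given substring.
open_file_holders() {
    awk -v pat="$1" 'index($0, pat) > 0 { print $2 }' | sort -u
}

# Intended use on a host:
#   lsof | open_file_holders /scratch
```

The process name does not always match the init script name (in the sample, `vmfstracegd` is stopped through `/etc/init.d/vmfstraced`), so the result is a starting point for finding the right service, not a list of scripts to run.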
Use one of the solutions below to recover the corrupted vFAT partitions.
===========================================================================
For all identified vFAT partitions, check the file system integrity and repair the disk as needed:
# dosfsck -Vv /dev/disks/<disk and partition id>
The disk and partition ID was derived in the previous step. For instance, the output for a healthy partition:
# dosfsck -Vv /dev/disks/mpx.vmhba0\:C0\:T0\:L0:2
dosfsck 2.11 (12 Mar 2005)
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "MSDOS5.0"
Media byte 0xf8 (hard disk)
       512 bytes per logical sector
     65536 bytes per cluster
         2 reserved sectors
First FAT starts at byte 1024 (sector 2)
         2 FATs, 16 bit entries
    131072 bytes per FAT (= 256 sectors)
Root directory starts at byte 263168 (sector 514)
       512 root directory entries
Data area starts at byte 279552 (sector 546)
     65515 data clusters (4293591040 bytes)
32 sectors/track, 64 heads
         0 hidden sectors
   8386560 sectors total
Starting check/repair pass.
Checking for unused clusters.
Starting verification pass.
Checking for unused clusters.
/dev/disks/mpx.vmhba0:C0:T0:L0:2: 222 files, 3279/65515 clusters
Solution 2: Use ESXi ISO to repair the boot partition
If the above option fails to repair the disk, repair it by booting from an ESXi ISO.
cdromBoot"
Sample boot options: Default and Modified (screenshots).
# dosfsck -v -a /dev/disks/<disk and partition id>
[root@esxi:/] dosfsck -Vv /dev/disks/t10.NVMe____<Vendor>____________________________a5##############:5
CP850//TRANSLIT: Invalid argument
CP850: Invalid argument
fsck.fat 4.1+git (2017-01-24)
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "MSDOS5.0"
Media byte 0xf8 (hard disk)
       512 bytes per logical sector
     65536 bytes per cluster
         2 reserved sectors
First FAT starts at byte 1024 (sector 2)
         2 FATs, 16 bit entries
    131072 bytes per FAT (= 256 sectors)
Root directory starts at byte 263168 (sector 514)
       512 root directory entries
Data area starts at byte 279552 (sector 546)
     65515 data clusters (4293591040 bytes)
32 sectors/track, 64 heads
         0 hidden sectors
   8386560 sectors total
Starting check/repair pass.
Orphaned long file name part "mfg_net"
For all identified vFAT partitions, check the file system integrity and repair the disk as needed
===========================================================================
# cp -r /scratch/ /vmfs/volumes/datastore1/scratchBackup
(At this point it is very likely that the cp command returns a failure, because the file system is corrupted and one or more files or file names are invalid. In that case, copy folder by folder or file by file and leave the corrupted files on the disk. After re-formatting, those files will be lost!)
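The folder-by-folder copy described above can be scripted so that one unreadable entry does not stop the rest of the backup. The following is a minimal sketch, not a VMware-provided tool: `copy_best_effort` is a hypothetical helper name, and it copies only the top-level entries matched by `*` (so hidden dot-files at the top level are not included).

```shell
# Hypothetical helper: copy each top-level entry of SRC into DST and
# print a "skipped:" line for every entry that cp could not copy.
# A failed entry may still be partially copied.
copy_best_effort() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for entry in "$src"/*; do
        # An unmatched glob stays literal; ignore it.
        [ "$entry" = "$src/*" ] && [ ! -e "$entry" ] && continue
        cp -r "$entry" "$dst"/ 2>/dev/null || echo "skipped: $entry"
    done
    return 0
}

# Intended use on a host:
#   copy_best_effort /scratch /vmfs/volumes/datastore1/scratchBackup
```

Entries reported as skipped can then be retried one level deeper by calling the helper on that subfolder, which narrows the loss down to the corrupted files themselves.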
# vmkfstools -C vfat /dev/disks/mpx.vmhba0:C0:T0:L0:2
create fs deviceName:'/dev/disks/mpx.vmhba0:C0:T0:L0:2', fsShortName:'vfat', fsName:'(null)'
deviceFullPath:/dev/disks/mpx.vmhba0:C0:T0:L0:2
deviceFile:mpx.vmhba0:C0:T0:L0:2
Checking if remote hosts are using this device as a valid file system. This may take a few seconds...
Creating vfat file system on "mpx.vmhba0:C0:T0:L0:2" with blockSize 1048576 and volume label "none".
Successfully created new volume: 640748a7-########-####-########46fa
Get the volume ID from the previous command (e.g., 640748a7-########-####-##########fa)
# cp -r /vmfs/volumes/datastore1/scratchBackup/* /vmfs/volumes/640748a7-########-####-##########fa/
===========================================================================
If the output of Step 1 matches the following screenshot, select option 1 (write changes) in the highlighted box to repair the partition.
Once done re-run the same command to check the file system integrity:
# dosfsck -Vv /dev/disks/<disk and partition id>
If the following message is shown after running the command, choose 'No action'.
# dosfsck -Vv /dev/disks/<disk and partition id>
0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
1) Remove dirty bit
2) No action
[12?1]
The partition will be repaired by the command below:
# dosfsck -a -w /dev/disks/<disk and partition id>
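The `0x25` prefix in the dirty-bit message is the boot-sector byte offset at which dosfsck reads the flag on a FAT12/FAT16 volume; bit 0 of that byte is the dirty flag. The sketch below reads that byte without modifying anything, which can help confirm the state before and after a repair. It is illustrative only: `fat16_dirty_state` is a hypothetical helper name, it assumes `dd`, `od`, and `tr` are available in the shell in use, and it applies only to FAT16 volumes such as the ones shown in this article ("16 bit entries").

```shell
# Hypothetical helper: print "dirty" or "clean" for a FAT16 device or
# image file by reading bit 0 of the byte at offset 0x25 (decimal 37).
# Read-only; returns 2 if the byte cannot be read.
fat16_dirty_state() {
    byte=$(dd if="$1" bs=1 skip=37 count=1 2>/dev/null | od -An -tu1 | tr -d ' \n')
    [ -n "$byte" ] || return 2
    if [ $((byte & 1)) -eq 1 ]; then
        echo dirty
    else
        echo clean
    fi
}

# Intended use on a host:
#   fat16_dirty_state /dev/disks/<disk and partition id>
```

This check is only a quick indicator; dosfsck remains the tool that reports and repairs the actual file system state.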
Note: This issue is permanently fixed in ESXi 8.0u3b (see techdocs.broadcom.com).