This KB describes how to recover from one or more corrupted vFAT partitions on ESXi 6.5/6.7 that prevent re-partitioning when upgrading or patching to ESXi 7.0.x or ESXi 8.0.
Symptoms:
A problem with one or more vFAT bootbank partitions was detected. Please refer to KB 91136 and run dosfsck on bootbank partitions.
The lifecycle log (/var/run/log/lifecycle.log) shows entries similar to the following:
yyyy-mm-ddThh:mm:ssZ In(14) lifecycle[pid]: runcommand:199 runcommand called with: args = ['/bin/dosfsck', '-V', '-n', '/dev/disks/naa.<id>:<partition>'], outfile = None, returnoutput = True, timeout = 10.
yyyy-mm-ddThh:mm:ssZ In(14) lifecycle[pid]: upgrade_precheck:1836 dosfsck output: b'CP850//TRANSLIT: Invalid argument\nCP850: Invalid argument\nfsck.fat 4.1+git\n0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.\n Automatically removing dirty bit.\nStarting check/repair pass.\nStarting verification pass.\n\nLeaving filesystem unchanged.\n/dev/disks/naa.<id>:<partition>: 121 files, 5665/65515 clusters\n'
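To locate such entries on a host, the log can be filtered for the strings quoted in this article. The following is a minimal sketch: the function name find_vfat_precheck_lines is hypothetical, and the search patterns are taken from the messages shown here, so a given log may contain only some of them.

```shell
# Print log lines that mention dosfsck runs or vFAT precheck findings.
# Reads log text from stdin so it also works on a saved copy of the log.
# The patterns come from the messages quoted in this article (assumption:
# not every pattern necessarily appears in every log).
find_vfat_precheck_lines() {
    grep -E 'dosfsck|VFAT_CORRUPTION|Dirty bit is set'
}

# Assumed usage on the host:
#   find_vfat_precheck_lines < /var/run/log/lifecycle.log
```

An empty result only means that none of these strings were logged; it does not prove that the partitions are healthy.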
An error occurred while backing up VFAT partition files before re-partitioning: Failed to calculate size for temporary Ramdisk: <error>.
An error occurred while backing up VFAT partition files before re-partitioning: Failed to copy files to Ramdisk: <error>.
Hardware precheck of the profile <ProfileName> failed with errors: <VFAT_CORRUPTION ERROR: A problem with one or more vFAT bootbank partitions was detected. Please refer to KB 91136 and run dosfsck on bootbank partitions.
Due to corrupted vFAT partitions, upgrades from ESXi 6.5 and 6.7 to versions up to ESXi 7.0 Update 3k or ESXi 8.0c may show the following symptoms:
"ramdisk (root) is full" in the vmkwarning.log file.
YYYY-MM-DDTHH:MM:SS SystemStorage t10.ATA_____<ID>___________________________________<ID>: upgrading partition layout...
Traceback (most recent call last):
File "/bin/initSystemStorage", line 1354, in <module>
storage.setupSystemPartitions()
File "/bin/initSystemStorage", line 659, in setupSystemPartitions
self.upgradePartitionTable(bootDisk)
File "/bin/initSystemStorage", line 413, in upgradePartitionTable
upgradeBackup()
File "/lib64/python3.8/site-packages/systemStorage/upgradeUtils.py", line 307, in upgradeBackup
File "/lib64/python3.8/site-packages/systemStorage/upgradeUtils.py", line 201, in calculateDirMiBSize
File "/lib64/python3.8/genericpath.py", line 50, in getsize
FileNotFoundError: [Errno 2] No such file or directory: '/vmfs/volumes/5d031f44-########-####-########/log/\x03\x05\x03\x01yd\x1fy.\udce8g\udcdd'
YYYY-MM-DDTHH:MM:SS.523Z Plugin system-storage failed Invoking method start (rc=1)
Cause of the dirty bit: the dirty bit is set by another OS; ESXi does not use this bit. The dirty bit indicates that the partition was mounted but was never unmounted.
Other vFAT failures: the cause is under investigation.
To resolve the issue, follow the steps below to repair the faulty vFAT partitions by using dosfsck.
On ESXi 6.5 and ESXi 6.7, each host has four vFAT partitions: two bootbanks, scratch, and locker.
# esxcli storage filesystem list
Mount Point                                        Volume Name  UUID                                 Mounted  Type            Size          Free
-------------------------------------------------  -----------  -----------------------------------  -------  ------  ------------  ------------
/vmfs/volumes/63fe1b2d-########-####-########46fa  datastore1   63fe1b2d-########-####-########46fa     true  VMFS-6  129385889792  127599116288
/vmfs/volumes/63fe1b26-########-####-########46fa               63fe1b26-########-####-########46fa     true  vfat       299712512     108437504
/vmfs/volumes/079b6e7e-########-####-########2ce5               079b6e7e-########-####-########2ce5     true  vfat       261853184      88797184
/vmfs/volumes/63fe3b74-########-####-########46fa               63fe3b74-########-####-########46fa     true  vfat      4293591040    4079943680
/vmfs/volumes/7ad83874-########-####-########05b5               7ad83874-########-####-########05b5     true  vfat       261853184     261849088
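The vFAT rows can be picked out of this listing with a small filter. This is a sketch under the assumption that the column layout matches the output above (Type is the third column from the right); the function name list_vfat_mounts is hypothetical.

```shell
# Print the mount point of every row whose Type column is "vfat".
# The Volume Name column may be empty, so the Type column is addressed
# from the right-hand side (third field from the end) instead of by
# a fixed position. Short or empty lines are skipped.
list_vfat_mounts() {
    awk 'NF >= 3 && $(NF-2) == "vfat" { print $1 }'
}

# Assumed usage on the host:
#   esxcli storage filesystem list | list_vfat_mounts
```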
From the mount points, it is possible to identify the disk and partition:
# vmkfstools -P /vmfs/volumes/63fe1b26-########-####-########46fa
vfat-0.04 (Raw Major Version: 0) file system spanning 1 partitions.
File system label (if any):
Mode: private
Capacity 299712512 (36586 file blocks * 8192), 108437504 (13237 blocks) avail, max supported file size 0
Disk Block Size: 512/0/0
UUID: 63fe1b26-fdea7ff3-f520-000c298546fa
Partitions spanned (on "disks"):
mpx.vmhba0:C0:T0:L0:8
Is Native Snapshot Capable: NO
The disk and partition id is mpx.vmhba0:C0:T0:L0:8.
Repeat this step for all vFAT partitions to obtain a list of their disk and partition IDs.
Note: This step is only required for upgrades from 6.5 and 6.7.
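When this lookup is repeated for several partitions, the disk and partition ID can be extracted from the vmkfstools -P output automatically. The sketch below assumes the ID is printed on the line that follows "Partitions spanned", as in the example above; the function name spanned_partition is hypothetical.

```shell
# Print the line that follows 'Partitions spanned (on "disks"):',
# with leading and trailing whitespace removed.
# Only the first spanned partition is printed.
spanned_partition() {
    awk '
        found { gsub(/^[ \t]+|[ \t]+$/, ""); print; exit }
        /^Partitions spanned/ { found = 1 }
    '
}

# Assumed usage on the host:
#   vmkfstools -P /vmfs/volumes/<UUID> | spanned_partition
```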
To avoid interference between the following steps and any daemon writing to the disk, check for open file handles and close them.
Check for daemons that have open file handles on the scratch partition and stop them:
# lsof |grep scratch
1001391762 vmfstracegd FILE 4 /scratch/vmfstraces/vmfsGlobalTrace.trace.0.gz
# /etc/init.d/vmfstraced stop
watchdog-vmfstracegd: Terminating watchdog process with PID 1001391748
vmfstracegd stopped
[root@localhost:~] lsof |grep scratch
-- note: 63fe3b74-53874442-####-########46fa is the UUID of the scratch partition
# lsof |grep 63fe3b74-53874442-####-########46fa
1001391489 rhttpproxy FILE 18 /vmfs/volumes/63fe3b74-########-####-########46fa/log/rhttpproxy-1001391489-000000db02450060-lo0-1.pcap
1001391489 rhttpproxy FILE 19 /vmfs/volumes/63fe3b74-########-####-########46fa/log/rhttpproxy-1001391489-000000db024501a8-vmk0-1.pcap
# /etc/init.d/rhttpproxy stop
# lsof | grep var/run/log
2101088 python FILE 5 /var/run/log/vsandevicemonitord.log
# /etc/init.d/vsandevicemonitord stop
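To see at a glance which processes still hold files open on a partition, the lsof output can be reduced to a list of distinct process names. This is a sketch that assumes the column layout shown above (process name in the second column); the function name open_handle_owners is hypothetical.

```shell
# Print the distinct process names (second column) of all lsof lines
# that contain the given text, for example "scratch" or a partition UUID.
# $1: text to look for anywhere in the line.
open_handle_owners() {
    awk -v pat="$1" 'NF >= 5 && index($0, pat) > 0 { print $2 }' | sort -u
}

# Assumed usage on the host:
#   lsof | open_handle_owners <UUID of the scratch partition>
```

The process name does not always match the init script name (in the example above, vmfstracegd is stopped through /etc/init.d/vmfstraced), so the matching service still has to be identified manually.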
Perform either of the solutions below to recover the corrupted vFAT partitions.
For all identified vFAT partitions, check the file system integrity and repair the disk as needed.
For instance, the output for a healthy partition looks like this:
# dosfsck -Vv /dev/disks/mpx.vmhba0\:C0\:T0\:L0:2
dosfsck 2.11 (12 Mar 2005)
dosfsck 2.11, 12 Mar 2005, FAT32, LFN
Checking we can access the last sector of the filesystem
Boot sector contents:
System ID "MSDOS5.0"
Media byte 0xf8 (hard disk)
512 bytes per logical sector
65536 bytes per cluster
2 reserved sectors
First FAT starts at byte 1024 (sector 2)
2 FATs, 16 bit entries
131072 bytes per FAT (= 256 sectors)
Root directory starts at byte 263168 (sector 514)
512 root directory entries
Data area starts at byte 279552 (sector 546)
65515 data clusters (4293591040 bytes)
32 sectors/track, 64 heads
0 hidden sectors
8386560 sectors total
Starting check/repair pass.
Checking for unused clusters.
Starting verification pass.
Checking for unused clusters.
/dev/disks/mpx.vmhba0:C0:T0:L0:2: 222 files, 3279/65515 clusters
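For a quick non-interactive check, the dosfsck output can be scanned for the dirty-bit message. The sketch below uses the -V -n options that appear in the lifecycle.log entry quoted in this article; the function name dirty_bit_status is hypothetical, and the check covers only the dirty bit, not other kinds of corruption.

```shell
# Read dosfsck output from stdin and report whether it contains the
# dirty-bit message. Other problems are NOT detected by this check;
# review the full dosfsck output for those.
dirty_bit_status() {
    if grep -q 'Dirty bit is set'; then
        echo "dirty"
    else
        echo "no-dirty-bit-message"
    fi
}

# Assumed usage on the host (read-only check, as used by the precheck):
#   dosfsck -V -n /dev/disks/<disk and partition id> 2>&1 | dirty_bit_status
```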
Back up all files. In this example, we back up /scratch and keep a copy on datastore1.
# cp -r /scratch/ /vmfs/volumes/datastore1/scratchBackup
(At this point it is very likely that the cp command returns a failure. The file system is corrupted, so one or more files or file names will be invalid. In that case, copy folder by folder or file by file and leave the corrupted files on the disk. After re-formatting, those files will be lost!)
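The entry-by-entry copy described above can be sketched as a small helper that continues past entries that fail and lists them for review. The function name copy_best_effort and the paths in the usage comment are illustrative assumptions.

```shell
# Copy each top-level entry of SRC into DST, continuing past failures.
# Prints every entry that could not be copied completely and returns 1
# if at least one entry failed, 0 otherwise.
# $1: source directory, $2: destination directory (created if missing).
copy_best_effort() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    failed=0
    for entry in "$src"/* "$src"/.[!.]*; do
        # Skip glob patterns that matched nothing.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        if ! cp -r "$entry" "$dst"/ 2>/dev/null; then
            echo "could not copy: $entry"
            failed=1
        fi
    done
    return "$failed"
}

# Assumed usage on the host:
#   copy_best_effort /scratch /vmfs/volumes/datastore1/scratchBackup
```

Entries reported as failed can then be descended into and copied one level deeper, leaving only the unreadable files behind.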
(Re-)Format the corrupted partition
# vmkfstools -C vfat /dev/disks/mpx.vmhba0:C0:T0:L0:2
create fs deviceName:'/dev/disks/mpx.vmhba0:C0:T0:L0:2', fsShortName:'vfat', fsName:'(null)'
deviceFullPath:/dev/disks/mpx.vmhba0:C0:T0:L0:2 deviceFile:mpx.vmhba0:C0:T0:L0:2
Checking if remote hosts are using this device as a valid file system. This may take a few seconds...
Creating vfat file system on "mpx.vmhba0:C0:T0:L0:2" with blockSize 1048576 and volume label "none".
Successfully created new volume: 640748a7-########-####-########46fa
(Note: If the command returns a busy error, this indicates that a file on this disk is still open. See above steps to identify the open handles.)
Restore the content
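The article does not list a command for this step. One possible approach, shown here only as a sketch, is to copy the backed-up files back to the re-formatted partition; the function name restore_backup and the example paths are assumptions.

```shell
# Copy the contents of a backup directory (including hidden files)
# into an existing target directory.
# $1: backup directory, $2: target directory on the re-formatted partition.
restore_backup() {
    backup=$1
    target=$2
    if [ ! -d "$backup" ]; then
        echo "backup directory not found: $backup" >&2
        return 1
    fi
    cp -r "$backup"/. "$target"/
}

# Assumed usage on the host:
#   restore_backup /vmfs/volumes/datastore1/scratchBackup /scratch
```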
For all identified vFAT partitions, check the file system integrity and repair the disk as needed
If the following message is displayed after running the command, choose 'No action'.
# dosfsck -Vv /dev/disks/<disk and partition id>
0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
1) Remove dirty bit
2) No action
[12?1]
The partition is repaired by the command below:
# dosfsck -a -w /dev/disks/<disk and partition id>
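When several partitions need the repair, it can help to print the full command for each one first and review the list before running anything. This is a dry-run sketch; the function name print_repair_commands is hypothetical, and it only prints commands without executing them.

```shell
# Print (do not run) the repair command for each disk and partition ID
# given as an argument, so the list can be reviewed beforehand.
print_repair_commands() {
    for id in "$@"; do
        echo "dosfsck -a -w /dev/disks/$id"
    done
}

# Example with the partition ID identified earlier in this article:
#   print_repair_commands mpx.vmhba0:C0:T0:L0:8
```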
Note: This issue has been permanently fixed in VMware ESXi 8.0 Update 3b.
https://docs.vmware.com/en/VMware-vSphere/8.0/rn/vsphere-esxi-80u3b-release-notes/index.html#Release-Note-Section-19016