vSAN File Service health status becomes critical after deleting large files: "Root file system not responsive. Cannot fetch file system."

Products

VMware vSAN

Issue/Introduction

vSAN file share usage was around 95%. To free up space, some large files were deleted from the file server.
After the deletion, Skyline Health shows a red error for Root File System on all hosts and a yellow warning for Workload Balance. The description reads "Root file system not responsive. Cannot fetch file system".
Skyline Health - File Share Health is impacted with "No backing vSAN object found for this share, or the VDFS daemon for this file share is not working."

All vSAN File Service nodes show the red error message "Synnex VM Power Off". The Power On and Power Off options are grayed out.

In /var/run/log/vdfsd-server.log, you will see the following events:

2025-04-18T11:55:40.674Z|f-6-000000002|LOGALLOC: 55c84f67-1237-cb7f-900e-34800dfc6a2c: Retrieving reservations to be released
2025-04-18T11:55:40.674Z|f-6-000000002|LOGALLOC: 55c84f67-1237-cb7f-900e-34800dfc6a2c: Succeeded Retrieving reservations
2025-04-18T11:55:40.674Z|f-6-000000002|SERVER: 55c84f67-1237-cb7f-900e-34800dfc6a2c: rootBlk:0 for RefCntTree deleted.
2025-04-18T11:55:40.674Z|f-5-000000012|SERVER: 55c84f67-1237-cb7f-900e-34800dfc6a2c: [GC] Started [interval=120 sec, unsafeDelay=300 sec]
2025-04-18T11:55:40.674Z|f-6-000000002|SERVER: 55c84f67-1237-cb7f-900e-34800dfc6a2c: Feature bits: 0x1, refTreeLite: 1, RdOnly: 0, skipPlog: 0, skipLlog: 0
2025-04-18T11:55:40.674Z|f-6-000000002|SERVER: 55c84f67-1237-cb7f-900e-34800dfc6a2c: state is RUNNING (was INIT).
2025-04-18T11:55:40.675Z|f-6-000000015|SERVER: 55c84f67-1237-cb7f-900e-34800dfc6a2c: Unmapper started
2025-04-18T11:55:40.675Z|f-6-000000002|SERVER: 55c84f67-1237-cb7f-900e-34800dfc6a2c: started -- free space: 264.4 TB

These events are followed by the VDFS daemon entering a rolling panic state:
2025-04-18T11:55:40.876Z|f-5-000000003|PANIC: NOT_IMPLEMENTED bora/vdfs/core/VDFSPhysicalLog.cpp:626
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace:
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[0] 0000005b31db6cc0 rip=0000005acb19651f rbx=0000005b31db6cc0 rbp=0000005b31db70f0 r12=0000005accd27788 r13=0000005b31db7108 r14=00000000ffffff2c r15=0000005a8d4a1d30
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[1] 0000005b31db7100 rip=0000005acb1965eb rbx=0000005b31db7280 rbp=0000005b31db71d0 r12=0000000000000000 r13=0000000000000005 r14=00000000ffffff2c r15=0000005a8d4a1d30
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[2] 0000005b31db71e0 rip=0000005acba6cb3a rbx=0000005b31db7280 rbp=0000005b31db7200 r12=0000000000000000 r13=0000000000000005 r14=00000000ffffff2c r15=0000005a8d4a1d30
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[3] 0000005b31db7210 rip=0000005acba5fc6a rbx=0000000000000000 rbp=0000005b31db78f0 r12=0000005b31db7280 r13=0000005b31db76b0 r14=0000000000000519 r15=0000005a8d4a1d30
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[4] 0000005b31db7900 rip=0000005acba60019 rbx=0000005a8d4a1d30 rbp=0000005b31db7a20 r12=0000005a8d4a1d78 r13=0000005a8d0ec300 r14=0000000000000000 r15=0000005b31db7960
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[5] 0000005b31db7a30 rip=0000005acba40597 rbx=0000005a8d0ec180 rbp=0000005b31db7a60 r12=004b2f9500824cc3 r13=0000005a8d0ec300 r14=0000000000000000 r15=0000000000000000
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[6] 0000005b31db7a70 rip=0000005acba4094c rbx=0000005a8d0ec180 rbp=0000005b31db7a80 r12=004b2f9500824cc3 r13=00000000000000a1 r14=0000005b31db7b90 r15=00000000001a0000
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[7] 0000005b31db7a90 rip=0000005acba37742 rbx=0000005a8d0ec180 rbp=0000005b31db7b50 r12=004b2f9500824cc3 r13=00000000000000a1 r14=0000005b31db7b90 r15=00000000001a0000
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[8] 0000005b31db7b60 rip=0000005acba37ec1 rbx=0000005a8d0ec180 rbp=0000005b31db7bf0 r12=0000005a8d0ed100 r13=0000000000000000 r14=0000005b31db7b90 r15=0000005b31db7c50
2025-04-18T11:55:40.876Z|f-5-000000003|Backtrace[9] 0000005b31db7c00 rip=0000005acba3a2f6 rbx=0000005a8d0ec180 rbp=0000005b31db7cd0 r12=0000005a8d0ed100 r13=0000005a8d0ec120 r14=0000005b31db7c50 r15=0000005b31db7c30
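
To confirm that a host matches this signature, the log can be scanned for repeated daemon starts followed by PANIC entries. The following is a minimal Python sketch, not a VMware-provided tool. The function name `summarize_vdfs_log` and the regular expressions are assumptions derived only from the log lines quoted above; the script is meant to be run against a copy of the log (for example, one extracted from a support bundle).

```python
import re
from typing import Iterable

# Assumed line layout, taken from the excerpts above:
#   <timestamp>|<thread>|PANIC: <message>
PANIC_RE = re.compile(r"^(?P<ts>\S+?)\|(?P<thread>[^|]+)\|PANIC: (?P<msg>.*)$")

# Assumed marker of a daemon (re)start, taken from the excerpt above:
#   ...|SERVER: <uuid>: state is RUNNING (was INIT).
STARTED_RE = re.compile(r"\|SERVER: \S+ state is RUNNING \(was INIT\)\.")


def summarize_vdfs_log(lines: Iterable[str]) -> dict:
    """Count daemon starts and collect (timestamp, message) for each PANIC line."""
    starts = 0
    panics = []
    for raw in lines:
        line = raw.rstrip("\n")
        if STARTED_RE.search(line):
            starts += 1
        match = PANIC_RE.match(line)
        if match:
            panics.append((match.group("ts"), match.group("msg")))
    return {"starts": starts, "panics": panics}


# Example usage against a local copy of the log (the path is illustrative):
#   with open("vdfsd-server.log", errors="replace") as fh:
#       print(summarize_vdfs_log(fh))
```

Several starts, each followed shortly by a PANIC line that points to VDFSPhysicalLog.cpp, would be consistent with the rolling panic described in this article.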

Environment

VMware vSAN 7.0

Cause

When the block cache overruns, the contract of the block cache reservation system is broken. As a result, certain physical transactions may be unable to obtain cache pages, which can lead to a "cache is full" condition and cause the VDFS daemon to enter a rolling panic.
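
The general idea of a reservation contract can be illustrated with a toy model. This is not VDFS code and does not reflect its internals. The class `ReservedCache` and all of its methods are hypothetical, and the sketch assumes only what is stated above: transactions reserve pages in advance, and an overrun consumes pages outside that accounting.

```python
class ReservedCache:
    """Toy page cache in which a transaction reserves pages before it runs."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # total pages in the cache
        self.reserved = 0         # pages promised to transactions
        self.in_use = 0           # pages actually handed out

    def reserve(self, pages: int) -> bool:
        # Admission control: only promise pages that exist.
        if self.reserved + pages > self.capacity:
            return False
        self.reserved += pages
        return True

    def take_page(self) -> None:
        # A reservation holder expects this call never to fail.
        if self.in_use >= self.capacity:
            raise RuntimeError("cache is full")
        self.in_use += 1

    def take_unreserved_page(self) -> None:
        # Models the overrun: a page is consumed without a reservation.
        self.in_use += 1


cache = ReservedCache(capacity=4)
assert cache.reserve(4)           # the transaction is promised 4 pages
cache.take_unreserved_page()      # overrun consumes 2 pages outside the contract
cache.take_unreserved_page()
cache.take_page()                 # the first two reserved pages are still available
cache.take_page()
try:
    cache.take_page()             # the third reserved page cannot be served
except RuntimeError as err:
    print(err)
```

In this model the transaction holds a valid reservation for four pages but receives only two, which is the kind of broken guarantee the cause statement describes.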

Resolution

There is no workaround available. This issue is resolved in the ESXi 7.0 P07 release.