
Host goes into a not responding state due to storage connectivity issues. Error: Failed to reserve space for journal


Article ID: 389071


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • None of the commands executed from the ESXi host complete while the affected LUN is presented to the host
  • The host frequently goes into a not responding state

Validation Steps:

The below errors are seen in vmkernel.log.

Log path: /var/run/log/vmkernel.log


    [YYYY-MM-DDTHH:MM:SS Z] Wa(180) vmkwarning: cpu73:2099594)WARNING: Fil3: 1638: Failed to reserve volume f532 28 1  667e10d2 6a7f0159c7f4410eb45034aa      0 0 0 0 0 0 0
    [YYYY-MM-DDTHH:MM:SS Z] Wa(180) vmkwarning: cpu73:2099594)WARNING: FS3J: 1811: Failed to reserve space for journal on 667e10d2-6a7f0159-410e-###### :  Timeout
    [YYYY-MM-DDTHH:MM:SS Z] Wa(180) vmkwarning: cpu73:2099594)WARNING: Fil3: 1638: Failed to reserve volume f532 28 1 667e10d2 6a7f0159 ######### 0 0 0 0 0      0 0
    [YYYY-MM-DDTHH:MM:SS Z] Wa(180) vmkwarning: cpu62:2097342)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.600##################" state in doubt; requested fast path state update...                 
    [YYYY-MM-DDTHH:MM:SS Z] In(182) vmkernel: cpu2:2098461)ScsiDeviceIO: 4672: Cmd(0x45ba02754280) 0x28, CmdSN 0x56 from world 2099547 to dev naa.600#########" failed  H:0x7 D:0x0 P:0x0                     
    [YYYY-MM-DDTHH:MM:SS Z] In(182) vmkernel: cpu0:2097325)ScsiDeviceIO: 4672: Cmd(0x45ba02697880) 0x28, CmdSN 0x2f from world 2099547 to dev"naa.600#############"failed H:0x7 D:0x0 P:0x0                
    [YYYY-MM-DDTHH:MM:SS Z] In(182) vmkernel: cpu58:2098178)NMP: nmp_ResetDeviceLogThrottling:3854: last error status from device naa.600########## repeated 1 times
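
To quickly confirm whether a host is hitting these warnings, the messages above can be searched for directly in the vmkernel log, and the volume UUID from the warning can be mapped back to its datastore and backing device. A minimal sketch, assuming the default log location and using a placeholder for the UUID:

    # Search the live vmkernel log for journal reservation failures and
    # devices reported as "state in doubt".
    grep -iE "Failed to reserve space for journal|state in doubt" /var/run/log/vmkernel.log

    # Map the volume UUID from the warning to its datastore and backing device
    # (replace <volume-UUID> with the UUID reported in the log).
    esxcli storage vmfs extent list | grep <volume-UUID>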

Environment

VMware vSphere 8.x

VMware vSphere 7.x

Cause

Journal block leaks occur on a VMFS filesystem when there are storage connectivity problems while the volume is being opened or closed. When the host tries to open the volume, it fails to reserve the journal space.

The journal block is a crucial part of the transactional consistency mechanism in VMFS. When a volume is accessed or modified, VMFS uses journal blocks to maintain consistency in case of crashes or power loss. If there are issues during the opening or closing of a volume (such as storage device disconnections or delays), VMFS may fail to reserve the journal blocks necessary to ensure data consistency.

During the volume mount or dismount process, VMFS attempts to reserve journal space to ensure that metadata changes are properly recorded and can be recovered. If storage connectivity is disrupted or the system cannot access the underlying disk blocks for the journal reservation, the journal space cannot be properly allocated, leading to "leaks" or untracked space.
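
As a basic sanity check of this behaviour from the host, the mount state of the affected volume and its filesystem attributes can be queried directly; failures or long delays here usually point to the same connectivity problem rather than to the filesystem itself. A minimal sketch, assuming the affected datastore name is known (placeholder below):

    # List mounted filesystems and confirm whether the affected VMFS volume
    # is still mounted and accessible.
    esxcli storage filesystem list

    # Query the volume's filesystem attributes; errors or hangs here typically
    # reflect the underlying storage connectivity issue.
    vmkfstools -P /vmfs/volumes/<datastore-name>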

Cause Validation 

Run a VOMA check using KB 318894 to determine whether any corruption has occurred on the affected datastore.

    root@hostname:/vmfs/volumes/63997788-#####-1f3d-#######/log] voma -m vmfs -f check -d /vmfs/devices/disks/naa.600###############
    Running VMFS Checker version 2.1 in check mode               
    Initializing LVM metadata, Basic Checks will be done         
    Detected valid GPT signatures                             
    Number    Start          End                Type          
     1         2048           34359738334        vmfs          
                                                             
   Checking for filesystem activity
    Performing filesystem liveness check..-Scanning for VMFS-6 host activity (4096 bytes/HB, 1024 HBs).
         ERROR: Failed to check for heartbeating hosts on device '/vmfs/devices/disks/naa.600#############'  >>>>>>>>> Indicates the heartbeat check failed due to storage connectivity issues
     VOMA failed to check device : General Error               
                                                             
     Total Errors Found:           0
     Kindly Consult VMware Support for further assistance      
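
If the naa identifier of the device backing the affected datastore is not already known, it can be looked up before running the VOMA check above, and the device state reported by the host can be reviewed as well. A minimal sketch, with the device identifier as a placeholder:

    # Map VMFS datastores to their backing devices to find the naa identifier
    # of the affected datastore.
    esxcfg-scsidevs -m

    # Review the state the host reports for that device.
    esxcli storage core device list -d naa.600<device-id>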

Resolution

  • To isolate the issue, unmap the LUN that was failing to open the volume from all the ESXi hosts and check the host status; with the LUN unmapped, the hosts should remain stable.
  • Re-present the LUN and check whether the host becomes slow or goes into a not responding state. If it does, further checks must be performed on the storage side; the host-side path checks sketched after this list can help confirm the behaviour. Coordinate with the storage vendor to identify any potential issues with the underlying storage infrastructure, including SAN devices that might be causing connectivity disruptions. Storage controllers, network switches, or disk arrays may be contributing to intermittent connectivity or I/O performance degradation.
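
One way to confirm the behaviour from the host side while working with the storage vendor is to review the path and multipathing state of the re-presented device; dead paths or repeated "state in doubt" messages point to the SAN rather than the host. A minimal sketch, with the device identifier as a placeholder:

    # List all paths to the device; any path that is not active (for example,
    # a dead path) indicates a connectivity problem.
    esxcli storage core path list -d naa.600<device-id>

    # Review the multipathing (NMP) configuration and state for the device.
    esxcli storage nmp device list -d naa.600<device-id>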