"Configstore ramdisk is full" alerts on ESX host.

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
Due to a bug in the ConfigStore API, stale data related to block devices might not be deleted in time from the ESXi ConfigStore database, which causes an out-of-space condition. As a result, write operations to ConfigStore start to fail. In the logs, you see messages such as:

2022-12-19T03:51:42.733Z cpu53:26745174)WARNING: VisorFSRam: 203: Cannot extend visorfs file /etc/vmware/configstore/current-store-1-journal because its ramdisk (configstore) is full.

The following symptoms are observed on vSphere 8.0 Update 3.
If you see them, follow the steps for vSphere 8.0 Update 3 in the Resolution section of this KB.

1. In the /var/log/vobd.log file, you see entries similar to the following:

- When the configstore database has used about 80% of the available space:

vobd[1000079502]:  [VisorfsCorrelator] 249872232435us: [vob.visorfs.ramdisk.usage.warning] Ramdisk 'configstore' usage is very high. Approx 20% space left.
vobd[1000079502]:  [VisorfsCorrelator] 249870515527us: [esx.problem.visorfs.configstore.usage.warning] Ramdisk 'configstore' usage is very high. Approx 20% space left. Please refer to the KB 93362 for more details.
vobd[1000079502]:  [VisorfsCorrelator] 249872241130us: [vob.visorfs.ramdisk.usage.warning] Ramdisk 'configstore' usage is very high. Approx 20% space left.

- When the configstore database is about to run out of space:
vobd[1000079502]:  [VisorfsCorrelator] 249873490366us: [vob.visorfs.ramdisk.usage.error] Ramdisk 'configstore' is reaching its critical size limit. Approx 10% space left.
vobd[1000079502]:  [VisorfsCorrelator] 249871773438us: [esx.problem.visorfs.configstore.usage.error] Ramdisk 'configstore' is reaching its critical size limit. Approx 10% space left. Please refer to the KB 93362 for more details.
vobd[1000079502]:  [VisorfsCorrelator] 249873497673us: [vob.visorfs.ramdisk.usage.error] Ramdisk 'configstore' is reaching its critical size limit. Approx 10% space left.
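The entries above follow a regular pattern, so they can be picked out of a log with a short script. The following is a minimal sketch, not part of the KB tooling: the function names are hypothetical, and the regular expression is derived only from the sample lines shown here, so other message variants may not match.

```python
import re

# Pattern derived from the sample vobd.log lines in this article (assumption:
# other releases may word these messages differently).
CONFIGSTORE_EVENT = re.compile(
    r"\[(?P<event>[\w.]+\.usage\.(?P<level>warning|error))\]"
    r" Ramdisk 'configstore'.*?Approx (?P<left>\d+)% space left"
)


def find_configstore_events(log_text):
    """Return (level, event_id, percent_left) for each matching log line."""
    events = []
    for line in log_text.splitlines():
        match = CONFIGSTORE_EVENT.search(line)
        if match:
            events.append(
                (match.group("level"), match.group("event"), int(match.group("left")))
            )
    return events


def lowest_space_left(log_text):
    """Return the smallest reported free-space percentage, or None if no match."""
    events = find_configstore_events(log_text)
    return min((left for _, _, left in events), default=None)
```

To use it, read the contents of /var/log/vobd.log into a string and pass it to `find_configstore_events`. A result at or below 10 corresponds to the error-level entries shown above.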

2. In vCenter Server, you see events similar to the following:

- A warning is displayed when the configstore database has used about 80% of the available space:
Ramdisk 'configstore' usage is very high. Approx 20% space left. Please refer to the KB 93362 for more details.

- An error is displayed when the configstore database is about to run out of space:
Ramdisk 'configstore' is reaching its critical size limit. Approx 10% space left. Please refer to the KB 93362 for more details.

Environment

VMware vSphere ESXi 8.0.1
VMware vSphere ESXi 8.0.0
VMware vSphere ESXi 7.0.3

Cause

The ConfigStore SetVitalDataInstances API sets vital configuration by overwriting existing data. This sometimes leaves empty rows (stale data) in the configstore database.

Resolution

The issue is fixed in the following releases:

vSphere 7.0.3 P08

vSphere 8.0.1 P02, 8.0.2

vSphere 8.0 Update 3

Workaround:

Use one of the following options.
Option 1:
Use the configstore-recovery Python script attached to this KB article. Copy the script to the host and run: python configstore-recovery

The script performs the following steps:
1. Temporarily increase the configstore ramdisk size to 64MB (initial size is 32MB)
2. Clean stale/empty data from the configstore DB.
3. Perform VACUUM on the configstore DB. The VACUUM command rebuilds the database file, repacking it into a minimal amount of disk space.
4. Revert configstore ramdisk size to 32MB

Logs from the script are captured in /var/run/log/syslog.log
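To illustrate why steps 2 and 3 of the script are both needed, the sketch below shows the effect of VACUUM on a scratch SQLite database. It is not the recovery script and it does not touch the ConfigStore database. That the ConfigStore DB behaves like SQLite here is an assumption based on the VACUUM description above, and the table name, row count, and function name are hypothetical.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(rows=20000):
    """Return (size_before, size_after) in bytes around VACUUM on a scratch DB."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scratch.db")
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE entries (id INTEGER PRIMARY KEY, payload TEXT)"
            )
            conn.executemany(
                "INSERT INTO entries (payload) VALUES (?)",
                (("x" * 200,) for _ in range(rows)),
            )
            conn.commit()

            # Deleting rows frees pages inside the file but does not shrink it.
            conn.execute("DELETE FROM entries")
            conn.commit()
            size_before = os.path.getsize(path)

            # VACUUM rebuilds the file and drops the free pages.
            # It must run outside an open transaction, hence the commit above.
            conn.execute("VACUUM")
            size_after = os.path.getsize(path)
        finally:
            conn.close()
        return size_before, size_after
```

Calling `vacuum_demo()` returns two sizes, with the second smaller than the first: deleting rows alone leaves the file at its previous size, and only the rebuild returns the space.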

Option 2:
Manually recover the host by following these steps:

1. Temporarily increase the configstore ramdisk size to 64MB (initial size is 32MB).
a. Get the configstore ramdisk group ID using:

vsish -e set /sched/groupPathNameToID host system visorfs ramdisks configstore

b. Set the configstore ramdisk maximum memory to 64 MB using:

vsish -e set /sched/groups/<GID>/memAllocationInMB max=64

c. Verify the configstore ramdisk memory allocation using:

vsish -e get /sched/groups/<GID>/memAllocationInMB

Example:
[root@hostname:~] vsish -e set /sched/groupPathNameToID host system visorfs ramdisks configstore
1627
[root@hostname:~]
[root@hostname:~] vsish -e get /sched/groups/1627/memAllocationInMB
memsched-allocation {
min:32
max:32
shares:-3
minLimit:-1
units: 4 -> mb
}
[root@hostname:~] vsish -e set /sched/groups/1627/memAllocationInMB max=64
[root@hostname:~]
[root@hostname:~] vsish -e get /sched/groups/1627/memAllocationInMB
memsched-allocation {
min:32
max:64
shares:-3
minLimit:-1
units: 4 -> mb
}
[root@hostname:~]
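The verification in step c can also be checked programmatically. The following is a minimal sketch that parses the text of a memsched-allocation block as printed in the example above. The function names are hypothetical, and the parser assumes the output format is exactly as shown in this article.

```python
def parse_mem_allocation(vsish_output):
    """Parse 'key:value' lines of a memsched-allocation block into a dict of ints."""
    fields = {}
    for line in vsish_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skips the 'memsched-allocation {' and '}' lines
        tokens = value.split()
        if not tokens:
            continue
        try:
            # Keep the first token only, so 'units: 4 -> mb' yields 4.
            fields[key] = int(tokens[0])
        except ValueError:
            pass  # ignore non-numeric values
    return fields


def max_matches(vsish_output, expected_mb):
    """Return True if the block reports the expected 'max' value."""
    return parse_mem_allocation(vsish_output).get("max") == expected_mb
```

For the second `get` output in the example, `max_matches(output, 64)` returns True. For the first, it returns False because `max` is still 32.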

2. Forcefully purge any stale device entries currently on the host. This also allows recently unmapped devices (unmapped less than 7 days ago) to be purged:

esxcli storage core device purge -f

3. Delete the 'esx/storage/devices_access' configuration using configstorecli:

configstorecli config current delete -c esx -g storage -k devices_access --all

4. Reboot the host (do not force reboot).

reboot

Fix in vSphere 8.0 Update 3:

To resolve this issue on the 8.0 Update 3 release, run the configstore-recovery tool, which is available on the host by default:

[root@hostname:~] /usr/lib/vmware/configmanager/tools/configstore-recovery --recover

Attachments

configstore-recovery