When operating VMware Identity Manager, you experience a cluster disruption and observe the following symptoms:
The UI displays the error: Failed to fetch identity providers. Identity Internal Server Error
A disk full event occurs on the primary node.
The /etc/hosts file becomes unreadable or appears to be nulled out.
Administrative attempts to increase the disk space in vCenter and run resizefs result in an error.
Upon rebooting the appliance (e.g., during a scheduled CSP patch), the system experiences a kernel panic and drops into a dracut emergency shell.
VMware Identity Manager 3.3.7
This issue originates from the PostgreSQL high-availability monitoring stack, which generates rollover logs in 51MB increments (/var/log/pgService/auto-recovery.log.*). These files are retained for too long and gradually exhaust the strictly bounded 12GB / partition (sda4).
The subsequent cluster failure and bootloader mismatch are caused by manual intervention attempts:
Filesystem Unmapping: A manual fdisk operation executed on the guest OS to address the space exhaustion writes incorrect (identical) start and end blocks to the partition table. This collapses the sda4 logical boundary and unmaps the ext4 filesystem. Consequently, core networking files like /etc/hosts cannot be read, which breaks Pgpool-II quorum and application routing.
Bootloader Desynchronization: Reconstructing the partition table generates a new Partition UUID (PARTUUID). During a subsequent reboot, the GRUB bootloader passes the old, invalid PARTUUID to the kernel, resulting in a kernel panic and a dracut emergency shell.
To resolve the partition mapping and correct the bootloader, perform the following steps:
Prerequisite: You have access to the vSphere Web Console for the impacted appliance through Remote Console / VMRC.
Recreate the sda4 partition via fdisk using the exact sector locations that represent the beginning and end of the disk. This remaps the ext4 filesystem and the intact /etc/hosts file, allowing the Pgpool-II cluster to recover natively:
Delete the malformed 0KB partition (if you created it previously):
d
4
Recreate partition 4 with the precise sector locations for a default installation (12GB for 8:4 / partition):
n
p (Primary)
4
21239808
46405631
CRITICAL: If prompted to remove the ext4 signature, answer N.
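As a sanity check on the sector values used when recreating partition 4 (start 21239808, end 46405631), the resulting partition size can be computed before anything is written to disk. This is a minimal sketch that assumes 512-byte logical sectors, which the article does not state; confirm the sector size with fdisk -l /dev/sda on your appliance. The variable names are illustrative only.

```shell
# Assumption: 512-byte logical sectors (verify with `fdisk -l /dev/sda`).
START=21239808
END=46405631
SECTOR_BYTES=512

# Sector ranges are inclusive, so add 1 to the difference.
SECTORS=$(( END - START + 1 ))
SIZE_GIB=$(( SECTORS * SECTOR_BYTES / 1024 / 1024 / 1024 ))

echo "sectors=${SECTORS} size=${SIZE_GIB}GiB"
```

Under that assumption the range works out to exactly 12 GiB, which matches the default 12GB / partition described above. If your appliance was deployed with a non-default layout, these sector values may not apply.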
Reclaim space on the restored /dev/sda4 root partition:
Isolate files larger than 15M:
find / -xdev -type f -size +15M -exec ls -lh {} \;
If numerous *.backup files exist for /var/log/pgService/auto-recovery.log.*, remove them with the following command:
rm -f /var/log/pgService/auto-recovery.log.1-*
The root partition should now have enough free space. If it does not, the /opt/vmware/opensearch/logs/gc.log.## files are also safe to delete.
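Before deleting anything, it can help to preview how much space the rollover logs actually occupy. The following is a sketch only: reclaimable_bytes is a hypothetical helper (not part of the appliance), it assumes GNU find (for -printf), and it reuses the same file name pattern as the rm command above.

```shell
# Hypothetical helper: total size in bytes of the rollover logs in a directory.
# Assumes GNU find (-printf); the name pattern mirrors the rm command above.
reclaimable_bytes() {
    dir="$1"
    # A missing directory simply means there is nothing to reclaim.
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    find "$dir" -maxdepth 1 -type f -name 'auto-recovery.log.1-*' -printf '%s\n' |
        awk '{ total += $1 } END { print total + 0 }'
}

LOG_DIR="${LOG_DIR:-/var/log/pgService}"
echo "Reclaimable bytes in ${LOG_DIR}: $(reclaimable_bytes "$LOG_DIR")"

# Show current usage of the root filesystem for comparison.
df -h /
```

Running df -h / again after the cleanup shows whether the root partition has dropped back below capacity.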
Prevent kernel panics by updating the PARTUUID in the GRUB configuration to match the new partition. First, back up the current GRUB configuration:
cp -p /boot/grub2/grub.cfg /tmp
Extract the newly generated PARTUUID from the repaired partition:
NEW_PARTUUID=$(blkid -s PARTUUID -o value /dev/sda4)
Dynamically find the old PARTUUID in the GRUB config and replace it with the new one:
sed -i "s/set rootpartition=PARTUUID=[a-zA-Z0-9-]\+/set rootpartition=PARTUUID=${NEW_PARTUUID}/g" /boot/grub2/grub.cfg
Verify the configuration file now reflects the correct, new UUID:
grep "rootpartition=PARTUUID=" /boot/grub2/grub.cfg
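The visual check with grep can be complemented by a scripted comparison against the value reported by blkid. This is a sketch under assumptions: grub_partuuid_ok is a hypothetical helper name, and it expects the rootpartition=PARTUUID= form that the sed command above targets.

```shell
# Hypothetical helper: succeeds only if every rootpartition entry in the given
# GRUB config carries exactly the expected PARTUUID.
grub_partuuid_ok() {
    cfg="$1"
    expected="$2"
    # Collect the distinct PARTUUID values referenced by the config.
    found=$(grep -o 'rootpartition=PARTUUID=[a-zA-Z0-9-][a-zA-Z0-9-]*' "$cfg" |
        sed 's/.*PARTUUID=//' | sort -u)
    # Fail on an empty expected value, no match, or more than one distinct value.
    [ -n "$expected" ] && [ "$found" = "$expected" ]
}

# Usage on the appliance (path and device taken from the steps above):
# if grub_partuuid_ok /boot/grub2/grub.cfg "$(blkid -s PARTUUID -o value /dev/sda4)"; then
#     echo "GRUB PARTUUID matches /dev/sda4"
# else
#     echo "MISMATCH: do not reboot yet"
# fi
```

If the check reports a mismatch, the backup copy in /tmp can be restored and the sed step repeated before rebooting.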
Rebuild the initial ramdisk to ensure the boot environment recognizes the block changes:
dracut --force
Restart the appliance and confirm that it boots fully without entering the dracut emergency shell.