In a Greenplum cluster configured with resource groups, a primary or mirror segment may fail to start, making the cluster unavailable or preventing the mirror from being promoted.
During segment startup, the following error may be observed in the segment logs:
FATAL: can't write data to file '/sys/fs/cgroup/cpu/gpdb/.../cpu.cfs_quota_us': Invalid argument
(resgroup-ops-linux.c)
failed to acquire resources on one or more segments
The error is raised during Greenplum resource group initialization, which depends on the Linux cgroup CPU controller.
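Before rebooting anything, it can help to confirm which segment logs actually contain this error. The function below is a minimal sketch: its name and the idea of passing in a log directory are assumptions of this example, and the segment log location varies by version (for example, pg_log under the segment data directory on Greenplum 6), so adjust the path for your installation.

```shell
#!/usr/bin/env bash
# Sketch only: find_cgroup_quota_errors is a hypothetical helper, not a
# Greenplum utility. It lists log files under the given directory that
# contain the cgroup CPU quota error shown above.
find_cgroup_quota_errors() {
    local log_dir="$1"
    # -r: search recursively, -l: print only matching file names,
    # -F: treat the pattern as a fixed string (no regex metacharacters).
    grep -rlF "cpu.cfs_quota_us': Invalid argument" "$log_dir" 2>/dev/null
}
```

For example, find_cgroup_quota_errors /data/primary/gpseg0/pg_log (a hypothetical path) prints the matching files and returns a non-zero status if none match.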
This issue occurs when the Linux cgroup CPU controller enters an inconsistent or invalid state on the affected host.
Greenplum resource groups rely on writing CPU quota parameters such as cpu.cfs_quota_us under:
/sys/fs/cgroup/cpu/gpdb/
If such a write fails with an Invalid argument error, the cgroup hierarchy or CPU controller is not functioning correctly.
Because this failure occurs during resource group initialization, the segment cannot start and mirror failover cannot proceed.
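To inspect these files on the affected host without changing anything, a read-only check such as the following lists every cpu.cfs_quota_us value under the gpdb directory, including any subdirectories beneath it. It is a sketch under stated assumptions: the function name and its optional root argument are hypothetical, it assumes the cgroup v1 layout shown in the error message, and because it only reads the files it cannot prove that a write will succeed. A value of -1 means the group has no CPU quota.

```shell
#!/usr/bin/env bash
# Sketch only: check_gpdb_cpu_cgroup is a hypothetical helper. It reads
# (never writes) the CFS quota files under <root>/cpu/gpdb, where <root>
# defaults to /sys/fs/cgroup (cgroup v1 layout assumed).
check_gpdb_cpu_cgroup() {
    local root="${1:-/sys/fs/cgroup}"
    local gpdb_dir="$root/cpu/gpdb"
    if [ ! -d "$gpdb_dir" ]; then
        echo "MISSING: $gpdb_dir"
        return 1
    fi
    # Print each quota file as path=value; -1 means "no quota".
    find "$gpdb_dir" -name cpu.cfs_quota_us | sort | while IFS= read -r f; do
        if [ -r "$f" ]; then
            echo "$f=$(cat "$f")"
        else
            echo "UNREADABLE: $f"
        fi
    done
}
```

Running check_gpdb_cpu_cgroup with no argument inspects the live hierarchy; a MISSING or UNREADABLE line points to the controller problem described above, while normal-looking values do not by themselves rule it out.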
Workaround:
Reboot the affected host to reset the Linux cgroup subsystem and reinitialize the CPU controller state.
After the reboot:
Start the Greenplum cluster:
gpstart -a
Recover the failed segment if required:
gprecoverseg -a
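The post-reboot steps above can be wrapped in a small script so they run in order and stop if the cluster fails to start. This is a minimal sketch: run, DRY_RUN, and post_reboot_recover are conventions of this example, not Greenplum utilities, and setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: prints commands when DRY_RUN=1, otherwise runs them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper for the steps to perform after the host reboot.
post_reboot_recover() {
    # Start the Greenplum cluster; stop here if startup fails.
    run gpstart -a || return 1
    # Recover failed segments (only required if segments remain down).
    run gprecoverseg -a
}
```

Calling DRY_RUN=1 post_reboot_recover shows the two commands without executing them, which is a quick way to review the sequence before running it for real as gpadmin.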
Alternative Temporary Workaround:
If an immediate reboot is not feasible, switch from resource groups to resource queues to bypass cgroup usage temporarily.
Disable resource groups:
gpconfig -c gp_resource_manager -v queue
Restart the Greenplum cluster:
gpstop -ar
Recover the failed segment:
gprecoverseg -a
Note: This is a temporary workaround only, because it disables resource group functionality.
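The alternative workaround can be sketched the same way. Besides the switch itself, the example includes a hypothetical restore_resource_groups function for undoing the change once the host has been rebooted. It assumes the cluster originally ran with gp_resource_manager set to group, so confirm the original value (the first command prints it) before relying on the restore step. run and DRY_RUN are conventions of this sketch only.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: prints commands when DRY_RUN=1, otherwise runs them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

switch_to_resource_queues() {
    # Record the current resource manager setting before changing it.
    run gpconfig -s gp_resource_manager
    run gpconfig -c gp_resource_manager -v queue || return 1
    # The new setting takes effect only after a restart.
    run gpstop -ar || return 1
    run gprecoverseg -a
}

# Assumption: the original value was "group"; verify before using.
restore_resource_groups() {
    run gpconfig -c gp_resource_manager -v group || return 1
    run gpstop -ar
}
```

Keeping both directions in one place makes it harder to forget that the queue setting is meant to be reverted after the host reboot.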
For details on resource groups, see the Greenplum Resource Groups documentation.