Greenplum segment fails to start with cpu.cfs_quota_us: Invalid argument due to cgroup inconsistency

Article ID: 433148

Products

VMware Tanzu Greenplum

Issue/Introduction

In a Greenplum cluster configured with resource groups, a primary or mirror segment may fail to start. This can make the cluster unavailable or prevent a mirror from being promoted.

During segment startup, the following error may be observed in the segment logs:

FATAL: can't write data to file '/sys/fs/cgroup/cpu/gpdb/.../cpu.cfs_quota_us': Invalid argument
(resgroup-ops-linux.c)

As a result:
  • The segment process exits during initialization
  • Mirror promotion fails
  • Queries may fail with errors such as:
    failed to acquire resources on one or more segments
  • The cluster may become partially or fully unavailable

This issue typically occurs during Greenplum resource group initialization, which depends on the Linux cgroup CPU controller.
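
To confirm that a host is hitting this specific failure, the segment logs can be searched for the error text quoted above. The following is a minimal sketch, not an official tool: the function name find_quota_errors is hypothetical, and the log directory passed to it is an assumption that depends on where the segment data directories live on your hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the log files under a directory that contain
# the cpu.cfs_quota_us write failure described in this article.
find_quota_errors() {
    local log_dir="$1"
    # -r: recurse into subdirectories
    # -l: print only the names of files that match
    # -E: extended regular expression
    grep -rlE "can't write data to file '.*cpu\.cfs_quota_us': Invalid argument" \
        "$log_dir" 2>/dev/null
}
```

For example, find_quota_errors "<segment data directory>/log" prints one matching file name per line, and prints nothing if the error is not present.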

Cause

This issue occurs when the Linux cgroup CPU controller enters an inconsistent or invalid state on the affected host.

Greenplum resource groups rely on writing CPU quota parameters such as cpu.cfs_quota_us under:

/sys/fs/cgroup/cpu/gpdb/

If the Linux kernel rejects these writes with an "Invalid argument" (EINVAL) error, the cgroup hierarchy or CPU controller on that host is not functioning correctly.

Since this failure occurs during resource group initialization, the segment cannot start and mirror failover cannot proceed.
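
Before rebooting, it can be useful to record what the CPU controller currently reports for the gpdb hierarchy. The sketch below only reads values and does not change anything. The function name show_cpu_cgroup is hypothetical, and the default path assumes the cgroup v1 layout shown above.

```shell
#!/usr/bin/env bash
# Sketch: print the CFS period and quota values for a cgroup directory and
# its immediate child directories so they can be compared by eye.
show_cpu_cgroup() {
    local root="${1:-/sys/fs/cgroup/cpu/gpdb}"
    if [ ! -d "$root" ]; then
        echo "missing: $root"
        return 1
    fi
    local dir f
    for dir in "$root" "$root"/*/; do
        # Skip the unexpanded glob when there are no child directories.
        [ -d "$dir" ] || continue
        dir="${dir%/}"
        for f in cpu.cfs_period_us cpu.cfs_quota_us; do
            if [ -r "$dir/$f" ]; then
                echo "$dir/$f = $(cat "$dir/$f")"
            else
                echo "$dir/$f = <unreadable>"
            fi
        done
    done
}
```

Running show_cpu_cgroup with no arguments inspects /sys/fs/cgroup/cpu/gpdb. A missing directory or unreadable files would be consistent with the inconsistent controller state described here, although this check alone does not prove it.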

Resolution

Workaround: 

Reboot the affected host to reset the Linux cgroup subsystem and reinitialize the CPU controller state.

After the reboot:

  1. Start the Greenplum cluster:

    gpstart -a

  2. Recover the failed segment if required:

    gprecoverseg -a

Alternative Temporary Workaround:

If an immediate reboot is not feasible, switch from resource groups to resource queues to bypass cgroup usage temporarily.

  1. Disable resource groups:

    gpconfig -c gp_resource_manager -v queue

  2. Restart the Greenplum cluster:

    gpstop -ar

  3. Recover the failed segment:

    gprecoverseg -a

Note: This is a temporary workaround only, as it disables resource group functionality.
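
Once the host has been rebooted and the cgroup state is healthy again, resource groups can be switched back on. The following is a hedged sketch of that reversal. It assumes the cluster previously ran with gp_resource_manager set to group, and the run and restore_resource_groups helpers are hypothetical names introduced here. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print a command by default, or execute it when
# DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch: show the current resource manager, switch back to resource
# groups, then restart the cluster so the change takes effect.
restore_resource_groups() {
    run gpconfig -s gp_resource_manager
    run gpconfig -c gp_resource_manager -v group
    run gpstop -ar
}

restore_resource_groups
```

Reviewing the dry-run output first makes it easier to confirm the intended value before restarting the cluster.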

For details on resource groups, see the Greenplum Resource Groups documentation.