Upgrading TAS/TPCF fails on the rep job with cpu.cfs_burst_us: operation not permitted

Article ID: 419964

Products

VMware Tanzu Platform - Cloud Foundry

Issue/Introduction

 

Upgrading the TAS/TPCF tile fails because the rep job is not running on a Diego cell after the update:

Task 925514 | 10:44:11 | L executing pre-start: diego_cell/6e88c99a-####-#### (1) (canary)
Task 925514 | 10:46:08 | L starting jobs: diego_cell/6e88c99a-####-#### (1) (canary) (00:07:54)
                       L Error: 'diego_cell/6e88c99a-####-#### (1)' is not running after update. Review logs for failed jobs: rep
Updating deployment:
  Expected task '925514' to succeed but state is 'error'
Task 925514 | 10:51:10 | Error: 'diego_cell/6e88c99a-####-#### (1)' is not running after update. Review logs for failed jobs: rep

 

The logs on diego_cell/6e88c99a-####-#### show the following.

The rep log contains these errors:

rep.stdout.log:{"timestamp":"2025-11-18T10:41:30.480214270Z","level":"error","source":"rep","message":"rep.failed-to-initialize-executor","data":{"error":"1 error occurred:\n\t* unlinkat /sys/fs/cgroup/cpu/system.slice/garden.service/garden/206f92e4-e82a-4c30-400e-05c9/cpu.cfs_burst_us: operation not permitted\n\n"}}
rep.stdout.log:{"timestamp":"2025-11-18T10:42:01.549142715Z","level":"error","source":"rep","message":"rep.executor-failed-to-destroy-container","data":{"error":"1 error occurred:\n\t* unlinkat /sys/fs/cgroup/cpu/system.slice/garden.service/garden/e018035c-####-####cpu.cfs_burst_us: operation not permitted\n\n","handle":"e018035c-####-####"}}

The garden log contains these errors:

garden/garden.stdout.log.1:{"timestamp":"2025-11-18T10:46:24.527514920Z","level":"error","source":"guardian","message":"guardian.start.clean-up-container.external-networker-result","data":{"action":"down","error":"exit status 1","handle":"e018035c-####-####","session":"4.2","stderr":"cfnetworking: cni down: del network failed: plugin type=\"cni-wrapper-plugin\" failed (delete): Get \"http://127.0.0.1:8722/force-orphaned-asgs-cleanup?container=7e018035c-####-####\": dial tcp 127.0.0.1:8722: connect: connection refused\n","stdin":"null","stdout":""}}

...
garden/garden.stdout.log.1:{"timestamp":"2025-11-18T10:46:24.543692183Z","level":"error","source":"guardian","message":"guardian.start.clean-up-container.failed attempt 1","data":{"error":"2 errors occurred:\n\t* unlinkat /sys/fs/cgroup/cpu/system.slice/garden.service/garden/e018035c-####-####/cpu.cfs_burst_us: operation not permitted\n\t* external networker encountered an error running 'down' action: exit status 1\n\n","handle":"e018035c-####-####","session":"4.2"}}
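
To see which container handles are affected, the error lines above can be scanned programmatically. The following is a minimal Python sketch, not a supported tool. It assumes the log lines are JSON objects like the ones shown (optionally prefixed with a file name, as in grep output) and that the handle is the path segment after ".../garden/". The function name find_failed_handles is hypothetical.

```python
import json
import re

# Matches the cgroup cleanup failure seen in the rep and garden logs.
# Assumption: the container handle is the path segment after ".../garden/".
UNLINK_ERROR = re.compile(
    r"unlinkat /sys/fs/cgroup/\S*?/garden/(?P<handle>[^/\s]+)/cpu\.cfs_burst_us: "
    r"operation not permitted"
)


def find_failed_handles(lines):
    """Return the sorted container handles whose cgroup cleanup failed."""
    handles = set()
    for line in lines:
        start = line.find("{")  # skip a grep-style "file:" prefix, if present
        if start == -1:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log line
        error = str(entry.get("data", {}).get("error", ""))
        for match in UNLINK_ERROR.finditer(error):
            handles.add(match.group("handle"))
    return sorted(handles)
```

For example, passing the lines of rep.stdout.log (read with open(...).readlines()) returns the handles that appear in "operation not permitted" errors. Lines whose path is redacted or malformed are skipped.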

Environment

TAS/TPCF version 6.x
No NSX Container Plugin installed

Cause

If there is no stemcell upgrade:

Because the stemcell version did not change, the kernel is also unchanged and introduces no new cgroup features. A kernel/Garden incompatibility can therefore be ruled out as the cause.

The other possible cause is stale cgroup directories left on the cell. These can be left behind in several ways:

  • a hard VM reboot
  • an OOM event that leaves container cleanup incomplete
  • an earlier stemcell upgrade performed without draining all containers

The leftover containers leave residual cgroup files, and the kernel marks them as read-only. During the upgrade, Garden starts and fails to clean them up.

errors occurred:\n\t* unlinkat /sys/fs/cgroup/cpu/system.slice/garden.service/garden/e018035c-####-####/cpu.cfs_burst_us: operation not permitted
In the error above, the container with handle e018035c-####-#### is likely a stale one, and Garden aborts startup when its cleanup fails.

Recreating the VMs provides a clean file system (/sys/fs/cgroup/* starts clean), so Garden can start without errors.
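
Before recreating the VMs, it can help to confirm that leftover per-container cgroup directories actually exist on the cell. The sketch below is a read-only check in Python and is only illustrative. It assumes the cgroup layout shown in the error messages in this article, and the function name list_container_cgroups is hypothetical. It does not decide which directories are stale; it only lists what is present.

```python
from pathlib import Path

# Path taken from the error messages in this article.
# Assumption: your cell uses the same cgroup layout.
GARDEN_CPU_CGROUP = "/sys/fs/cgroup/cpu/system.slice/garden.service/garden"


def list_container_cgroups(root=GARDEN_CPU_CGROUP):
    """Return the names of per-container cgroup directories under root."""
    root = Path(root)
    if not root.is_dir():
        return []  # path missing, e.g. a different cgroup layout
    # Each subdirectory name corresponds to a container handle.
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    for handle in list_container_cgroups():
        print(handle)
```

If a handle from the "operation not permitted" errors appears in this list while Garden is failing to start, that is consistent with the stale cgroup explanation above.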

If there is a stemcell upgrade:

Garden and the other networking components on the existing VMs are compatible with the older stemcell version. In this case, the VMs must be recreated so that they pick up the newer stemcell version.

Resolution

To resolve this issue:

1.) In the BOSH Director tile, go to Director Config and enable Recreate All VMs.

2.) Continue with the TPCF/TAS upgrade.