Upgrading the TAS/TPCF tile fails because the rep job on a diego_cell instance does not start:
Task 925514 | 10:44:11 | L executing pre-start: diego_cell/6e88c99a-####-#### (1) (canary)
Task 925514 | 10:46:08 | L starting jobs: diego_cell/6e88c99a-####-#### (1) (canary) (00:07:54)
L Error: 'diego_cell/6e88c99a-####-#### (1)' is not running after update. Review logs for failed jobs: rep
Updating deployment:
Expected task '925514' to succeed but state is 'error'
Task 925514 | 10:51:10 | Error: 'diego_cell/6e88c99a-####-#### (1)' is not running after update. Review logs for failed jobs: rep
Checking the logs on diego_cell/6e88c99a-####-####:
The rep logs contain these errors:
rep.stdout.log:{"timestamp":"2025-11-18T10:41:30.480214270Z","level":"error","source":"rep","message":"rep.failed-to-initialize-executor","data":{"error":"1 error occurred:\n\t* unlinkat /sys/fs/cgroup/cpu/system.slice/garden.service/garden/206f92e4-e82a-4c30-400e-05c9/cpu.cfs_burst_us: operation not permitted\n\n"}}
rep.stdout.log:{"timestamp":"2025-11-18T10:42:01.549142715Z","level":"error","source":"rep","message":"rep.executor-failed-to-destroy-container","data":{"error":"1 error occurred:\n\t* unlinkat /sys/fs/cgroup/cpu/system.slice/garden.service/garden/e018035c-####-####cpu.cfs_burst_us: operation not permitted\n\n","handle":"e018035c-####-####"}}
The garden logs contain these errors:
garden/garden.stdout.log.1:{"timestamp":"2025-11-18T10:46:24.527514920Z","level":"error","source":"guardian","message":"guardian.start.clean-up-container.external-networker-result","data":{"action":"down","error":"exit status 1","handle":"e018035c-####-####","session":"4.2","stderr":"cfnetworking: cni down: del network failed: plugin type=\"cni-wrapper-plugin\" failed (delete): Get \"http://127.0.0.1:8722/force-orphaned-asgs-cleanup?container=7e018035c-####-####\": dial tcp 127.0.0.1:8722: connect: connection refused\n","stdin":"null","stdout":""}}
...
garden/garden.stdout.log.1:{"timestamp":"2025-11-18T10:46:24.543692183Z","level":"error","source":"guardian","message":"guardian.start.clean-up-container.failed attempt 1","data":{"error":"2 errors occurred:\n\t* unlinkat /sys/fs/cgroup/cpu/system.slice/garden.service/garden/e018035c-####-####/cpu.cfs_burst_us: operation not permitted\n\t* external networker encountered an error running 'down' action: exit status 1\n\n","handle":"e018035c-####-####","session":"4.2"}}
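The affected handles and cgroup paths can be pulled out of these log lines mechanically. The following is a minimal sketch, not an official tool: it assumes the lines come from grep-style output (a file name, a colon, then one JSON object per line, as shown above). The names `find_cgroup_failures` and `CgroupFailure` are hypothetical.

```python
import json
import re
from typing import Iterable, Iterator, NamedTuple

# Captures the cgroup path in "unlinkat <path>: operation not permitted".
UNLINK_RE = re.compile(r"unlinkat (\S+): operation not permitted")


class CgroupFailure(NamedTuple):
    source: str   # log source, e.g. "rep" or "guardian"
    message: str  # log message, e.g. "rep.failed-to-initialize-executor"
    handle: str   # container handle, empty when the entry carries none
    path: str     # cgroup file that could not be removed


def find_cgroup_failures(lines: Iterable[str]) -> Iterator[CgroupFailure]:
    """Yield one record per failed unlinkat found in rep/garden log lines."""
    for line in lines:
        start = line.find("{")  # skip the "file name:" prefix added by grep
        if start < 0:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log line (e.g. BOSH task output)
        if not isinstance(entry, dict):
            continue
        data = entry.get("data")
        if not isinstance(data, dict):
            continue
        for path in UNLINK_RE.findall(str(data.get("error", ""))):
            yield CgroupFailure(
                str(entry.get("source", "")),
                str(entry.get("message", "")),
                str(data.get("handle", "")),
                path,
            )
```

Passing an open log file (or the collected grep output) to `find_cgroup_failures` gives one record per cgroup file that could not be removed, which makes it easy to see whether the same handle keeps failing.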
TAS/TPCF version 6.x
No NSX Container Plugin installed
If there is no stemcell upgrade:
Since the stemcell version did not change, the kernel did not introduce any new cgroup features. Therefore, we can rule out kernel–Garden incompatibility as the cause.
The other possible cause is stale cgroup directories left on the cell. These can be left behind for several reasons.
errors occurred:\n\t* unlinkat /sys/fs/cgroup/cpu/system.slice/garden.service/garden/e018035c-####-####/cpu.cfs_burst_us: operation not permitted
In the error above, the container with handle e018035c-####-#### could be a stale one, and Garden aborts startup when the cleanup fails.
Recreating the VMs provides a clean file system (/sys/fs/cgroup/* starts clean), so Garden can start without errors.
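Before recreating a cell, it can be useful to see which container cgroup directories are still present on it. The following is a minimal, read-only sketch. It assumes the cgroup v1 layout seen in the error paths above, where each container directory is named after its handle. It also assumes the caller obtains the set of handles Garden currently knows about by some other means. The function name `stale_cgroup_dirs` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Union

# Parent directory taken from the cgroup paths in the error messages.
GARDEN_CPU_CGROUP = Path("/sys/fs/cgroup/cpu/system.slice/garden.service/garden")


def stale_cgroup_dirs(
    live_handles: Iterable[str],
    root: Union[str, Path] = GARDEN_CPU_CGROUP,
) -> List[str]:
    """Return names of subdirectories under root that match no live handle.

    Nothing is deleted; the result is only a list of candidates to review.
    """
    root = Path(root)
    if not root.is_dir():
        return []  # cgroup hierarchy absent, e.g. on a freshly created VM
    live = set(live_handles)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name not in live
    )
```

Any subdirectory that is not a known handle is reported, so the output should be treated as a hint rather than proof that a container is stale.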
If there is a stemcell upgrade:
The running Garden and other networking components are compatible with the older stemcell version. In this case, the VMs must be recreated so that they pick up the newer stemcell version.
This issue can be resolved by:
1.) In the BOSH Director tile, go to Director Config and enable Recreate All VMs
2.) Continue with the TPCF/TAS upgrade