CF app deployment hangs in Active CANCELING status

Article ID: 396187



Products

VMware Tanzu Platform

Issue/Introduction

An app deployment remains active, stuck in CANCELING status, for a long period of time:

$ cf app <app-name>

...

Active deployment with status CANCELING (since Thu 01 May 15:09:18 CDT 2025)
strategy:        rolling
max-in-flight:   1

Cloud Controller and errands return the following failure:

Cannot scale this process while a deployment is in flight

Running cf cancel-deployment or cf restart does not change the deployment status.

This may affect the Spring config-server app, which uses app deployments for service instance updates.

Reference documentation for CF app deployments - https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-platform-for-cloud-foundry/6-0/tpcf/deploy-apps-rolling-deploy.html

Note: Use cf CLI version 7.0 or later when working with app deployments. Verify the CLI version with cf -v.
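
The version check in the note above could be scripted along these lines. This is a minimal sketch: it assumes cf -v prints a banner whose first number is a dotted version such as "cf version 8.7.10+...", and the function names cf_major_version and cf_cli_supports_deployments are hypothetical helpers, not part of the cf CLI.

```shell
#!/usr/bin/env bash
# Extract the major version from a version banner read on stdin.
# Assumption: the banner looks like "cf version X.Y.Z+<build>".
cf_major_version() {
  sed -n 's/^[^0-9]*\([0-9][0-9]*\)\.[0-9].*/\1/p' | head -n 1
}

# Hypothetical helper: succeeds when the major version is 7 or higher.
cf_cli_supports_deployments() {
  local major
  major="$(cf_major_version)"
  [ -n "$major" ] && [ "$major" -ge 7 ]
}

# Runs only when the cf CLI is installed on this machine.
if command -v cf >/dev/null 2>&1; then
  if cf -v | cf_cli_supports_deployments; then
    echo "cf CLI is 7.0 or later"
  else
    echo "cf CLI is older than 7.0; upgrade before managing app deployments"
  fi
fi
```

If the banner format differs in a given CLI release, the sed pattern needs adjusting. Reading the output of cf -v directly remains the simplest check.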

Resolution

To identify why the app deployment is stuck, check the cc_deployment_updater logs. These are on the clock_global VM (full CF) or the control VM (SRT).

bosh -d <CF deployment> ssh clock_global/0

cd /var/vcap/sys/log/cc_deployment_updater
cc_deployment_updater.log

{"timestamp":"2025-04-29T08:09:04.863743727Z","message":"Failed to acquire lock 'cc-deployment-updater' for owner '6d8f66fe-1ebf-4826-977b-82b78aa3524e': 14:recvmsg:Connection timed out. debug_error_string:{UNKNOWN:Error received from peer  {created_time:\"2025-04-29T08:09:04.863243714+00:00\", grpc_status:14, grpc_message:\"recvmsg:Connection timed out\"}}","log_level":"info","source":"cc.locket-client","data":{},"thread_id":38820,"fiber_id":38840,"process_id":6,"file":"/var/vcap/data/packages/cloud_controller_ng/49b1e1127c71d4423d2cea286579036628a76b23/cloud_controller_ng/lib/locket/lock_runner.rb","lineno":39,"method":"rescue in block (2 levels) in start"}

The above log message indicates that a connection to the locket process, which resides on the Diego Database VM, timed out. In this example, the Diego Database was overloaded and had to be scaled. There may be other issues that can cause cc_deployment_updater to hang.
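
To see whether lock failures like the one above are recurring, the log can be searched for the message text. The following is a rough sketch, not an official diagnostic: the function names are hypothetical, and the log path is an assumption based on the usual BOSH log layout, so adjust it if the file is located elsewhere on the VM.

```shell
#!/usr/bin/env bash
# Count log lines reporting a failed locket lock acquisition.
# Usage: count_lock_failures <path-to-log>
count_lock_failures() {
  grep -c "Failed to acquire lock 'cc-deployment-updater'" "$1"
}

# Print the timestamp of the most recent lock failure, if any.
last_lock_failure_time() {
  grep "Failed to acquire lock 'cc-deployment-updater'" "$1" \
    | tail -n 1 \
    | sed -n 's/.*"timestamp":"\([^"]*\)".*/\1/p'
}

# Assumed log location; only used when the file is readable.
LOG=/var/vcap/sys/log/cc_deployment_updater/cc_deployment_updater.log
if [ -r "$LOG" ]; then
  echo "lock failures: $(count_lock_failures "$LOG")"
  echo "most recent:   $(last_lock_failure_time "$LOG")"
fi
```

A growing count with recent timestamps suggests the connection problem to locket is still present. Other causes would show different messages, so reading the surrounding log lines is still necessary.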

After fixing the underlying issue identified in cc_deployment_updater.log, restart cc_deployment_updater:

monit restart cc_deployment_updater

After this restart, app deployments should start processing again.

Enhancements to cc_deployment_updater are planned to expose this error and to make the process more resilient to connection errors.
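
One way to confirm that the deployment has moved on is to check the cf app output for the CANCELING status line shown in the Issue section. This is a small sketch under stated assumptions: deployment_is_canceling is a hypothetical helper and APP_NAME is a hypothetical variable holding the affected app's name.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when `cf app` output on stdin still shows a
# deployment in CANCELING status, matching the status line quoted earlier.
deployment_is_canceling() {
  grep -q "Active deployment with status CANCELING"
}

# Runs only when the cf CLI is installed and APP_NAME is set.
if command -v cf >/dev/null 2>&1 && [ -n "${APP_NAME:-}" ]; then
  if cf app "$APP_NAME" | deployment_is_canceling; then
    echo "Deployment is still CANCELING; re-check cc_deployment_updater.log"
  else
    echo "No CANCELING deployment reported for $APP_NAME"
  fi
fi
```

The updater may take a short while to process the deployment after the restart, so repeating the check after a few minutes can be worthwhile.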