You might observe timeout errors when job_queue_length metric is increasing.
{"timestamp":"2026-03-12T16:42:36.760496288Z","message":"Request failed: 524: {\"description\"=\u003e\"The job execution has timed out.\", \"error_code\"=\u003e\"CF-JobTimeout\", \"code\"=\u003e290006, \"test_mode_info\"=\u003e{\"description\"=\u003e\"The job execution has timed out.\", \"error_code\"=\u003e\"CF-JobTimeout\", \"backtrace\"=\u003e[\"/var/vcap/data/packages/cloud_controller_ng/GUID/cloud_controller_ng/app/jobs/timeout_job.rb:18:in `rescue in perform'\"
The example below shows cloud controller worker starting a BlobstoreDelete job and it eventually fails 4 hours later. The default job timeout for cloud controller worker is 4 hours.
{"timestamp":"2026-02-27T05:36:23.843405396Z","message":"2026-02-27T05:36:23+0000: [Worker(cc_global_worker.cloud_controller_worker.2.1)] Job VCAP::CloudController::Jobs::Runtime::BlobstoreDelete (id=72451960) (queue=cc-generic) RUNNING","log_level":"info","source":"cc-worker","data":{"worker_name":"cc_global_worker.cloud_controller_worker.2.1"},"thread_id":14340,"fiber_id":14360,"process_id":6,"file":"/var/vcap/data/packages/cloud_controller_ng/GUID/gem_home/ruby/3.2.0/gems/delayed_job-4.1.13/lib/delayed/worker.rb","lineno":286,"method":"say"}
{"timestamp":"2026-02-27T09:36:23.942161756Z","message":"2026-02-27T09:36:23+0000: [Worker(cc_global_worker.cloud_controller_worker.2.1)] Job VCAP::CloudController::Jobs::Runtime::BlobstoreDelete (id=72451960) (queue=cc-generic) FAILED (0 prior attempts) with CloudController::Errors::ApiError: The job execution has timed out.","log_level":"error","source":"cc-worker","data":{"worker_name":"cc_global_worker.cloud_controller_worker.2.1"},"thread_id":14340,"fiber_id":14360,"process_id":6,"file":"/var/vcap/data/packages/cloud_controller_ng/GUID/gem_home/ruby/3.2.0/gems/delayed_job-4.1.13/lib/delayed/worker.rb","lineno":286,"method":"say"}
after 4 hours the job will be retried and in this case was successful
{"timestamp":"2026-02-27T09:36:29.065131358Z","message":"2026-02-27T09:36:29+0000: [Worker(cc_global_worker.cloud_controller_worker.1.1)] Job VCAP::CloudController::Jobs::Runtime::BlobstoreDelete (id=72451960) (queue=cc-generic) RUNNING","log_level":"info","source":"cc-worker","data":{"worker_name":"cc_global_worker.cloud_controller_worker.1.1"},"thread_id":14340,"fiber_id":14360,"process_id":6,"file":"/var/vcap/data/packages/cloud_controller_ng/GUID/gem_home/ruby/3.2.0/gems/delayed_job-4.1.13/lib/delayed/worker.rb","lineno":286,"method":"say"}
{"timestamp":"2026-02-27T09:36:29.268478202Z","message":"2026-02-27T09:36:29+0000: [Worker(cc_global_worker.cloud_controller_worker.1.1)] Job VCAP::CloudController::Jobs::Runtime::BlobstoreDelete (id=72451960) (queue=cc-generic) COMPLETED after 0.2030","log_level":"info","source":"cc-worker","data":{"worker_name":"cc_global_worker.cloud_controller_worker.1.1"},"thread_id":14340,"fiber_id":14360,"process_id":6,"file":"/var/vcap/data/packages/cloud_controller_ng/GUID/gem_home/ruby/3.2.0/gems/delayed_job-4.1.13/lib/delayed/worker.rb","lineno":286,"method":"say"}
Third Party s3 blobstore
cloud controller worker uses the fog-aws client which does not enable TCP keepalives when creating a socket with the external s3 blobstore. This problem occurs when the cloud controller worker successfully opens a TCP socket with s3 and sends a HTTP request. While waiting for the HTTP response the external s3 blobstore drops the TCP socket without closing. Given the cloud controller worker does not know the TCP socket is closed it will wait for 4 hours until the job timeout is reached.
You will observe cloud controller worker has a healthy long running TCP session with "ss -ntpo" command. note there are not timers running on this socket.
ESTAB 0 0 CC-WORKER:48722 S3-BLOBSTORE:443 users:(("ruby",pid=509646,fd=26))
This is an example of a tcpsocket with keepalive timer set when cloud controller worker communicates with mysql database.
ESTAB 0 0 CC-WORKER:33966 MYSQL:3306 users:(("ruby",pid=532000,fd=7)) timer:(keepalive,22sec,0)
In a future release of EAR the cloud controller worker will set a keepalive timer on sockets used to connect to external s3 blobstore. This will allow the cloud controller worker to fail and retry faster when there is an external problem.