When using the Elastic Application Runtime (EART) tile in a Small Footprint or standard deployment, you may observe the following symptoms:
- VMs report as unresponsive agent, down, or failing when checked via the BOSH CLI.
- bbs, tps-watcher, locket, and auctioneer frequently flap between running and failing states.
- The component logs show lock errors such as:

/var/vcap/sys/log/locket/locket.stdout.log:{"timestamp":"2026-01-25T02:45:52.327639900Z","level":"error","source":"locket","message":"locket.lock.failed-locking-lock","data":{"error":"context canceled","key":"routing_api_lock","owner":"########-####-####-####-67f034e993eb","request-uuid":"########-####-####-####-444f55c7c6cf","session":"######"}}

/var/vcap/sys/log/tps/watcher.stdout.log:{"timestamp":"2026-01-25T02:47:06.263184188Z","level":"error","source":"tps-watcher","message":"tps-watcher.locket-lock.lost-lock","data":{"duration":1110226274,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","lock":{"key":"tps_watcher","owner":"tps-watcher-########-####-####-####-67f034e993eb","type":"lock","type_code":1},"request-uuid":"########-####-####-####-f0c9f127297a","session":"1","ttl_in_seconds":15}}

/var/vcap/sys/log/auctioneer/auctioneer.stdout.log:{"timestamp":"2026-01-25T02:51:23.112453971Z","level":"error","source":"auctioneer","message":"auctioneer.locket-lock.lost-lock","data":{"duration":1014657756,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","lock":{"key":"auctioneer","owner":"########-####-####-####-67f034e993eb","type":"lock","type_code":1},"request-uuid":"########-####-####-####-013e10ea10a4","session":"2","ttl_in_seconds":15}}

/var/vcap/sys/log/bbs/bbs.stdout.log:{"timestamp":"2026-01-25T02:15:22.292830197Z","level":"error","source":"bbs","message":"bbs.locket-lock.failed-to-acquire-lock","data":{"duration":2004939430,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","lock":{"key":"bbs","owner":"########-####-####-####-67f034e993eb","type":"lock","type_code":1},"request-uuid":"########-####-####-####-2a33af8c13f0","session":"5","ttl_in_seconds":15}}

- bosh recreate does not permanently resolve the issue.
- Lock_time greater than 15 seconds, or InnoDB_rec_lock_wait exceeding 15 to 20 seconds, for the locket schema in the /var/vcap/sys/log/pxc-mysql/mysql_slow_query log.
- VMs with unresponsive agent status. This is an indication of CPU or Memory starvation in the vSphere Resource Pool the VMs are deployed under.

This problem is independent of version. The symptoms are caused by environmental factors.
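To gauge how often each component hits lock errors, the log lines shown above can be tallied with a small script. The following is a minimal sketch, assuming the lines were collected in the same `path:{json}` form as the excerpts; the function name `summarize_lock_errors` is hypothetical and not part of any Tanzu or BOSH tooling.

```python
import json
from collections import Counter


def summarize_lock_errors(lines):
    """Count error-level lock messages per (source, message) pair.

    Assumes each line has the form '<log path>:{json}', as in the
    excerpts above. Lines that do not contain valid JSON are skipped.
    """
    counts = Counter()
    for line in lines:
        start = line.find("{")  # the JSON payload starts at the first brace
        if start == -1:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line
        message = entry.get("message", "")
        if entry.get("level") == "error" and "lock" in message:
            counts[(entry.get("source", "unknown"), message)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: grep -H '"level":"error"' /var/vcap/sys/log/*/*.stdout.log | python3 this_script.py
    for (source, message), n in summarize_lock_errors(sys.stdin).most_common():
        print(f"{n:6d}  {source}  {message}")
```

A steadily growing count of lost-lock or failed-to-acquire-lock messages across several components at once points toward the shared database rather than toward any single component.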
The instability is caused by MySQL row lock contention within the locket database schema.
Key processes such as the BBS use Locket to maintain leadership in an active-standby High Availability (HA) model. In Elastic Application Runtime environments, resource constraints (such as CPU and Memory limits at the vSphere Resource Pool level, or storage latency) can cause MySQL queries to take longer than 15 seconds to complete.
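The relationship between query duration and the lock TTL can be illustrated with a toy model. This is only a sketch of the timing argument, not Locket's actual implementation: it assumes a lock is lost as soon as one renewal query takes longer than the TTL, and takes the 15-second default from the `ttl_in_seconds` value in the logs. The function name and the sample durations are made up for illustration.

```python
def first_lost_renewal(query_seconds, ttl_seconds=15.0):
    """Return the index of the first renewal query that outlasts the TTL.

    query_seconds: durations (in seconds) of successive lock-renewal queries.
    Returns None if every renewal completes within the TTL.
    """
    for index, seconds in enumerate(query_seconds):
        if seconds > ttl_seconds:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical durations: healthy renewals, then one starved query.
    durations = [0.2, 0.4, 16.1, 0.3]
    print(first_lost_renewal(durations))
```

In this model a single slow query is enough to drop leadership, which is consistent with components flapping even when most queries are fast.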
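When these queries exceed the timeout threshold, components cannot acquire or renew their Locket locks in time and report errors such as lost-lock and failed-to-acquire-lock. In worst-case scenarios, VMs show unresponsive agent status.

To resolve the service instability, you must remove the resource bottlenecks impacting the MySQL VMs.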
Investigate the vSphere Resource Pools (RP) where the Tanzu Platform VMs are deployed. Ensure that there are no "Limit" configurations set for CPU or Memory that might be throttling the VMs during high-load operations like tile updates.
If the environment is under-provisioned, increase the CPU and Memory allocation for the MySQL VM. Ensure the underlying hardware can sustain the required IOPS and CPU cycles without significant latency.
This step is only required if resource starvation leads to VMs with unresponsive agent status.
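One way to check whether this step applies is to look for instances whose process state is unresponsive agent in the output of `bosh -d <deployment> vms --json`. The sketch below assumes that output has the usual BOSH CLI JSON shape (a top-level `Tables` list whose `Rows` carry `instance` and `process_state` keys); verify this against your CLI version. The helper names and the deployment name `cf-example` are hypothetical.

```python
import json


def unresponsive_instances(bosh_vms_json):
    """Return instance names whose process_state is 'unresponsive agent'.

    bosh_vms_json: the text printed by 'bosh -d <deployment> vms --json'
    (assumed shape: {"Tables": [{"Rows": [{"instance": ..., "process_state": ...}]}]}).
    """
    doc = json.loads(bosh_vms_json)
    found = []
    for table in doc.get("Tables") or []:
        for row in table.get("Rows") or []:
            if row.get("process_state") == "unresponsive agent":
                found.append(row.get("instance"))
    return found


def recreate_command(deployment, instance):
    """Build the argv for recreating one instance with the --fix flag."""
    return ["bosh", "-d", deployment, "recreate", instance, "--fix"]


if __name__ == "__main__":
    import sys

    # Example: bosh -d cf-example vms --json | python3 this_script.py cf-example
    deployment = sys.argv[1] if len(sys.argv) > 1 else "cf-example"
    for instance in unresponsive_instances(sys.stdin.read()):
        print(" ".join(recreate_command(deployment, instance)))
```

The script only prints the commands rather than running them, so they can be reviewed before any VM is recreated.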
Once resource limits are removed, if the VMs do not automatically return to a healthy status, it might be necessary to recreate them to re-establish bosh-agent connectivity to the BOSH Director. Use the BOSH CLI, per deployment, to recreate any VMs that report as unresponsive, for example with bosh -d <deployment> recreate <instance-group>/<instance-id>. If the agent does not respond, add the --fix flag.

Monitor the mysql_slow_query logs and ensure that Query_time and Lock_time for the locket schema have returned to normal levels (typically sub-second).
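The slow query log check can be scripted. The following is a rough sketch, assuming the log uses the Percona-style header lines `# Schema: ...` and `# Query_time: ...  Lock_time: ...`; if your log lacks the `# Schema:` line, the filter will match nothing and needs adapting. The function name and the 1-second default threshold (taken from "sub-second" above) are illustrative choices.

```python
import re

SCHEMA_RE = re.compile(r"^# Schema: (?P<schema>\S+)")
TIMES_RE = re.compile(r"^# Query_time: (?P<query>[\d.]+)\s+Lock_time: (?P<lock>[\d.]+)")


def slow_entries(lines, schema="locket", threshold=1.0):
    """Return (query_time, lock_time) pairs at or above the threshold.

    Only entries whose '# Schema:' header matches `schema` are considered.
    """
    current_schema = None
    hits = []
    for line in lines:
        if line.startswith("# User@Host:"):
            current_schema = None  # a new log entry begins
            continue
        match = SCHEMA_RE.match(line)
        if match:
            current_schema = match.group("schema")
            continue
        match = TIMES_RE.match(line)
        if match and current_schema == schema:
            query_time = float(match.group("query"))
            lock_time = float(match.group("lock"))
            if query_time >= threshold or lock_time >= threshold:
                hits.append((query_time, lock_time))
    return hits


if __name__ == "__main__":
    import sys

    # Example: python3 this_script.py < /var/vcap/sys/log/pxc-mysql/mysql_slow_query
    for query_time, lock_time in slow_entries(sys.stdin):
        print(f"Query_time={query_time} Lock_time={lock_time}")
```

An empty result over a representative period, including a tile update if possible, suggests that lock contention on the locket schema has subsided.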
Note: Similar issues can occur in clustered MySQL environments.