cf CLI commands (e.g., cf push, cf curl, cf apps) fail intermittently with response code 500 and an internal server error.
Example of the error output after running a cf command:
Unexpected Response
Response Code: 500
Request ID: xxx-yyy-zz
Code: 0, Title: , Detail: Internal server error
FAILED
The cloud_controller_ng job log file (cloud_controller_ng.log) shows failures such as the following, which indicate a problem with MySQL database connectivity.
{"timestamp":"2025-10-22T07:39:31.789616283Z","message":"Request failed: 500: {\"errors\"=\u003e[{\"title\"=\u003e\"UnknownError\", \"detail\"=\u003e\"An unknown error occurred.\", \"code\"=\u003e10001, \"test_mode_info\"=\u003e{\"detail\"=\u003e\"Mysql2::Error::ConnectionError: Lost connection to server during query\", \"title\"=\u003e\"CF-DatabaseError\", \"backtrace\"=\u003e[\"/var/vcap/data/packages/cloud_controller_ng/5ea87a46f6ed7830ec8fee5b50e44e977101639c/gem_home/ruby/3.2.0/gems/mysql2-0.5.6/lib/mysql2/client.rb:151:in `_query'\",...
{"timestamp":"2025-10-22T07:39:32.898709623Z","message":"Request failed: 500: {\"error_code\"=\u003e\"UnknownError\", \"description\"=\u003e\"An unknown error occurred.\", \"code\"=\u003e10001, \"test_mode_info\"=\u003e{\"description\"=\u003e\"MySQL client is not connected\", \"error_code\"=\u003e\"CF-Error\", \"backtrace\"=\u003e[\"/var/vcap/data/packages/cloud_controller_ng/5ea87a46f6ed7830ec8fee5b50e44e977101639c/gem_home/ruby/3.2.0/gems/sequel-5.95.1/lib/sequel/adapters/mysql2.rb:303:in `escape'\",...
{"timestamp":"2025-10-22T07:39:33.489456235Z","message":"Request failed: 500: {\"description\"=\u003e\"Database error\", \"error_code\"=\u003e\"CF-DatabaseError\", \"code\"=\u003e10011, \"test_mode_info\"=\u003e{\"description\"=\u003e\"Database error\", \"error_code\"=\u003e\"CF-DatabaseError\", \"backtrace\"=\u003e[\"/var/vcap/data/packages/cloud_controller_ng/5ea87a46f6ed7830ec8fee5b50e44e977101639c/cloud_controller_ng/app/controllers/base/base_controller.rb:88:in `rescue in dispatch'\",...
The MySQL instances are all up and running, and running mysql-diag from the mysql_proxy instances also shows that the MySQL servers are fine. However, a large number of slow queries are being logged in the mysql_slow_query log file (found in the log directory of the active MySQL instance), and the CPU utilization of the active MySQL instance is very high.
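A sketch of how these checks can be run, assuming BOSH SSH access to the TAS deployment (replace <TAS_DEPLOYMENT> with the actual deployment name; instance group names can differ per foundation):
$ bosh -d <TAS_DEPLOYMENT> vms --vitals          # look for sustained high CPU on the mysql instances
$ bosh -d <TAS_DEPLOYMENT> ssh mysql_proxy/0     # then run mysql-diag from inside the proxy VM
$ bosh -d <TAS_DEPLOYMENT> ssh mysql/0           # the slow query log is on the active mysql VM
$ ls -ltr /var/vcap/sys/log/pxc-mysql/mysql_slow_query*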
VMware Tanzu Application Service
The MySQL server was getting overwhelmed with requests from various components, and its CPU utilization was very high. The VM type of the MySQL instance could be too small for the current workload.
It is also possible that a single SQL query is being run repetitively, for example due to unintended or unusual activity from a rogue process.
Review the mysql_slow_query logs. Use a tool such as pt-query-digest to identify any query that is hammering the database.
Example usage of pt-query-digest inside the mysql VM (the apt commands require root privileges, e.g., via sudo):
$ apt update
$ apt install percona-toolkit
$ cd /var/vcap/sys/log/pxc-mysql
$ pt-query-digest mysql_slow_query*
# 90.3s user time, 1.9s system time, 84.33M rss, 89.70M vsz
# Current date: Thu Oct 30 17:14:36 2025
# Hostname: xxx
# Files: mysql_slow_query, mysql_slow_query.1, mysql_slow_query.2, mysql_slow_query.3, mysql_slow_query.4, mysql_slow_query.5
# Overall: 336.21k total, 30 unique, 2.11 QPS, 30.43x concurrency ________
# Time range: 2025-10-22T20:13:45 to 2025-10-24T16:31:45
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 4853122s 10s 73s 14s 20s 3s 13s
# Lock time 4823238s 0 42s 14s 20s 4s 13s
# Rows sent 91.93k 0 7.55k 0.28 0 42.09 0
# Rows examine 6.30M 0 324.55k 19.66 0.99 1.75k 0.99
# Rows affecte 338.83k 0 1000 1.03 0.99 6.44 0.99
# Bytes sent 232.02M 11 22.68M 723.64 51.63 111.08k 51.63
# Merge passes 7 0 1 0.00 0 0.00 0
# Tmp tables 210 0 11 0.00 0 0.06 0
# Tmp disk tbl 0 0 0 0 0 0 0
# Tmp tbl size 63.75M 0 14.38M 198.83 0 31.58k 0
# Query size 61.16M 6 47.23k 190.75 183.58 279.53 183.58
# InnoDB:
# IO r bytes 536.47M 0 115.20M 1.63k 0 325.76k 0
# IO r ops 33.53k 0 7.20k 0.10 0 20.38 0
# IO r wait 166s 0 39s 492us 0 94ms 0
# pages distin 2.98M 1 7.99k 9.28 8.91 73.19 7.70
# queue wait 0 0 0 0 0 0 0
# rec lock wai 4823932s 0 42s 14s 20s 4s 13s
# Boolean:
# Filesort 0% yes, 99% no
# Filesort on 0% yes, 99% no
# Full scan 0% yes, 99% no
# Tmp table 0% yes, 99% no
# Profile
# Rank Query ID Response time Calls R/Call V/M
# ==== =========================== ================== ====== ======= =====
# 1 0xDDEFDF2EE4728370EFB49F... 4847184.8663 99.9% 335792 14.4351 0.84 UPDATE users
# 2 0xFE718F42D2CE92BF3EA004... 2766.2305 0.1% 191 14.4829 0.93 SELECT apps
# 3 0xEE68700C08A7915B8A40E6... 899.1293 0.0% 74 12.1504 0.25 SELECT oauth_client_details
# 4 0x874081B88B3A0513F07965... 456.2969 0.0% 17 26.8410 6.91 SELECT events
# 5 0xBAD335EE7339CA94F87194... 329.4343 0.0% 27 12.2013 0.36 SELECT desired_lrps actual_lrps
# 6 0x4469751F48DF90DDDA20CE... 203.1477 0.0% 14 14.5105 0.86 SELECT UNION routes spaces route_shares spaces spaces_developers spaces_managers spaces_auditors spaces_supporters organizations_managers organizations_auditors spaces spaces_developers spaces_managers spaces_auditors spaces_supporters organizations_managers organizations_auditors
# 7 0x78EF56B6A1A3675F1FCE74... 188.8896 0.0% 15 12.5926 0.28 DELETE events
# 8 0x809B122EA77055CE4049A2... 166.4541 0.0% 14 11.8896 1.22 SELECT processes
# MISC 0xMISC 927.4485 0.0% 66 14.0522 0.0 <22 ITEMS>
# Query 1: 3.01 QPS, 43.50x concurrency, ID 0xDDEFDF2EE4728370EFB49FD1957A3DB4 at byte 9346767
# Scores: V/M = 0.84
# Time range: 2025-10-23T09:34:28 to 2025-10-24T16:31:45
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 99 335792
# Exec time 99 4847185s 10s 42s 14s 20s 3s 13s
# Lock time 99 4820439s 4s 42s 14s 20s 3s 13s
# Rows sent 0 0 0 0 0 0 0 0
# Rows examine 5 327.92k 1 1 1 1 0 1
# Rows affecte 95 324.18k 0 1 0.99 0.99 0.11 0.99
# Bytes sent 7 16.75M 52 79 52.31 51.63 2.62 51.63
# Merge passes 0 0 0 0 0 0 0 0
# Tmp tables 0 0 0 0 0 0 0 0
# Tmp disk tbl 0 0 0 0 0 0 0 0
# Tmp tbl size 0 0 0 0 0 0 0 0
# Query size 98 60.20M 188 188 188 188 0 188
# InnoDB:
# IO r bytes 0 0 0 0 0 0 0 0
# IO r ops 0 0 0 0 0 0 0 0
# IO r wait 0 0 0 0 0 0 0 0
# pages distin 86 2.59M 5 14 8.08 8.91 0.48 7.70
# queue wait 0 0 0 0 0 0 0 0
# rec lock wai 99 4821133s 4s 42s 14s 20s 3s 13s
# String:
# Databases uaa
# Hosts x.x.x.x (124402/37%)... 2 more
# InnoDB trxID 1A5407CF3 (1/0%), 1A5407DDE (1/0%)... 335790 more
# Last errno 0 (331964/98%), 1213 (3828/1%)
# Users xxxxx
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms
# 100ms
# 1s
# 10s+ ################################################################
# Tables
# SHOW TABLE STATUS FROM `uaa` LIKE 'users'\G
# SHOW CREATE TABLE `uaa`.`users`\G
update users set previous_logon_success_time = last_logon_success_time, last_logon_success_time = 1761231622043 where id = '9999xxxx' and identity_zone_id='uaa'\G
# Converted for EXPLAIN
# EXPLAIN /*!50100 PARTITIONS*/
select previous_logon_success_time = last_logon_success_time, last_logon_success_time = 1761231622043 from users where id = '9999xxxx' and identity_zone_id='uaa'\G
# Query 2: 0.00 QPS, 0.03x concurrency, ID 0xFE718F42D2CE92BF3EA004B39E621992 at byte 19690211
# Scores: V/M = 0.93
# Time range: 2025-10-23T09:35:14 to 2025-10-24T15:35:18
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 0 191
# Exec time 0 2766s 10s 27s 14s 21s 4s 13s
# Lock time 0 2758s 10s 27s 14s 21s 4s 13s
# Rows sent 0 191 1 1 1 1 0 1
# Rows examine 0 191 1 1 1 1 0 1
# Rows affecte 0 0 0 0 0 0 0 0
# Bytes sent 0 404.38k 2.12k 2.12k 2.12k 2.12k 0 2.12k
# Merge passes 0 0 0 0 0 0 0 0
# Tmp tables 0 0 0 0 0 0 0 0
# Tmp disk tbl 0 0 0 0 0 0 0 0
# Tmp tbl size 0 0 0 0 0 0 0 0
# Query size 0 11.38k 61 61 61 61 0 61
# InnoDB:
# IO r bytes 0 0 0 0 0 0 0 0
# IO r ops 0 0 0 0 0 0 0 0
# IO r wait 0 0 0 0 0 0 0 0
# pages distin 0 198 1 2 1.04 0.99 0.18 0.99
# queue wait 0 0 0 0 0 0 0 0
# rec lock wai 0 2758s 10s 27s 14s 21s 4s 13s
# String:
# Databases ccdb
# Hosts x.x.x.x (72/37%), y.y.y.y (61/31%)... 1 more
# InnoDB trxID 1A540D736 (1/0%), 1A540D771 (1/0%)... 189 more
# Last errno 0
# Users zzzzz
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms
# 100ms
# 1s
# 10s+ ################################################################
# Tables
# SHOW TABLE STATUS FROM `ccdb` LIKE 'apps'\G
# SHOW CREATE TABLE `ccdb`.`apps`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT * FROM `apps` WHERE (`id` = 167779) LIMIT 1 FOR UPDATE\G
# Query 3: 0.00 QPS, 0.01x concurrency, ID 0xEE68700C08A7915B8A40E674828759FD at byte 10860169
.....
The above example report shows one query that stands out from the others: Query 1, with 335,792 calls. The report also contains the actual query:
update users set previous_logon_success_time = last_logon_success_time, last_logon_success_time = 1761231622043 where id = '9999xxxx' and identity_zone_id='uaa'\G
From the above example, it can be inferred that the query is coming from the UAA job (it updates the users table in the uaa database), and the id 9999xxxx can then be searched for in the UAA logs. In this example, the id 9999xxxx is traced to a user 'TESTUSER', and the logins are coming from the IP address shown as the value of remoteAddress.
$ grep 9999xxxx /var/vcap/sys/log/uaa/uaa_events*
[2025-10-30T18:07:31.839264Z] uaa - 12 [https-jsse-nio-8443-exec-1978] - [xxx] .... INFO --- Audit: IdentityProviderAuthenticationSuccess ('TESTUSER'): principal=9999xxxx, origin=[remoteAddress=10.x.y.z, clientId=cf], identityZoneId=[uaa], authenticationType=[ldap]
[2025-10-30T18:07:31.839434Z] uaa - 12 [https-jsse-nio-8443-exec-1978] - [xxx] .... INFO --- Audit: UserAuthenticationSuccess ('TESTUSER'): principal=9999xxxx, origin=[remoteAddress=10.x.y.z, clientId=cf], identityZoneId=[uaa]
[2025-10-30T18:07:31.910545Z] uaa - 12 [https-jsse-nio-8443-exec-1978] - [xxx] .... INFO --- Audit: TokenIssuedEvent ('["cloud_controller.write","openid","scim.read","uaa.user","cloud_controller.read","password.write","scim.write"]'): principal=9999xxxx, origin=[client=cf, user=TESTUSER], identityZoneId=[uaa]
...
The logins are happening multiple times per minute, which explains the volume of SQL queries. The first command below shows the username and how many successful authentications were logged for it. The second command shows the unique remoteAddress values and their corresponding counts.
$ grep 9999xxxx /var/vcap/sys/log/uaa/uaa_events* | grep UserAuthenticationSuccess | awk '{print $13}' | uniq -c
414068 ('TESTUSER'):
$ grep 9999xxxx /var/vcap/sys/log/uaa/uaa_events* | grep UserAuthenticationSuccess | awk -F= '{print $4}' | awk -F, '{print $1}' | sort | uniq -c
137977 10.x.y.z
130807 10.x.y.a
145284 10.x.y.b
$
Now that the user account and source have been identified, mitigation should be carried out accordingly. Once the unusual repetitive logins are stopped, the MySQL performance (CPU usage and slow queries) can be reassessed.
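For example, once mitigation is in place, the login rate for the affected account can be rechecked against the UAA events log (a sketch reusing the id from above; it assumes the current log file is uaa_events.log and uses a placeholder timestamp prefix for the hour being checked):
$ grep UserAuthenticationSuccess /var/vcap/sys/log/uaa/uaa_events.log | grep 9999xxxx | grep -c "2025-10-31T10:"
Similarly, pt-query-digest can be run again over the newer mysql_slow_query logs to confirm that the volume of the UPDATE users query has dropped.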
If the MySQL instance is deemed underpowered for the workload, the VM type can be changed to a larger one with more CPUs and more RAM. For TAS-managed MySQL, this is typically done by selecting a larger VM type for the MySQL instance group in the Resource Config section of the TAS tile in Ops Manager and then applying changes.