This article applies to all VMware Tanzu Application Service (TAS) for VMs deployments, both Regular and Small Footprint.
Versions Impacted: TAS 2.3.x, TAS 2.4.x, TAS 2.5.0 - 2.5.12, TAS 2.6.0 - 2.6.7, TAS 2.7.0 - 2.7.1
App staging fails with a 503 error:
Server error, status code: 503, error code: 170015, message: Runner is unavailable: Process Guid: <App-Instance-GUID>: Connection refused - Connection refused - connect(2) for "bbs.service.cf.internal" port 8889 (bbs.service.cf.internal:8889)
In bbs.stdout.log, we see the following error:
{"timestamp":"2019-10-01T15:56:36.524833203Z","level":"error","source":"bbs","message":"bbs.locket-lock.lost-lock","data":{"duration":15000212109,"error":"rpc error: code = 4 desc = context deadline exceeded","lock":{"key":"bbs","owner":"bb26bd04-0ae6-45bf-ad97-42751433fee4","type":"lock","type_code":1},"request-uuid":"c3cf1ac7-cb96-4c6c-465e-ba332ff71398","session":"4","ttl_in_seconds":15}}
After checking the MySQL for VMware Tanzu slow-query logs, we found a large number of aborted connections from locket and silk.
In the MySQL logs, we see the following error:
InnoDB: Difficult to find free blocks in the buffer pool (1587 search iterations)! 0 failed attempts to flush a page! Consider increasing the buffer pool size. It is also possible that in your Unix version fsync is very slow, or completely frozen inside the OS kernel. Then upgrading to a newer version of your operating system may help. Look at the number of fsyncs in diagnostic info below. Pending flushes (fsync) log: 1; buffer pool: 0. 1154586110 OS file reads, 328304199 OS file writes, 30369677 OS fsyncs. Starting InnoDB Monitor to print further diagnostics to the standard output.
Cause
The TAS tile hard-codes the InnoDB buffer pool for its internal MySQL hosts at 2 GB, which is unreasonably small, especially for a larger platform.
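To see why 2 GB is undersized, a common MySQL sizing heuristic is to give InnoDB roughly 75% of RAM on a dedicated database VM, so the working set stays in memory and the "difficult to find free blocks" churn above does not occur. A minimal sketch of that heuristic (the 75% fraction and the example VM sizes are illustrative assumptions, not values from the TAS tile):

```python
def recommended_buffer_pool_gb(vm_ram_gb: float, fraction: float = 0.75) -> float:
    """Rule-of-thumb InnoDB buffer pool size for a dedicated MySQL VM.

    Assumes the common guidance of dedicating ~75% of RAM to the buffer
    pool; this is a sizing sketch, not the tile's actual configuration.
    """
    return round(vm_ram_gb * fraction, 1)

# A hypothetical 16 GB MySQL VM would suggest a 12 GB pool -- six times
# the 2 GB the tile hard-codes, regardless of VM size.
print(recommended_buffer_pool_gb(16))
```

The gap between that heuristic and the fixed 2 GB value explains why the symptoms surface mainly on larger foundations: the hard-coded pool does not grow with the VM, so a busier platform simply outruns it.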