Large Audit Logs for internal MySQL cause Tanzu Application Service for VMs platform issues during log rotation

Article ID: 298216

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

Important: This article only applies if your foundation has internal MySQL configured and Server activity logging is enabled in Tanzu Application Service for VMs (TAS for VMs).


Symptoms

  • Autoscaler is not working.
  • API calls are taking a long time.
  • Cloud Controllers (CCs) are intermittently unavailable and have a high load.
  • Diego Database is showing errors, and the Bulletin Board System (BBS) and NSX Container Plug-in (NCP) (if NSX-T is installed) jobs are restarting.
  • MySQL is running with a high load, and its logs show many connection errors involving other platform components.
  • You might also see high I/O activity in the infrastructure layer for the MySQL cluster roughly every 15 minutes.

Error Samples

2021-06-09T08:44:14.190113Z 36953480 [Note] Aborted connection 36953480 to db: 'locket' user: 'VOKaMusfrTsYUfIrWU' host: '10.157.6.65' (Got an error reading communication packets)
2021-06-09T08:44:14.345601Z 36990922 [Note] Got an error reading communication packets
2021-06-09T08:44:14.420929Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 22653ms. The settings might not be optimal. (flushed=201, during the time.)
and 
2021/06/09 11:13:59 http: TLS handshake error from 10.157.6.108:54460: read tcp 10.157.6.40:8889->10.157.6.108:54460: use of closed network connection
2021/06/09 12:02:08 http: TLS handshake error from 10.157.6.116:40778: EOF
[mysql] 2021/06/09 12:44:04 connection.go:96: invalid connection
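
To gauge how often these messages occur, a log file can be scanned for the signatures in the samples above. The following is a minimal Python sketch, not a supported tool: the label names and the function names classify and count_signatures are illustrative assumptions, and the patterns are derived only from the sample lines in this article.

```python
import re
from collections import Counter

# Patterns derived from the error samples in this article.
# The labels on the left are illustrative names, not official identifiers.
SIGNATURES = [
    ("aborted_connection", re.compile(r"Aborted connection \d+ to db: '[^']*'")),
    ("packet_read_error", re.compile(r"Got an error reading communication packets")),
    ("slow_page_cleaner", re.compile(r"page_cleaner: \d+ms intended loop took \d+ms")),
    ("tls_handshake_error", re.compile(r"TLS handshake error")),
    ("invalid_connection", re.compile(r"invalid connection")),
]


def classify(line):
    """Return the label of the first signature that matches, or None."""
    for label, pattern in SIGNATURES:
        if pattern.search(line):
            return label
    return None


def count_signatures(lines):
    """Count how many lines fall under each signature (one label per line)."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    demo = [
        "2021/06/09 12:02:08 http: TLS handshake error from 10.157.6.116:40778: EOF",
        "[mysql] 2021/06/09 12:44:04 connection.go:96: invalid connection",
    ]
    for label, total in count_signatures(demo).items():
        print(label, total)
```

Each line is assigned to the first matching signature only, so an "Aborted connection" line that also mentions communication packets is counted once.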

Root Cause

The Server activity logging feature uses the Audit Log Plugin to track connection and query activity on the MySQL server. For more information, refer to the Audit Log Plugin documentation.

The audit logs are stored on the MySQL VM under /var/vcap/store/mysql_audit_logs/. A cron job runs every 15 minutes and rotates a log file if it has reached the defined size (100MB by default).
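
To check whether any audit log file has already reached the rotation size, the directory can be inspected on the MySQL VM. The following Python sketch is one way to do that. The function name oversized_logs is hypothetical, and reading 100MB as 100 * 1024 * 1024 bytes is an assumption, since this article does not say how the size is measured.

```python
import os

# Path taken from this article.
AUDIT_LOG_DIR = "/var/vcap/store/mysql_audit_logs/"

# Assumption: the 100MB default is interpreted as 100 * 1024 * 1024 bytes.
DEFAULT_THRESHOLD_BYTES = 100 * 1024 * 1024


def oversized_logs(directory, threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Return (file name, size in bytes) for files at or above the threshold,
    largest first."""
    results = []
    for entry in os.scandir(directory):
        if entry.is_file():
            size = entry.stat().st_size
            if size >= threshold_bytes:
                results.append((entry.name, size))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Only runs the check where the directory exists, for example on the MySQL VM.
    if os.path.isdir(AUDIT_LOG_DIR):
        for name, size in oversized_logs(AUDIT_LOG_DIR):
            print(f"{name}: {size / (1024 * 1024):.1f} MiB")
```

Reading the directory typically requires elevated privileges on the VM.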

When large files are rotated during periods of high activity, with most of the volume coming from QUERY type events, the MySQL VM can experience spikes in CPU, memory, and disk activity that overload it. This creates a domino effect that reaches many components that talk to the VM, including BBS, NCP, and Cloud Controller, and leads to problems with autoscaling, staging, and other platform functions.
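
Because the size check runs only every 15 minutes, a file that is just under the limit at one check keeps growing until the next one. The following Python sketch estimates an upper bound on the file size at rotation under a simplified model. The constant write rate, the function name worst_case_size_mb, and the example rate are all assumptions made for illustration; they are not measurements from a real foundation.

```python
def worst_case_size_mb(write_rate_mb_per_s, threshold_mb=100.0, interval_s=15 * 60):
    """Upper bound on audit log size (in MB) when it is finally rotated.

    Simplified model (assumption): the log grows at a constant rate and was
    just under the threshold at the previous 15-minute check, so it grows for
    one more full interval before the cron job rotates it.
    """
    if write_rate_mb_per_s < 0:
        raise ValueError("write rate must not be negative")
    return threshold_mb + write_rate_mb_per_s * interval_s


if __name__ == "__main__":
    # Hypothetical write rate of 1 MB/s, chosen only to illustrate the model.
    print(worst_case_size_mb(1.0))
```

Under this model, the size of the rotated file depends far more on the write rate than on the configured limit once query logging is heavy.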

Environment

Product Version: 2.10

Resolution

Workaround

At this time, the only workaround is to disable Server activity logging completely. You can do this by navigating to the TAS tile > Internal MySQL > Server activity logging.

Note: An alternative workaround would be to record only CONNECT type events and filter out QUERY events, which would prevent the log file from growing quickly. However, a separate bug currently prevents the text box in the MySQL settings that filters by event type from working.

The MySQL team is working on improvements to the Audit Logs rotation, as well as fixing the Event Types filter issue.