MySQL database on PKS VM fails to start after a storage outage

Article ID: 298728

Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

After a storage failure, if any XA prepared transactions are left behind after crash recovery, the galera-init process on the Pivotal Container Service (PKS) VM goes into an "Execution failed" state. This can be confirmed by looking at the monit summary output:
monit summary

The Monit daemon 5.2.5 uptime: 54m 

Process 'pks-api'                   Does not exist
Process 'broker'                    running
Process 'pks-nsx-t-osb-proxy'       running
Process 'galera-init'               Execution failed
Process 'cluster-health-logger'     not monitored
Process 'galera-agent'              running
Process 'gra-log-purger'            running
Process 'uaa'                       running
Process 'blackbox'                  running
Process 'vrli-fluentd'              running
Process 'telemetry-server'          initializing
Process 'bosh-dns'                  running
Process 'bosh-dns-resolvconf'       running
Process 'bosh-dns-healthcheck'      running
System 'system_localhost'           running
To narrow down the cause of this issue, check /var/vcap/sys/log/pxc-mysql/mysql.err.log on the PKS VM. In this particular scenario, the error occurred because an XA prepared transaction survived the crash. This can be confirmed from the following log entries:
2019-09-21T01:10:43.338079Z 0 [Note] InnoDB: Buffer pool(s) load completed at 190921 1:10:43
2019-09-21T01:10:43.341958Z 0 [Note] InnoDB: Starting recovery for XA transactions...
2019-09-21T01:10:43.341973Z 0 [Note] InnoDB: Transaction 1096084 in prepared state after recovery
2019-09-21T01:10:43.341978Z 0 [Note] InnoDB: Transaction contains changes to 95 rows
2019-09-21T01:10:43.341983Z 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2019-09-21T01:10:43.341986Z 0 [Note] Found 1 prepared transaction(s) in InnoDB
2019-09-21T01:10:43.342002Z 0 [ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2019-09-21T01:10:43.342008Z 0 [ERROR] Aborting
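
For reference, a quick way to review the most recent entries in this log while on the PKS VM (assuming the standard pxc-mysql log location shown above) is:

tail -n 100 /var/vcap/sys/log/pxc-mysql/mysql.err.log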


Environment

Product Version: 1.4
OS: Ubuntu

Resolution

Execute the following steps to recover from this failure and bring back the MySQL database:
1. SSH to the PKS VM using: bosh ssh
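
For example, assuming the default TKGI/PKS deployment and instance group names (yours may differ; check bosh deployments and bosh instances if unsure), the command could look like:

bosh -d pivotal-container-service-<deployment-guid> ssh pivotal-container-service/0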

2. Stop all the processes on the PKS VM. This can be done using the command: monit stop all
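
Before continuing, it is worth confirming that all jobs have actually stopped (monit on BOSH-managed VMs must be run as root). For example:

monit summary

All processes should report "not monitored" before moving on to the next step.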

3. Start the MySQL process using the following command:
/var/vcap/packages/pxc/bin/mysqld --defaults-file=/var/vcap/jobs/pxc-mysql/config/my.cnf --wsrep-new-cluster --tc-heuristic-recover=COMMIT &
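
Assuming the heuristic recovery succeeds, the error log referenced earlier should show that the prepared transaction was resolved and that mysqld is ready for connections. One way to check, for example:

grep -i "ready for connections" /var/vcap/sys/log/pxc-mysql/mysql.err.log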
4. Once the MySQL database starts successfully, connect to the database and run a test query:
/var/vcap/packages/pxc/bin/mysql --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf

mysql> use pks;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

mysql> select count(*) from cluster;
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (0.00 sec)
5. Exit out of the MySQL prompt and kill the mysqld process. Verify that the process is gone after executing the kill command.
ps -ef | grep mysql
vcap      8659  8451  1 17:52 pts/0    00:00:02 /var/vcap/packages/pxc/bin/mysqld --defaults-file=/var/vcap/jobs/pxc-mysql/config/my.cnf --wsrep-new-cluster --tc-heuristic-recover=COMMIT

kill -11 8659
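
To confirm the process is actually gone (the PID above is only an example from this scenario), re-run the same check:

ps -ef | grep mysql

Only the grep command itself should remain in the output.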
6. Start all of the processes with the command: monit start all
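
All jobs, including galera-init, should return to a running state within a few minutes. This can be verified with the same command used earlier:

monit summary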