Failing to rejoin the MySQL nodes to the cluster after bootstrapping a node using TAS 4.0.13+LTS-T

Article ID: 374910

Products

VMware Tanzu Application Service

Issue/Introduction

When applying changes to TAS 4.0.13+LTS-T, users might encounter the following error while a MySQL cluster node that has entered a failing state is being recreated:

 

2024-07-29T18:55:22.623876Z 0 [ERROR] [MY-000000] [Galera] /var/vcap/data/compile/percona-xtradb-cluster-8.0/Percona-XtraDB-Cluster-8.0.33-25/percona-xtradb-cluster-galera/gcs/src/gcs.cpp:gcs_open():1876: Failed to open channel 'galera-cluster' at 'gcomm://x.x.x.x, y.y.y.y, z.z.z.z': -110 (Connection timed out)


2024-07-29T18:55:22.623894Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out


2024-07-29T18:55:22.623904Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://x.x.x.x, y.y.y.y, z.z.z.z) failed to establish connection with cluster (reason: 7)


2024-07-29T18:55:22.623913Z 0 [ERROR] [MY-010119] [Server] Aborting


2024-07-29T18:55:22.627065Z 0 [System] [MY-010910] [Server] /var/vcap/data/packages/percona-xtradb-cluster-8.0/5ba0c3700d7936f2493fd87a57474d59f9736261/bin/mysqld: Shutdown complete (mysqld 8.0.33-25)  Percona XtraDB Cluster (GPL) 8.0.33-25, WSREP version 26.1.4.3.


2024-07-29T18:55:22.628373Z 0 [ERROR] [MY-010065] [Server] Failed to shutdown components infrastructure.
Task 1807143 | 18:54:38 | L executing pre-start: mysql/c74ff6bc-95f5-4470-9aac-e860e35f10e6 (0) (canary) (00:03:58)
                        L Error: Action Failed get_task: Task 8062eb5e-30b2-4456-457a-282f3f0b243f result: 1 of 7 pre-start scripts failed. Failed Jobs: pxc-mysql. Successful Jobs: loggregator_agent, sysctl, bpm, syslog_forwarder, bosh-dns, dynatrace-oneagent.
Task 1807143 | 18:55:26 | Error: Action Failed get_task: Task 8062eb5e-30b2-4456-457a-282f3f0b243f result: 1 of 7 pre-start scripts failed. Failed Jobs: pxc-mysql. Successful Jobs: loggregator_agent, sysctl, bpm, syslog_forwarder, bosh-dns, dynatrace-oneagent.
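When triaging this failure, the BOSH task output above is the quickest way to confirm that the pxc-mysql pre-start script is the one failing. The following is a minimal Python sketch that pulls the failed and successful job lists out of that error line. It assumes the line has exactly the `N of M pre-start scripts failed. Failed Jobs: ... Successful Jobs: ...` shape shown here and that job names contain no periods. The helper name `parse_prestart_failure` is hypothetical and not part of any BOSH tooling.

```python
import re
from typing import Optional

# Matches the BOSH pre-start summary from the task output, e.g.
# "1 of 7 pre-start scripts failed. Failed Jobs: pxc-mysql. Successful Jobs: a, b."
# Assumption: both job lists are comma-separated and each list ends with a period.
_PRESTART_RE = re.compile(
    r"(?P<failed_count>\d+) of (?P<total>\d+) pre-start scripts failed\. "
    r"Failed Jobs: (?P<failed>[^.]*)\. "
    r"Successful Jobs: (?P<ok>[^.]*)\."
)


def _split_jobs(text: str) -> list:
    """Split a comma-separated job list into trimmed, non-empty names."""
    return [job.strip() for job in text.split(",") if job.strip()]


def parse_prestart_failure(line: str) -> Optional[dict]:
    """Return the failed/successful pre-start jobs from a BOSH error line, or None."""
    match = _PRESTART_RE.search(line)
    if match is None:
        return None
    return {
        "failed_count": int(match["failed_count"]),
        "total": int(match["total"]),
        "failed": _split_jobs(match["failed"]),
        "successful": _split_jobs(match["ok"]),
    }
```

Applied to the second task line above, this would report `pxc-mysql` as the only failed job out of 7, which points the investigation at the MySQL node rather than at the other BOSH jobs on the VM.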

 

Running mysql-diag on the MySQL monitor VM reports that two MySQL nodes are synced and healthy, and that one MySQL node is in a failing state.

 

When the user tried to rejoin the nodes to the MySQL cluster after bootstrapping, the rejoin failed because of a bug in the Percona XtraDB Cluster version in use that caused datafile corruption.

Environment

Tanzu Application Service (TAS) 4.0.13+LTS-T

Cause

The root cause was a bug in Percona XtraDB Cluster (PXC), tracked as PXC-4343, first identified by VMware on 12/6/23 while investigating a support case.

 

The bug can, very infrequently, cause datafile corruption during state transfers among cluster nodes. The corruption stays undetected until a process attempts to back up the corrupted datafile, at which point the backup fails with the following error:

 

2024-07-30T17:06:07.756708Z 0 [ERROR] [MY-000000] [WSREP-SST] * FATAL ERROR **

2024-07-30T17:06:07.756757Z 0 [ERROR] [MY-000000] [WSREP-SST] xtrabackup finished with error: 1. Check /var/vcap/store/pxc-mysql//innobackup.backup.log

2024-07-30T17:06:07.756773Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 2061

2024-07-30T17:06:07.758993Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (START) ------------

...

2024-07-30T17:00:28.329182-00:00 0 [ERROR] [MY-012224] [InnoDB] Header page contains inconsistent data in datafile: ./ccdb/service_usage_events.ibd, Space ID:681, Flags: 16417.
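Because the corruption only surfaces when xtrabackup reads the damaged file, the innobackup log is where the affected table shows up. The following is a minimal Python sketch that extracts the datafile path and Space ID from MY-012224 lines like the one above. It assumes the message text matches this excerpt exactly. The helper name `find_corrupted_datafiles` is hypothetical, and the log location (`/var/vcap/store/pxc-mysql//innobackup.backup.log`, as named in the SST error) may differ in other deployments.

```python
import re
from typing import Iterable, List, Tuple

# Matches the InnoDB MY-012224 message from the innobackup log, e.g.
# "Header page contains inconsistent data in datafile: ./ccdb/x.ibd, Space ID:681, Flags: 16417."
_CORRUPT_RE = re.compile(
    r"Header page contains inconsistent data in datafile: "
    r"(?P<path>\S+\.ibd), Space ID:\s*(?P<space_id>\d+)"
)


def find_corrupted_datafiles(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (datafile path, space ID) pairs for each inconsistent-header error, in log order."""
    found = []
    for line in lines:
        match = _CORRUPT_RE.search(line)
        if match:
            found.append((match["path"], int(match["space_id"])))
    return found
```

Since a file object iterates line by line, the function can be pointed directly at the log, for example `with open(path) as log: print(find_corrupted_datafiles(log))`. For the excerpt above it would identify `./ccdb/service_usage_events.ibd` (Space ID 681) as the corrupted datafile.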

Resolution

The bug was fixed in Percona XtraDB Cluster (PXC) release 8.0.35-27. That release is included in VMware's pxc-release v1.0.23, which in turn ships with TAS 4.0.16. Upgrade to TAS 4.0.16 or later to get the fix.
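To check whether a given foundation is affected, compare its TAS version against the first 4.0.x release that carries the fix. The following is a minimal Python sketch with some assumptions. It assumes version strings look like `4.0.13+LTS-T`, with build metadata after `+` that can be ignored. It also assumes every later 4.0.x patch release retains the fix. Because this article only covers the TAS 4.0 line, the function deliberately refuses other lines. The function names are hypothetical.

```python
from typing import Tuple

# First TAS 4.0.x release shipping pxc-release v1.0.23 (PXC 8.0.35-27), per this article.
FIXED_TAS_VERSION = (4, 0, 16)


def parse_tas_version(version: str) -> Tuple[int, int, int]:
    """Parse a TAS version such as '4.0.13+LTS-T' into (major, minor, patch).

    Build metadata after '+' (e.g. 'LTS-T') is ignored.
    """
    core = version.split("+", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def includes_pxc_4343_fix(version: str) -> bool:
    """True if a TAS 4.0.x version is at or above the release containing the PXC-4343 fix."""
    major, minor, patch = parse_tas_version(version)
    if (major, minor) != FIXED_TAS_VERSION[:2]:
        # Assumption: only the 4.0 line is covered here; check release notes for other lines.
        raise ValueError(f"only TAS 4.0.x versions are covered, got {version!r}")
    return patch >= FIXED_TAS_VERSION[2]
```

With this check, the environment in this article (`4.0.13+LTS-T`) is reported as not yet containing the fix, while `4.0.16+LTS-T` is reported as fixed.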