How to manually force a MySQL node to rejoin the HA cluster

Article ID: 297540

Updated On: 05-03-2024

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

When a node cannot automatically rejoin the HA cluster (Percona XtraDB Cluster, PXC), an operator must manually force the MySQL node to rejoin.

This article covers rejoining a single node to an existing cluster. To bootstrap a cluster, see the TAS or MySQL tile documentation.

Cause

If your HA cluster is experiencing downtime or is in a degraded state, VMware recommends first running the "mysql-diag" tool to gather information about the current state of the cluster. The tool reports one of three states: a healthy cluster with (typically) three running nodes, a cluster in quorum with two running nodes and a third node that needs to rejoin, or a cluster that has lost quorum and requires a bootstrap. The "mysql-diag" tool is available on the mysql_monitor instance for a Tanzu Application Service for VMs (TAS) internal cluster, or on the mysql_jumpbox instance for a MySQL tile HA service instance.

Resolution

This procedure removes all data from a server node and forces it to join the cluster, where it receives a current copy of the data from one of the nodes already in the cluster. The steps differ slightly depending on which MySQL cluster is involved.

Notes:
- Do not do this if there is any data on the local node that you need to preserve.
- The other two nodes must be online and healthy. You can validate this by looking at the "mysql-diag" output or by checking the MySQL Proxy logs (for example, "grep 'Healthcheck failed on backend' proxy.combined.log"). "mysql-diag" reports a healthy node as "Synced" and "Primary".
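
The proxy-log check in the note above can be wrapped in a small helper. This is a minimal sketch: the function name count_healthcheck_failures is hypothetical, and the log file is passed as an argument because its location depends on the deployment. A count of zero only means that no failures were logged in that file, so "mysql-diag" should remain the primary check.

```shell
# Count proxy health-check failures recorded in a log file.
# count_healthcheck_failures is a hypothetical helper, not part of any tool.
count_healthcheck_failures() {
  logfile="$1"
  # grep -c prints the number of matching lines. It exits with status 1
  # when there are no matches, so "|| true" keeps the helper from failing.
  grep -c 'Healthcheck failed on backend' "$logfile" || true
}
```

For example, "count_healthcheck_failures proxy.combined.log" prints the number of failed health checks logged in that file.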

For a TAS MySQL cluster or a MySQL tile HA cluster:

1. Log in to the instance as root.
2. Run "monit stop galera-init". Skip to step 3 if the monit job is unavailable.
3. Ensure MySQL is stopped by running "ps auxw | grep mysqld". Kill the mysqld process(es) if any are still running.
4. Run "mv /var/vcap/store/pxc-mysql /var/vcap/store/pxc-mysql-backup" (or "rm -rf /var/vcap/store/pxc-mysql" if disk space is a concern). Clean up the backup after the node has successfully joined the cluster.
5. Run "/var/vcap/jobs/pxc-mysql/bin/pre-start". An exit code of 0 indicates success.
6. Restart the database on the instance.
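
The stop, move-aside, and pre-start steps above can be sketched as a shell function. This is an illustration under stated assumptions, not a supported script: the names run, wipe_and_prepare_node, DATA_DIR, and DRY_RUN are hypothetical, and "pgrep -x mysqld" stands in for the manual "ps auxw | grep mysqld" check. DRY_RUN defaults to 1, so the commands are only printed unless it is explicitly set to 0.

```shell
# Data directory named in the procedure above.
DATA_DIR=/var/vcap/store/pxc-mysql

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

wipe_and_prepare_node() {
  # Stop the monit job; ignore failure in case the job does not exist.
  run monit stop galera-init || true

  # Refuse to continue while a mysqld process is still running.
  if [ "${DRY_RUN:-1}" != "1" ] && pgrep -x mysqld >/dev/null; then
    echo "mysqld is still running; stop it before continuing" >&2
    return 1
  fi

  # Move the data directory aside instead of deleting it.
  run mv "$DATA_DIR" "${DATA_DIR}-backup" || return 1

  # Re-initialize the node; an exit code of 0 indicates success.
  run /var/vcap/jobs/pxc-mysql/bin/pre-start
}
```

Calling wipe_and_prepare_node with no changes prints the three commands it would run. Running it with DRY_RUN=0 as root on the node executes them, which discards the node's local data directory, so the notes above still apply.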

To restart the database on the instance, do one of the following:

- Run "monit start galera-init" if the monit job is available.
- Run "bosh -d deploymentName restart mysql/instanceGUID --no-converge" if the instance has no "galera-init" monit job, replacing deploymentName and instanceGUID with the values for your deployment.
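
The choice between the two restart commands above can be expressed as a small helper that prints the command to use. This is a sketch only: restart_command is a hypothetical name, and it assumes the caller passes in the text of "monit summary" captured on the instance. Note that monit runs on the instance itself, while the bosh CLI is typically run from wherever the BOSH Director is targeted.

```shell
# Print the restart command that matches the instance.
# $1: output of "monit summary" from the instance (may be empty)
# $2: BOSH deployment name
# $3: instance GUID
restart_command() {
  if printf '%s\n' "$1" | grep -q 'galera-init'; then
    # The monit job exists, so start it through monit.
    echo "monit start galera-init"
  else
    # No galera-init job; restart the instance through BOSH.
    echo "bosh -d $2 restart mysql/$3 --no-converge"
  fi
}
```

The helper only prints a command, so the operator can review it before running it.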
