Datacenter powered-off or "bosh stop --hard" command causes Tanzu Application Service for VMs internal MySQL cluster to fail

Article ID: 298099

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

The following situations can affect Tanzu Application Service for VMs (TAS for VMs):
  • Your datacenter is powered off or restarted
  • An operator runs the command "bosh stop --hard"
When you check the VM status in the Ops Manager UI, you see the following:

[Screenshot: VM status in the Ops Manager UI]

Most of the instances are in a failing state because the MySQL cluster is not running. The TAS for VMs internal MySQL servers form a High Availability (HA) cluster, and the operations above stop the nodes in a way that causes the cluster to lose quorum, so it cannot come back up on its own.

Environment

Product Version: 2.9

Resolution

To bring the MySQL cluster back up, follow the steps below to identify the MySQL instance with the highest sequence number (seqno) and bootstrap the cluster manually from it.

IMPORTANT: You cannot bootstrap the cluster unless the mysqld process is shut down on all nodes in the cluster. For information on how to stop the galera-init process on each node, refer to Shut Down MySQL.

1. Identify which node has the highest sequence number:
bosh -d <cf-deployment-name> ssh database -c "sudo cat /var/vcap/store/pxc-mysql/grastate.dat | grep 'seqno:'"

2. If all seqno values are positive, SSH to the node with the highest seqno:
bosh -d <cf-deployment-name> ssh mysql/<database-with-highest-seq-no-id>

Then, as root (for example, after running sudo -i), run this command:
echo -n "NEEDS_BOOTSTRAP" > /var/vcap/store/pxc-mysql/state.txt

If any seqno value is not positive, follow the instructions in Determine which Node to Bootstrap to recover the seqno.

3. Run the BOSH ignore command on each node with a lower seqno (in a three-node cluster, this is two nodes):
bosh -d <cf-deployment-name> ignore mysql/<instance id>

4. Run the BOSH start command on the remaining node, the one with the highest seqno:
bosh -d <cf-deployment-name> start mysql/<instance id>
Note: This should create a single-node cluster in a good state.

5. Run the BOSH unignore command on each of the two instances you ignored previously:
bosh -d <cf-deployment-name> unignore mysql/<instance id>

6. Run the BOSH start command to start the CF deployment:
bosh -d <cf-deployment-name> start

Once the start finishes, the TAS for VMs database should be in a good state.
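Steps 2 through 6 can be chained in a small script, sketched below under stated assumptions. The function names run and recover_mysql and the DRY_RUN switch are hypothetical, not part of the product. Doing step 2 through bosh ssh -c with sudo tee, instead of an interactive session, is also an assumption: it relies on the BOSH SSH user having passwordless sudo. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper chaining steps 2-6 of the procedure above.
# Each command is printed; it is executed only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Usage: recover_mysql DEPLOYMENT LEADER OTHER...
#   LEADER   = instance with the highest seqno (from step 1)
#   OTHER... = instances with a lower seqno
recover_mysql() {
  if [ "$#" -lt 3 ]; then
    echo "usage: recover_mysql DEPLOYMENT LEADER OTHER..." >&2
    return 1
  fi
  local deployment="$1" leader="$2" i
  shift 2   # "$@" now holds only the lower-seqno instances

  # Step 2: mark the leader for bootstrap.
  # ASSUMPTION: the bosh ssh user can run sudo without a password.
  run bosh -d "$deployment" ssh "$leader" -c \
    'echo -n NEEDS_BOOTSTRAP | sudo tee /var/vcap/store/pxc-mysql/state.txt > /dev/null' || return

  # Step 3: ignore the lower-seqno nodes so deployment-wide commands skip them.
  for i in "$@"; do
    run bosh -d "$deployment" ignore "$i" || return
  done

  # Step 4: start only the leader; it should come up as a single-node cluster.
  run bosh -d "$deployment" start "$leader" || return

  # Step 5: stop ignoring the other nodes.
  for i in "$@"; do
    run bosh -d "$deployment" unignore "$i" || return
  done

  # Step 6: start the whole deployment so the other nodes rejoin.
  run bosh -d "$deployment" start
}
```

For example, recover_mysql <cf-deployment-name> mysql/<highest-seqno-id> mysql/<other-id> mysql/<other-id> prints the seven commands in order. Prefixing the call with DRY_RUN=0 runs them. Keeping the dry run as the default lets you compare the instance IDs against the step 1 output before anything changes.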

If other VMs still hit this issue while the CF deployment is starting, check whether recreating those VMs resolves it.

Summary

A grastate.dat sequence number of -1 is normal on a running node. When the cluster stops cleanly, a sequence number is recorded on each node, and you then have to bootstrap the cluster from the node with the most advanced (highest) seqno. For more information, refer to When to Bootstrap.
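The rule above can be applied to the step 1 output with a small helper. This is a sketch. The function name pick_bootstrap_node is hypothetical. The helper also assumes that the BOSH CLI prefixes each line of remote output with <instance-group>/<id>: stdout |, so check your own output first. It prints the instance with the highest seqno. It exits with status 2 if any seqno is not positive, the case that calls for Determine which Node to Bootstrap.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the step 1 output on stdin and print
# "<instance> <seqno>" for the node with the highest seqno.
# ASSUMPTION: relevant lines look like
#   mysql/<id>: stdout | seqno:   <number>
# Exit status: 1 = no seqno lines found, 2 = some seqno is not positive.
pick_bootstrap_node() {
  awk '
    /: stdout [|] *seqno:/ {
      inst = $1
      sub(/:$/, "", inst)          # "mysql/<id>:" -> "mysql/<id>"
      seq = $NF + 0                # last field is the seqno
      if (seq <= 0) { notpositive = 1 }
      if (!found || seq > best) { best = seq; node = inst; found = 1 }
    }
    END {
      if (!found) { print "no seqno lines found" > "/dev/stderr"; exit 1 }
      if (notpositive) {
        print "some seqno values are not positive; see Determine which Node to Bootstrap" > "/dev/stderr"
        exit 2
      }
      print node, best
    }'
}
```

For example: bosh -d <cf-deployment-name> ssh database -c "sudo cat /var/vcap/store/pxc-mysql/grastate.dat | grep 'seqno:'" | pick_bootstrap_node. If two nodes share the highest seqno, the helper reports the first one it sees. Either is a valid bootstrap candidate.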