Race condition in mysql due to parallel applier threads

Article ID: 374821

Updated On: 09-04-2024

Products

VMware Tanzu Application Service

Issue/Introduction

In some versions of TAS, the MySQL cluster runs parallel applier threads. These threads can hit a race condition when a server node joins the cluster. When that happens, the node fails to join and is left in a bad state.

Environment

The condition was introduced in pxc/1.0.12, which was bundled with TAS 4.0.5. The workaround that sets wsrep_applier_threads to 1 is included with TAS 4.0.21+ and 6.0.1+. The underlying bug that caused the deadlock with values > 1 was fixed in pxc/1.0.29, which shipped with TAS v4.0.25.

Resolution

Preferred workaround

Edit the appropriate config file on the Ops Manager VM, then apply changes to the TAS tile.

Find the config file by running this command with root privilege on the Ops Man VM:

sudo find /var/tempest/workspaces/default/metadata -exec grep -l "^name: cf" {} \;
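The procedure refers to a single config file, so the search presumably should return exactly one path. As a minimal sketch (the function name find_cf_metadata and its optional directory argument are hypothetical and not part of Ops Manager), the lookup can be wrapped so that zero or multiple matches are reported rather than carried forward unnoticed:

```shell
# Hypothetical helper: print the metadata file that starts a line with
# "name: cf", and fail unless exactly one file matches.
find_cf_metadata() {
  dir="${1:-/var/tempest/workspaces/default/metadata}"
  # Same pattern as the find/grep command above; "|| true" keeps an empty
  # result from aborting scripts that run with "set -e".
  matches="$(grep -lr '^name: cf' "$dir" || true)"
  count="$(printf '%s\n' "$matches" | grep -c . || true)"
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 matching metadata file, found $count" >&2
    return 1
  fi
  printf '%s\n' "$matches"
}
```

Run it with root privilege, for example: cf_metadata="$(find_cf_metadata)" followed by a check of the exit status before editing the file.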

Find these lines:

      engine_config:
        galera:
          enabled: true

Add this line directly under "enabled: true", aligned with the first "e" in enabled. Correct alignment is essential to YAML syntax; otherwise you will see a 500 server error in the Ops Man UI.

          wsrep_applier_threads: 1
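Since a misaligned line only shows up later as a 500 error, it can help to compare the indentation of the two lines before applying changes. The following is a minimal sketch rather than part of the official procedure; the function name check_wsrep_alignment is hypothetical, and it assumes the new line sits immediately below "enabled: true" and that the file is indented with spaces.

```shell
# Hypothetical helper: succeeds only if a wsrep_applier_threads line sits
# directly below "enabled: true" with identical leading spaces.
check_wsrep_alignment() {
  awk '
    /^[ ]*wsrep_applier_threads:/ {
      cur = $0;   sub(/[^ ].*$/, "", cur)   # leading spaces of this line
      ref = prev; sub(/[^ ].*$/, "", ref)   # leading spaces of the previous line
      if (prev ~ /^[ ]*enabled: true[ ]*$/ && cur == ref) ok = 1
    }
    { prev = $0 }
    END { exit (ok ? 0 : 1) }
  ' "$1"
}
```

Pass the path of the edited metadata file as the only argument; a non-zero exit status means the line is missing, misplaced, or indented differently from "enabled: true".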

Here are the automated steps:

$ ssh $opsmgr_vm # <- e.g. "smith ssh" in a Shepherd environment
$ sudo -i
# cf_metadata="$(grep -lr '^name: cf' /var/tempest/workspaces/default/metadata/)"
# cp "$cf_metadata" $HOME/original_cf_metadata.yml # <- just a backup to be safe
# cp "$cf_metadata" /tmp/cf.yml
# sed -i'' -r -e '/innodb_buffer_pool_size_percent:/i\          wsrep_applier_threads: 1' /tmp/cf.yml
# diff -u "$cf_metadata" /tmp/cf.yml
--- /var/tempest/workspaces/default/metadata/cbd28b16e356.yml 2024-04-08 11:55:30.312043459 +0000
+++ /tmp/cf.yml 2024-04-08 15:12:51.448307401 +0000
@@ -8799,6 +8799,7 @@
engine_config:
galera:
enabled: true
+ wsrep_applier_threads: 1
innodb_buffer_pool_size_percent: 50
innodb_flush_log_at_trx_commit: 2
innodb_strict_mode: true

### If the change looks correct and similar to the above output, replace the metadata

# mv /tmp/cf.yml "$cf_metadata"

Inspect the formatting of the file. Make sure the wsrep_applier_threads line starts directly under "enabled: true", aligned with the first "e" in enabled. Correct alignment is essential to YAML syntax; otherwise you will see a 500 server error in the Ops Man UI.

### Make sure the file ownership is correct:

chown tempest-web:tempest-web "$cf_metadata"

Once this change is saved and you run Apply Changes on the TAS tile, the fix is in place. It’s also possible to edit the my.cnf file (the MySQL configuration) on each mysql server node; this avoids the need to apply changes to the whole TAS deployment.

Faster but less persistent workaround

ssh $opsmgrvm
# Then, on each mysql server node:
echo -e "[mysqld]\nwsrep_applier_threads = 1" >> /var/vcap/jobs/pxc-mysql/config/my.cnf

Run monit restart galera-init afterwards to apply the change right away; otherwise it is only picked up on the next restart.

This method is less persistent than the preferred workaround; it will not be retained after a stemcell upgrade, or after any bosh recreate operations.
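Because the echo command above appends unconditionally, running it twice leaves duplicate entries in my.cnf. A minimal sketch of a guarded variant follows. The function name add_applier_setting is hypothetical, and it assumes that any existing wsrep_applier_threads line means the setting was already handled, so it does not change a line that carries a different value.

```shell
# Hypothetical helper: append the setting only if the given my.cnf does not
# already contain a wsrep_applier_threads (or wsrep-applier-threads) line.
add_applier_setting() {
  cnf="$1"
  if grep -Eq '^[[:space:]]*wsrep[_-]applier[_-]threads[[:space:]]*=' "$cnf"; then
    echo "wsrep_applier_threads already present in $cnf; file left unchanged"
  else
    printf '[mysqld]\nwsrep_applier_threads = 1\n' >> "$cnf"
  fi
}
```

On a mysql server node this would be called as add_applier_setting /var/vcap/jobs/pxc-mysql/config/my.cnf, followed by the monit restart described above.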
