TAS upgrade from 4.0.26 to 6.0.11 fails at MySQL pre-stop scripts

Products

VMware Tanzu Application Service for VMs, VMware Tanzu Application Service

Issue/Introduction

A customer was in the middle of a production upgrade from TAS 4.0.26 to TAS 6.0.11 when Apply Changes failed on the MySQL pre-stop scripts. The deployment failure details are as follows:

Task ### | 17:49:40 | Updating instance mysql: mysql/0 (1) (canary)
Task ### | 17:49:40 | L executing pre-stop: mysql/0 (1) (canary)
Task ### | 17:49:47 | L executing drain: mysql/0 (1) (canary)
Task ### | 17:50:05 | L stopping jobs: mysql/0 (1) (canary)
Task ### | 17:50:21 | L executing post-stop: mysql/0 (1) (canary)
Task ### | 17:52:47 | L installing packages: mysql/0 (1) (canary)
Task ### | 17:52:59 | L configuring jobs: mysql/0 (1) (canary)
Task ### | 17:52:59 | L executing pre-start: mysql/0 (1) (canary)
Task ### | 17:54:59 | L starting jobs: mysql/0(1) (canary)
Task ### | 17:55:30 | L executing post-start: mysql/0 (1) (canary) (00:05:51)
Task ### | 17:55:31 | Updating instance mysql: mysql/1 (2)
Task ### | 17:55:31 | L executing pre-stop: mysql/1 (2) (00:05:33)
                       L Error: Action Failed get_task: Task ### result: 1 of 1 pre-stop scripts failed. Failed Jobs: pxc-mysql.
Task ### | 18:01:04 | Error: Action Failed get_task: Task ### result: 1 of 1 pre-stop scripts failed. Failed Jobs: pxc-mysql.
Task ### Started  Thu Feb 13 17:41:21 UTC 2025
Task ### Finished Thu Feb 13 18:01:04 UTC 2025
Updating deployment:
 Expected task '###' to succeed but state is 'error'
Exit code 1

Checking the mysql.err.log on the affected node shows the following error:

2025-02-13T18:01:43.035002Z 0 [ERROR] [MY-000000] [Galera] /var/vcap/data/compile/percona-xtradb-cluster-8.0/Percona-XtraDB-Cluster-8.0.36-28/percona-xtradb-cluster-galera/gcs/src/gcs_group.cpp:group_check_proto_ver():341: Group requested gcs_proto_ver: 4, max supported by this node: 2.Upgrade the node before joining this group.Need to abort.
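On a busy node this message can be buried in a long mysql.err.log. The following is a minimal sketch of a scanner for it, assuming the message wording matches the line above exactly. `find_proto_mismatch` and `scan_lines` are hypothetical helpers, not part of TAS or Percona tooling.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Matches the Galera message quoted above. Assumption: the wording is the same
# in your PXC version; adjust the pattern if your log differs.
PROTO_MISMATCH = re.compile(
    r"Group requested gcs_proto_ver: (\d+), max supported by this node: (\d+)"
)


def find_proto_mismatch(line: str) -> Optional[Tuple[int, int]]:
    """Return (requested, max_supported) if the line reports a GCS protocol mismatch."""
    match = PROTO_MISMATCH.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def scan_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, requested, max_supported) for every mismatch found."""
    for number, line in enumerate(lines, start=1):
        found = find_proto_mismatch(line)
        if found is not None:
            yield (number, *found)


if __name__ == "__main__":
    sample = (
        "2025-02-13T18:01:43.035002Z 0 [ERROR] [MY-000000] [Galera] "
        "/var/vcap/data/compile/percona-xtradb-cluster-8.0/Percona-XtraDB-Cluster-8.0.36-28/"
        "percona-xtradb-cluster-galera/gcs/src/gcs_group.cpp:group_check_proto_ver():341: "
        "Group requested gcs_proto_ver: 4, max supported by this node: 2."
        "Upgrade the node before joining this group.Need to abort."
    )
    print(list(scan_lines([sample])))  # [(1, 4, 2)]
```

Because `scan_lines` accepts any iterable of strings, an open file handle for mysql.err.log can be passed to it directly.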

Cause

To identify the pxc release version and the Percona XtraDB Cluster version it depends on, check the Component / Version table in the release notes:

TAS 4.0.26 => PXC 1.0.30 => Percona-XtraDB-Cluster-8.0.36-28
TAS 6.0.11 => PXC 1.0.33 => Percona-XtraDB-Cluster-8.0.39-30
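Given versions like the ones in this table, a quick check can flag whether an upgrade path crosses Percona XtraDB Cluster 8.0.39, the release that raised the maximum GCS protocol version as explained below. This is a sketch: `pxc_version` and `crosses_gcs_bump` are hypothetical helpers, and it assumes that every version earlier than 8.0.39 predates the bump.

```python
import re

# Assumption: per this article, the GCS protocol bump (max 2 -> 4) first
# shipped in Percona XtraDB Cluster 8.0.39.
GCS_BUMP_VERSION = (8, 0, 39)


def pxc_version(name):
    """Extract (major, minor, patch) from a name like 'Percona-XtraDB-Cluster-8.0.36-28'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", name)
    if match is None:
        raise ValueError(f"no version found in {name!r}")
    return tuple(int(part) for part in match.groups())


def crosses_gcs_bump(from_name, to_name):
    """True if the upgrade starts below the bump version and ends at or above it."""
    return pxc_version(from_name) < GCS_BUMP_VERSION <= pxc_version(to_name)


if __name__ == "__main__":
    print(crosses_gcs_bump("Percona-XtraDB-Cluster-8.0.36-28",
                           "Percona-XtraDB-Cluster-8.0.39-30"))  # True
```

A `True` result does not mean the upgrade will fail. It only means that mixed-version protocol negotiation, as described in this article, applies during the rolling update.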

The message "Group requested gcs_proto_ver: 4, max supported by this node: 2." is caused by a recent GCS (Group Communication System) protocol bump in Percona XtraDB Cluster v8.0.39. As a result of several bug fixes, the maximum supported protocol version was raised from 2 to 4.

An existing v8.0.36 cluster runs at protocol version 2. During an upgrade to v8.0.39, the cluster initially continues to run at protocol version 2. Once all of the v8.0.36 nodes have left the cluster, however, the protocol version is bumped to 4. From that point on, any attempt by a v8.0.36 (or earlier) node to join the cluster fails with the error shown in the log above.
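The sequence above can be illustrated with a toy model. This is a simplified sketch, not Galera's implementation. It assumes that the group runs at the minimum of its members' maximum supported protocol versions, and it only knows the two versions named in this article. The class and method names and the upgrade order are hypothetical.

```python
# Max GCS protocol per PXC version, limited to the two versions named in this
# article (8.0.36 -> 2 from the error message, 8.0.39 -> 4 after the bump).
MAX_GCS_PROTO = {"8.0.36": 2, "8.0.39": 4}


class ProtocolMismatch(Exception):
    """Raised when a joining node cannot speak the group's current protocol."""


class Cluster:
    """Toy model of GCS protocol negotiation (simplified assumption)."""

    def __init__(self):
        self.members = {}       # node name -> PXC version
        self.proto_ver = None   # current group protocol, None while empty

    def _renegotiate(self):
        # Assumption: the group runs at the highest protocol that every
        # current member supports, i.e. the minimum of their maximums.
        if self.members:
            self.proto_ver = min(MAX_GCS_PROTO[v] for v in self.members.values())
        else:
            self.proto_ver = None

    def join(self, name, version):
        node_max = MAX_GCS_PROTO[version]
        # Refuse a node that cannot speak the group's current protocol.
        if self.proto_ver is not None and self.proto_ver > node_max:
            raise ProtocolMismatch(
                f"{name}: Group requested gcs_proto_ver: {self.proto_ver}, "
                f"max supported by this node: {node_max}"
            )
        self.members[name] = version
        self._renegotiate()

    def leave(self, name):
        del self.members[name]
        self._renegotiate()


if __name__ == "__main__":
    cluster = Cluster()
    for node in ("mysql/0", "mysql/1", "mysql/2"):
        cluster.join(node, "8.0.36")
    print(cluster.proto_ver)  # 2

    # Hypothetical rolling order: mysql/0 and mysql/2 are upgraded first.
    for node in ("mysql/0", "mysql/2"):
        cluster.leave(node)
        cluster.join(node, "8.0.39")
    print(cluster.proto_ver)  # 2, because mysql/1 is still on 8.0.36

    cluster.leave("mysql/1")  # the last 8.0.36 node leaves
    print(cluster.proto_ver)  # 4

    try:
        cluster.join("mysql/1", "8.0.36")
    except ProtocolMismatch as err:
        print("join refused:", err)
```

The check in `join` plays the role of the `group_check_proto_ver()` check seen in the log. The model ignores state transfer and every other part of a real join.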

Resolution

The mysql-diag output shows that two of the three MySQL instances, mysql/0 and mysql/2, were in Synced state with Primary cluster status, while mysql/1 reported an error:

+----------------+-------------+----------------+-----------------------+----------------------+
| INSTANCE       |    STATE    | CLUSTER STATUS | PERSISTENT DISK USED  | EPHEMERAL DISK USED  |
+----------------+-------------+----------------+-----------------------+----------------------+
| mysql/0        | Synced      | Primary        | 42.3G / 98.2G (43.1%) | 2.3G / 218.3G (1.1%) |
| mysql/1        | N/A - ERROR | N/A - ERROR    | 42.8G / 98.2G (43.5%) | 2.9G / 218.3G (1.3%) |
| mysql/2        | Synced      | Primary        | 42.6G / 98.2G (43.4%) | 2.3G / 218.3G (1.1%) |
+----------------+-------------+----------------+-----------------------+----------------------+
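When there are many instances, the unhealthy rows can be picked out programmatically. The following is a sketch that assumes the table layout shown above; mysql-diag's output format may differ between versions. `parse_diag_table` and `unhealthy_instances` are hypothetical helpers and are not part of mysql-diag.

```python
SAMPLE = """\
+----------------+-------------+----------------+-----------------------+----------------------+
| INSTANCE       |    STATE    | CLUSTER STATUS | PERSISTENT DISK USED  | EPHEMERAL DISK USED  |
+----------------+-------------+----------------+-----------------------+----------------------+
| mysql/0        | Synced      | Primary        | 42.3G / 98.2G (43.1%) | 2.3G / 218.3G (1.1%) |
| mysql/1        | N/A - ERROR | N/A - ERROR    | 42.8G / 98.2G (43.5%) | 2.9G / 218.3G (1.3%) |
| mysql/2        | Synced      | Primary        | 42.6G / 98.2G (43.4%) | 2.3G / 218.3G (1.1%) |
+----------------+-------------+----------------+-----------------------+----------------------+
"""


def parse_diag_table(text):
    """Return one dict per instance row of a mysql-diag style table."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if cells[0] == "INSTANCE":
            continue  # header row
        rows.append({"instance": cells[0], "state": cells[1], "cluster_status": cells[2]})
    return rows


def unhealthy_instances(text):
    """Instances that are not both Synced and Primary."""
    return [
        row["instance"]
        for row in parse_diag_table(text)
        if (row["state"], row["cluster_status"]) != ("Synced", "Primary")
    ]


if __name__ == "__main__":
    print(unhealthy_instances(SAMPLE))  # ['mysql/1']
```

An instance flagged this way is a candidate for investigation. Confirm which VMs actually did not upgrade before deleting anything.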

To resolve the issue, delete the VMs that did not upgrade, then run Apply Changes again from the Ops Manager UI. The upgrade should then complete successfully.