MySQL and Cloud_Controller experienced downtime due to the MySQL cluster going out of sync.
The mysql.err.log on each node shows that all three nodes received an external shutdown signal within seconds of each other.
2025-07-31T08:38:07.709144Z 0 [System] Received SHUTDOWN from user <via user signal>. Shutting down mysqld...
2025-07-31T08:38:08.685552Z 13 [Note] Non-primary view...
2025-07-31T08:38:08.685341Z 0 [Note] Shifting SYNCED -> CLOSED (TO: 257686700)
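To confirm that the shutdowns were near-simultaneous, the "Received SHUTDOWN" events can be pulled out of each node's error log and their timestamps compared side by side. A minimal sketch, assuming per-node log copies named node1.err.log through node3.err.log (placeholder names, not the real paths):

```shell
# Create sample per-node logs standing in for each node's mysql.err.log.
# The log line below is taken from the excerpt above; file names are placeholders.
for n in 1 2 3; do
  printf '2025-07-31T08:38:07.709144Z 0 [System] Received SHUTDOWN from user <via user signal>.\n' \
    > "node${n}.err.log"
done

# Print "<file>:<timestamp>" for every externally signalled shutdown,
# making it easy to see whether the three events cluster within seconds.
grep -H 'Received SHUTDOWN' node1.err.log node2.err.log node3.err.log \
  | cut -d' ' -f1
```

If the three timestamps fall within a few seconds of each other, that supports the simultaneous-restart hypothesis discussed below.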
This caused the cluster to lose quorum and prevented an automatic restart: none of the nodes could establish which one was the active leader, so no node would rejoin on its own. Manual intervention in the form of a bootstrap is the correct course of action.
2025-07-31T08:39:53.618576Z 0 [ERROR] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout)
2025-07-31T08:39:54.619359Z 0 [ERROR] [Galera] Failed to open channel 'galera-cluster' at 'gcomm://###.###.##.##,###.###.##.##,###.###.##.##': -110 (Connection timed out)
2025-07-31T08:39:54.619387Z 0 [ERROR] [WSREP] Provider/Node failed to establish connection with cluster (reason: 7)
2025-07-31T08:39:54.619395Z 0 [ERROR] [Server] Aborting
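Before bootstrapping, the node with the most recent committed state should be identified, since Galera records the last committed seqno in grastate.dat on a clean shutdown (consistent with the "Shifting SYNCED -> CLOSED (TO: 257686700)" line above). A minimal sketch, assuming the grastate.dat format below; the on-disk path and the uuid value are illustrative, not taken from this incident:

```shell
# Sample file standing in for one node's grastate.dat (path varies by
# packaging, e.g. somewhere under the MySQL data directory). The uuid is
# a made-up placeholder; the seqno matches the log excerpt above.
cat > grastate.dat <<'EOF'
# GALERA saved state
version: 2.1
uuid:    6b98cf41-0000-0000-0000-000000000000
seqno:   257686700
safe_to_bootstrap: 0
EOF

# Extract the last committed seqno; repeat per node and bootstrap the node
# with the highest value. A seqno of -1 would indicate an unclean shutdown.
awk '/^seqno:/ {print $2}' grastate.dat
# → 257686700
```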
The shutdown was most likely triggered by a task that restarted all the nodes simultaneously instead of sequentially. Check for deployments, tasks, or Concourse automation jobs in this time frame that may have caused it.
Bootstrap the MySQL VMs following the documentation below. Once the bootstrap is complete, the other failed VMs will recover automatically.