A known bug in older versions of high-availability (PXC) clusters can corrupt a data file that corresponds to an arbitrary table in the database. This document outlines how to identify this infrequent but serious bug.
The bug is timing-dependent: it corrupts a data file only when database startup operations happen in a specific triggering sequence. During startup, the database loads the individual data files that correspond to individual tables, then closes each file.
In the triggered startup case, the database reads a table file into memory successfully, but a stray write then corrupts that file on disk while the file is being closed. This post-read file corruption means:
The bug later surfaces when MySQL tries to read the corrupted file. These read operations include:
the node "donating" its state during Galera state transfer (which runs xtrabackup under the hood)
AFFECTED VERSIONS:
TAS v6.0.x is not vulnerable to this issue.
Tiles 2.10.x, 3.2.x and 3.3.x are not vulnerable to this issue.
All scenarios surface the error:
Header page contains inconsistent data in datafile: <filename>
The filename after "inconsistent data in datafile:" identifies the corrupted database and table.
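When the error appears in a long log, the filename can be pulled out mechanically. The following is a minimal sketch, not an official tool: the function name extract_corrupt_datafiles is hypothetical, and it assumes the path follows the marker text directly, contains no commas, and that any further detail on the line is separated from the path by a comma. The sample log line, database name, and table name are made up for illustration.

```shell
#!/usr/bin/env bash

# Print the datafile path from every "inconsistent data" error line on stdin.
# Assumption: the path runs from the marker text up to the first comma
# (or to the end of the line if there is no comma).
extract_corrupt_datafiles() {
  sed -n 's/.*Header page contains inconsistent data in datafile: *\([^,]*\).*/\1/p' | sort -u
}

# Example with a hypothetical log line:
printf '%s\n' '[ERROR] Header page contains inconsistent data in datafile: ./exampledb/exampletable.ibd' \
  | extract_corrupt_datafiles
```

Feed the function whatever log or command output contains the error. Duplicate reports of the same file are collapsed into one line.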
To proactively check whether a running HA cluster is impacted by this issue:
For a running node: run "xtrabackup". (The command below redirects the backup stream to /dev/null, so the actual output of the backup is discarded.)
bosh -d DEPLOYMENT_NAME ssh mysql/0 -c "sudo /var/vcap/packages/percona-xtrabackup-8.0/bin/xtrabackup --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf --backup --stream=xbstream > /dev/null"
On production systems, run the above command on each node individually: first "mysql/0", then "mysql/1", and then "mysql/2". Running xtrabackup simultaneously on all three nodes could impair cluster performance.
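The one-node-at-a-time check can be scripted as a simple loop. This is a sketch under stated assumptions rather than a supported procedure: the function name check_nodes_sequentially is hypothetical, the instance names assume a three-node cluster, and the script assumes that the error text is relayed back through bosh ssh and that a non-zero exit status signals a failed command. It scans the captured output for the error message instead of relying on the exit status alone.

```shell
#!/usr/bin/env bash

# Run the xtrabackup check on each node in turn, never in parallel.
# Usage: check_nodes_sequentially DEPLOYMENT_NAME mysql/0 mysql/1 mysql/2
check_nodes_sequentially() {
  local deployment="$1"; shift
  local node output status
  for node in "$@"; do
    # Same command as above; the backup stream itself is discarded remotely.
    output="$(bosh -d "${deployment}" ssh "${node}" -c "sudo /var/vcap/packages/percona-xtrabackup-8.0/bin/xtrabackup --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf --backup --stream=xbstream > /dev/null" 2>&1)"
    status=$?
    if printf '%s\n' "${output}" | grep -q "Header page contains inconsistent data"; then
      echo "${node}: header-page corruption reported"
      printf '%s\n' "${output}" | grep "Header page contains inconsistent data"
    elif [ "${status}" -ne 0 ]; then
      # Assumption: a non-zero status here means some other failure.
      echo "${node}: command failed for another reason (exit status ${status})"
    else
      echo "${node}: no header-page error reported"
    fi
  done
}
```

Because each bosh ssh call finishes before the next one starts, at most one node carries the backup load at any time.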
For a stopped node: run /var/vcap/jobs/pxc-mysql/bin/get-sequence-number.
bosh -d DEPLOYMENT_NAME ssh mysql/0 -c "sudo /var/vcap/jobs/pxc-mysql/bin/get-sequence-number"
In both cases, the command either succeeds or fails with the "Header page contains inconsistent data" error message.
The MySQL team has only ever seen one corrupted node at a time, with at most two corrupted files on that node.
If you encounter this "Header page contains inconsistent data" error, immediately open a ticket with support for assistance with recovery.