I have an issue where the audit_message table filled up on the primary node and filled the /var/lib/mysql filesystem before the data could replicate to the backup (failover) node.
I had to stop the mysqld service on the primary node so the gateways would fail over to the backup node, but now the MySQL service won't start.
What is the best way to recover in this situation and get MySQL and MySQL replication running again?
Release: 10.1
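Before starting the recovery, it can help to confirm what is actually consuming the space. A quick check (assuming the default /var/lib/mysql data directory described above):
# df -h /var/lib/mysql
# du -sh /var/lib/mysql/* | sort -h | tail
In this scenario the ssgbin-log.* binlog files and the ssg schema directory (which holds the audit tables) are typically the largest entries.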
1. Stop ssg service on all nodes:
# service ssg stop
2. Stop mysql service on primary and secondary nodes:
# service mysql stop
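Optionally confirm that mysqld has actually stopped on each node before touching any files (pgrep prints nothing when no process matches):
# pgrep -l mysqld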
3. Delete binlogs on primary node
# cd /var/lib/mysql
# rm ssgbin-log.*
(Answer yes for all files except ssgbin-log.index. Keeping that file means the next command only truncates it in place, so root does not recreate it and its ownership does not need to be fixed afterwards.)
# cat /dev/null > /var/lib/mysql/ssgbin-log.index
# service mysql start
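Once mysql is back up it should open a fresh binlog (ssgbin-log.000001, since the index file is now empty), so a quick sanity check that the old logs are gone and the space was reclaimed:
# ls -l /var/lib/mysql/ssgbin-log.*
# df -h /var/lib/mysql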
4. Stop the slave on the primary node (replication was still running there before the outage)
# mysqladmin stop-replica
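To confirm replication is stopped on the primary, check the slave status; both running flags should report No (the command prints nothing if the node has no slave configuration at all):
# mysql -e "show slave status\G" | grep -E "Slave_IO_Running|Slave_SQL_Running"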
5. Truncate audit tables on primary node
# mysql ssg
SET FOREIGN_KEY_CHECKS = 0;
truncate table audit_admin;
truncate table audit_detail;
truncate table audit_detail_params;
truncate table audit_main;
truncate table audit_message;
truncate table audit_system;
SET FOREIGN_KEY_CHECKS = 1;
analyze table ssg.audit_admin;
analyze table ssg.audit_detail;
analyze table ssg.audit_detail_params;
analyze table ssg.audit_main;
analyze table ssg.audit_message;
analyze table ssg.audit_system;
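To confirm the truncation freed the space, the audit table sizes can be checked from information_schema (table_rows is an estimate, but after the ANALYZE TABLE statements above it should read 0 for every audit table):
# mysql -e "select table_name, table_rows, round((data_length+index_length)/1024/1024) as size_mb from information_schema.tables where table_schema='ssg' and table_name like 'audit%';"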
6. Start mysql on the secondary node (the slave was already stopped there before the outage)
# service mysql start
# mysqladmin stop-replica
7. Reset replication
a) Stop the API Gateway 'ssg' service on Primary/Secondary AND all PROCESSING nodes: service ssg stop
b) Backup the database on the PRIMARY node first as a precaution: mysqldump --all-databases --set-gtid-purged=OFF | gzip > ~/all_databases_`date '+%Y%m%d_%T'`.sql.gz
Make sure that the last line of the newly created SQL file contains "-- Dump completed <date-time>"
Store the file for future use in case the database needs to be restored.
Consider also taking a VM snapshot if possible to most easily restore a virtual appliance.
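A quick way to verify the dump completed without uncompressing the whole file to disk (the filename pattern matches the mysqldump command above):
# zcat ~/all_databases_*.sql.gz | tail -n 1
The output should be the "-- Dump completed <date-time>" line.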
c) Stop slave replication on both nodes: mysqladmin stop-replica
d) Reset the master configuration on both nodes: mysql -e "reset master"
e) Reset the slave configuration on both nodes: mysql -e "reset slave; reset slave all"
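After the resets, both nodes should report an empty slave configuration and a single fresh binlog at a low position; a quick check on each node:
# mysql -e "show slave status\G"
# mysql -e "show master status"
The first command should return nothing; the second should list only the first binlog file (ssgbin-log.000001).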
f) Execute the create_slave.sh script on the SECONDARY node: /opt/SecureSpan/Appliance/bin/create_slave.sh
Follow the prompts of the script, keeping the following in mind:
Provide the FQDN of the primary node
Enter yes to the prompt "Do you want to clone a database from $Master (yes or no)?"
g) On the secondary node, after the previous step has completed, reset the master configuration one more time: mysql -e "reset master" <-- IMPORTANT: this step differs from the 9.4 procedure. Do not miss it!
h) Execute the create_slave.sh script on the PRIMARY node: /opt/SecureSpan/Appliance/bin/create_slave.sh
Follow the prompts of the script, keeping the following in mind:
Provide the FQDN of the secondary node
Enter no to the prompt "Do you want to clone a database from $Master (yes or no)?"
i) Start the Gateway 'ssg' service on Primary/Secondary AND all PROCESSING nodes: service ssg start
j) Query the status of the replication on both nodes: mysql -e "show slave status\G"
Verify both nodes return the following lines:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
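A one-liner that pulls just those three fields on each node:
# mysql -e "show slave status\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"
Seconds_Behind_Master may briefly be non-zero right after the gateways start while the replica catches up.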