Need help restoring replication in gateway cluster - /var/lib/mysql is full on primary node

Article ID: 267327

Updated On:

Products

CA API Gateway

Issue/Introduction

I have an issue where our audit_message table filled up on the primary node, filling the /var/lib/mysql filesystem before it could replicate to the backup (failover) node.

I had to stop the mysqld service on the primary node so the gateways would fail over to the backup node, but now the MySQL service won't start.

What is the best way to recover in this situation and get MySQL and MySQL replication running again?

Environment

Release : 10.1

Resolution

1. Stop ssg service on all nodes:

# service ssg stop

2. Stop mysql service on primary and secondary nodes:

# service mysql stop
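
Before deleting anything under /var/lib/mysql in the next step, it is worth confirming that mysqld is no longer running on either node. As one possible check, the following should return no output once the service is stopped:

# pgrep -l mysqld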

3. Delete the binary logs on the primary node. When rm prompts for confirmation, answer yes for every file except ssgbin-log.index; that file is truncated in place instead, so that root does not recreate it with the wrong ownership.

# cd /var/lib/mysql
# rm ssgbin-log.*
# cat /dev/null > /var/lib/mysql/ssgbin-log.index
# service mysql start
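
To confirm that deleting the binary logs actually freed space on the filesystem, disk usage can be re-checked, for example:

# df -h /var/lib/mysql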

4. Stop the slave on the primary node (replication was still running there before the outage):

# mysqladmin stop-replica
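
If it is unclear whether the replication threads have actually stopped, the status can be checked from the shell; both Slave_IO_Running and Slave_SQL_Running should report No after this step:

# mysql -e "show slave status\G" | grep -E "Slave_IO_Running|Slave_SQL_Running"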

5. Truncate audit tables on primary node

# mysql ssg

SET FOREIGN_KEY_CHECKS = 0;
truncate table audit_admin;
truncate table audit_detail;
truncate table audit_detail_params;
truncate table audit_main;
truncate table audit_message;
truncate table audit_system;
SET FOREIGN_KEY_CHECKS = 1;

analyze table ssg.audit_admin;
analyze table ssg.audit_detail;
analyze table ssg.audit_detail_params;
analyze table ssg.audit_main;
analyze table ssg.audit_message;
analyze table ssg.audit_system;
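
As a rough sanity check from the same mysql session, confirm that the audit tables are now empty and their tablespaces have shrunk (table_rows and data_length are statistics estimates, refreshed here by the ANALYZE TABLE statements above):

SELECT table_name, table_rows, ROUND(data_length/1024/1024) AS data_mb
FROM information_schema.tables
WHERE table_schema = 'ssg' AND table_name LIKE 'audit%';

With the default innodb_file_per_table setting, TRUNCATE TABLE recreates each table's .ibd file, so the space freed here is returned to the /var/lib/mysql filesystem immediately.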

6. Start mysql on the secondary node and make sure the slave is stopped (it was already stopped before the outage, so this is a precaution):

# service mysql start
# mysqladmin stop-replica

7. Reset replication

a) Stop the API Gateway 'ssg' service on Primary/Secondary AND all PROCESSING nodes: service ssg stop

b) Backup the database on the PRIMARY node first as a precaution: mysqldump --all-databases --set-gtid-purged=OFF | gzip > ~/all_databases_`date '+%Y%m%d_%T'`.sql.gz
   Make sure that the last line of the newly created SQL file contains "-- Dump completed <date-time>"
   Store the file for future use in case the database needs to be restored.
   If the Gateway is a virtual appliance, also consider taking a VM snapshot, as that is the easiest way to restore it.
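
   One quick way to verify that last line without extracting the whole archive (the glob below assumes the filename produced by the dump command above):
   zcat ~/all_databases_*.sql.gz | tail -1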

c) Stop slave replication on both nodes: mysqladmin stop-replica

d) Reset the master configuration on both nodes: mysql -e "reset master"

e) Reset the slave configuration on both nodes: mysql -e "reset slave; reset slave all"

f) Execute the create_slave.sh script on the SECONDARY node: /opt/SecureSpan/Appliance/bin/create_slave.sh

   Follow the prompts of the script, keeping the following in mind:
   Provide the FQDN of the primary node 
   Enter yes to the prompt "Do you want to clone a database from $Master (yes or no)?"
   
g) On the secondary node, once step f) has completed, reset the master configuration one more time: mysql -e "reset master" (IMPORTANT: this step differs from the 9.4 procedure. Do not miss it!)

h) Execute the create_slave.sh script on the PRIMARY node: /opt/SecureSpan/Appliance/bin/create_slave.sh

   Follow the prompts of the script, keeping the following in mind:
   Provide the FQDN of the secondary node 
   Enter no to the prompt "Do you want to clone a database from $Master (yes or no)?"

i) Start the Gateway 'ssg' service on Primary/Secondary AND all PROCESSING nodes: service ssg start

j) Query the status of the replication on both nodes: mysql -e "show slave status\G"

   Verify both nodes return the following lines:
   Slave_IO_Running: Yes
   Slave_SQL_Running: Yes
   Seconds_Behind_Master: 0
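
   To pull just those three lines out of the full status output, the query can be filtered, for example:
   mysql -e "show slave status\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"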