This article will discuss some common issues seen when replication for service metrics is enabled, and the recommendations from Broadcom.
Common issues seen:
Error 'Duplicate entry '20c25295fdfc4e81bcf84945596e4543-\x00\x00\x00\x00\x00\x00\x00\x0' for key 'nodeid'' on query. Default database: 'ssg'. Query: 'insert into service_metrics (nodeid, end_time, interval_size, back_max, front_max, back_min, front_min, attempted, authorized, completed, period_start, resolution, published_service_goid, service_state, start_time, back_sum, front_sum, goid) values ('20c25295fdfc4e81bcf84945596e4543', 1573200430000, 5000, null, null, null, null, 0, 0, 0, 1573200425000, 0, x'0000000000000000FFFFFFFFFFFFFFFF', null, 1573200425000, 0, 0, x'D201C5A7A9D6517C4C31A18990B76F22')'
Error 'Cannot add or update a child row: a foreign key constraint fails (`ssg`.`service_metrics_details`, CONSTRAINT `service_metrics_details_ibfk_1` FOREIGN KEY (`mapping_values_goid`) REFERENCES `message_context_mapping_values` (`goid`) ON DELETE NO ACTION ON UPD)' on query. Default database: 'ssg'. Query: 'insert into service_metrics_details (back_max, front_max, back_min, front_min, attempted, authorized, completed, back_sum, front_sum, mapping_values_goid, service_metrics_goid) values (27, 29, 27, 29, 1, 1, 1, 27, 29, x'BD0182D9CE24D1A3AE79E38B9ADF2B95', x'BD0182D9CE24D1A3AE79E38B9ADF62C6')'
This article applies to all supported API Gateway versions.
There are many root causes for the various different replication errors, but the common theme is replication of service metrics, creating additional overhead and making a storm for new issues.
Broadcom recommends disabling the replication of the service_metrics and service_metrics_details and audit tables in MySQL in production systems. The solution below will exclude the service metrics-related tables from MySQL replication.
1. Edit the /etc/my.cnf MySQL configuration file
2. Add the following three lines to ignore service_metrics from being replicated:
replicate-ignore-table=ssg.service_metrics
replicate-ignore-table=ssg.service_metrics_details
replicate-ignore-table=ssg.service_usage
3. Add the following six lines to ignore Audit records from being replicated:
replicate-ignore-table=ssg.audit_admin
replicate-ignore-table=ssg.audit_main
replicate-ignore-table=ssg.audit_detail
replicate-ignore-table=ssg.audit_detail_params
replicate-ignore-table=ssg.audit_message
replicate-ignore-table=ssg.audit_system
4. If needed, add the following line to ignore counters from being replicated:
replicate-ignore-table=ssg.counters
These changes should be added under the "Slave reliability items" section in the /etc/my.cnf file.
Here is an example:
# Slave reliability items:
slave-skip-errors=126,1053,1105,1129,1158,1159,1160,1161
slave-net-timeout=30
slave_exec_mode=IDEMPOTENT
#Add replication ignores here
replicate-ignore-table=ssg.service_metrics
replicate-ignore-table=ssg.service_metrics_details
replicate-ignore-table=ssg.service_usage
replicate-ignore-table=ssg.audit_admin
replicate-ignore-table=ssg.audit_main
replicate-ignore-table=ssg.audit_detail
replicate-ignore-table=ssg.audit_detail_params
replicate-ignore-table=ssg.audit_message
replicate-ignore-table=ssg.audit_system
4. While editing the /etc/my.cnf MySQL configuration file, edit the existing line below by changing it from 1 to 0 to read as follows:
innodb_flush_log_at_trx_commit=0
5. Restart the mysqld service:
service mysqld restart
The above steps should be completed on each database node in the cluster.
If replication broke after making these changes and after restarting the mysqld service, restart replication with the restart_replication.sh script.