You may see the following error message:
Stdout
2020/02/25 17:03:10 Started executing command: make-leader
2020/02/25 17:03:10 Requesting instance to be configured as leader
2020/02/25 17:03:10 Failed to promote leader: make-leader request failed: [500 Internal Server Error] Replication settings exist on this instance and Slave SQL Thread is turned off. Fix replication settings and try again.
In some situations, for example after scaling to a larger database, the follower VM can remain far behind the leader VM.
Run the following command:
cf nozzle -no-filter | grep 'origin:"p.mysql"' | grep seconds
You should see output like the following:
origin:"p.mysql" eventType:ValueMetric timestamp:1583179694657263273 deployment:"service-instance_#############" job:"mysql" index:"1f8491bb-b4e7-4654-aa21-c52c0a247d87" ip:"10.233.8.47" tags:<key:"source_id" value:"##################" > valueMetric:<name:"/p.mysql/follower/seconds_since_leader_heartbeat" value:173162 unit:"integer" >
Product Version: 2.6
The value of "seconds_since_leader_heartbeat" (173162 seconds, roughly 48 hours) shows that the follower is far behind the leader. This appears to be a lagging follower caused by a network issue.
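If you want to pull the lag value out of the nozzle output and read it in hours, a small parser is enough. The following is a minimal sketch, assuming each metric arrives on one line in the format of the sample output above; the function names `heartbeat_lag_seconds` and `describe_lag` are hypothetical helpers, not part of any MySQL for VMware Tanzu or cf CLI tooling.

```python
import re
from typing import Optional

# Matches the follower heartbeat metric in one line of `cf nozzle` output.
# The line format is an assumption based on the sample output above.
HEARTBEAT_RE = re.compile(
    r'name:"/p\.mysql/follower/seconds_since_leader_heartbeat"\s+value:(\d+(?:\.\d+)?)'
)


def heartbeat_lag_seconds(line: str) -> Optional[float]:
    """Return the heartbeat lag in seconds, or None if the line lacks the metric."""
    match = HEARTBEAT_RE.search(line)
    return float(match.group(1)) if match else None


def describe_lag(seconds: float) -> str:
    """Format a lag as whole seconds plus an approximate number of hours."""
    return f"{seconds:.0f} s (~{seconds / 3600:.1f} h)"


if __name__ == "__main__":
    # Metric portion of the sample line shown above.
    sample = (
        'valueMetric:<name:"/p.mysql/follower/seconds_since_leader_heartbeat" '
        'value:173162 unit:"integer" >'
    )
    lag = heartbeat_lag_seconds(sample)
    if lag is not None:
        print(describe_lag(lag))  # prints "173162 s (~48.1 h)"
```

To use it on live data, you could feed each line of the filtered nozzle output to `heartbeat_lag_seconds` and skip lines where it returns None.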
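One short-term workaround is to relax durability on the follower, so that it flushes the binary log and the InnoDB redo log to disk less often. This trades some crash safety on the follower for higher write throughput.
This can be done by running the following two queries on the follower: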
mysql> SET GLOBAL sync_binlog = 0;
mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 2;
This should improve throughput on the follower, and you would expect the "seconds_since_leader_heartbeat" metric to either level off or start decreasing.
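To judge whether the lag is levelling off or decreasing, you could take several readings of the lag metric over time and look at its trend. The sketch below is one way to do that, not a documented procedure: `lag_trend` is a hypothetical helper, the samples are assumed to be (wall-clock seconds, lag seconds) pairs collected by hand from repeated nozzle readings, and the 0.05 tolerance is an arbitrary assumption. It fits a least-squares line: a slope near 1 means the follower is not catching up at all, near 0 means it is keeping pace, and negative means it is catching up.

```python
from typing import Sequence, Tuple


def lag_trend(samples: Sequence[Tuple[float, float]], tolerance: float = 0.05) -> str:
    """Classify replication lag as "growing", "level", or "shrinking".

    samples: (wall_clock_seconds, lag_seconds) pairs, oldest first.
    tolerance: slope (lag seconds per wall-clock second) treated as flat;
    the default is an arbitrary assumption, not a product recommendation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_lag = sum(lag for _, lag in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    # Least-squares slope of lag against wall-clock time.
    cov = sum((t - mean_t) * (lag - mean_lag) for t, lag in samples)
    slope = cov / var_t
    if slope > tolerance:
        return "growing"
    if slope < -tolerance:
        return "shrinking"
    return "level"


if __name__ == "__main__":
    # Hypothetical readings taken one minute apart after relaxing durability.
    print(lag_trend([(0, 1000), (60, 940), (120, 880)]))  # prints "shrinking"
```

Using a fitted slope rather than comparing only the first and last readings makes the result less sensitive to a single noisy sample.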