MySQL monitor fails to update and complains the replication canary job failed to come up
Error 400007: 'mysql_monitor/0 (6363eb9b-8351-4168-88fe-0c50ca7c3872)' is not running after update. Review logs for failed jobs: replication-canary
Replication canary logs show it could not reach MySQL proxy/0
{"timestamp":"1491532826.912001133","source":"/var/vcap/packages/replication-canary/bin/replication-canary","message":"/var/vcap/packages/replication-canary/bin/replication-canary.Making request to proxy","log_level":0,"data":{"method":"GET","url":{"Scheme":"https","Opaque":"","User":null,"Host":"proxy-0-p-mysql-ert.cfhdctest.kroger.com","Path":"/v0/backends","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""}}}
{"timestamp":"1491532826.932095051","source":"/var/vcap/packages/replication-canary/bin/replication-canary","message":"/var/vcap/packages/replication-canary/bin/replication-canary.received bad status code from proxy","log_level":0,"data":{"statusCode":502}}
{"timestamp":"1491532826.932335138","source":"/var/vcap/packages/replication-canary/bin/replication-canary","message":"/var/vcap/packages/replication-canary/bin/replication-canary.Canary setup failed","log_level":3,"data":{"error":"bad response (502) - 502 Bad Gateway: Registered endpoint failed to handle the request.\n","trace":"goroutine 1 [running]:\ngithub.com/pivotal-cf-experimental/replication-canary/vendor/code.cloudfoundry.org/lager.(*logger).Fatal(0xc4200522a0, 0x7333e4, 0x13, 0x880c20, 0xc42025b9a0, 0x0, 0x0, 0x0)\n\t/var/vcap/packages/replication-canary/src/github.com/pivotal-cf-experimental/replication-canary/vendor/code.cloudfoundry.org/lager/logger.go:131 +0xc7\nmain.main()\n\t/var/vcap/packages/replication-canary/src/github.com/pivotal-cf-experimental/replication-canary/main.go:149 +0x1250\n"}}
panic: bad response (502) - 502 Bad Gateway: Registered endpoint failed to handle the request
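The canary's proxy check can be reproduced in miniature. The sketch below is not the canary's actual code: the helper names and the local stub server (which always answers 502, like the failing proxy/0) are invented for illustration, and the real canary talks HTTPS to the proxy API at `/v0/backends`.

```python
import http.server
import threading
import urllib.error
import urllib.request

def check_proxy_backends(url):
    """Canary-style check: GET the proxy backends endpoint.

    Returns (ok, status_code); any non-200 answer counts as a failure,
    which is what makes the canary abort its setup with a 502.
    """
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200, resp.status
    except urllib.error.HTTPError as e:
        # Non-2xx responses (e.g. 502 Bad Gateway) raise HTTPError.
        return False, e.code

class _BadGatewayStub(http.server.BaseHTTPRequestHandler):
    """Stand-in for an unhealthy proxy: every request gets a 502."""
    def do_GET(self):
        self.send_response(502)
        self.end_headers()
    def log_message(self, *args):
        pass  # keep the demo output quiet

server = http.server.HTTPServer(("127.0.0.1", 0), _BadGatewayStub)
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
ok, status = check_proxy_backends(f"http://127.0.0.1:{port}/v0/backends")
print(ok, status)  # the 502 makes the check fail, as in the canary log
```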
MySQL proxy 0 is stuck waiting for a lock
{"message":"switchboard.lock.acquiring-lock","log_level":1,"data":{"key":"v1/locks/mysql_lock","session":"1","value":""}}
Applications using MySQL might see connection error 111 (Connection Refused):
Error: Can't connect to MySQL server on '192.168.20.111' (111)
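Error 111 is the Linux `ECONNREFUSED` errno: the TCP connection was actively refused because nothing is listening on the port. A minimal Python sketch (illustrative only, not part of the product) shows the same error an application would see when the active proxy is not listening:

```python
import errno
import socket

def try_connect(host, port, timeout=2):
    """Attempt a TCP connection; return 0 on success or the errno on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError as e:
        return e.errno

# Find a local port that is definitely not listening: bind an ephemeral
# port, note its number, then close the socket again.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
free_port = probe.getsockname()[1]
probe.close()

# Connecting to it is refused with errno 111 (ECONNREFUSED on Linux),
# the same error MySQL clients report in the message above.
code = try_connect("127.0.0.1", free_port)
print(code)
```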
MySQL and the proxies are working as intended. Multiple proxies are deployed, but only one is active at a time; they coordinate which one is active using a Consul lock.
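The active/passive behaviour can be illustrated with a minimal sketch. Here `threading.Lock` stands in for the Consul lock (`v1/locks/mysql_lock`), and the proxy names are illustrative; the real switchboard proxies compete for the lock over Consul sessions, not threads:

```python
import threading

# Stand-in for the Consul lock: whichever proxy acquires it becomes the
# single active proxy; the rest stay passive and keep retrying, which is
# the "acquiring-lock" state seen in the switchboard log above.
consul_lock = threading.Lock()
active = []

def proxy_startup(name):
    if consul_lock.acquire(blocking=False):
        active.append(name)   # this proxy serves traffic
    # otherwise: remain passive and retry later

for name in ("proxy/0", "proxy/1", "proxy/2"):
    proxy_startup(name)

print(active)  # exactly one proxy ends up active
```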
Currently, the replication canary only attempts to reach proxy/0, regardless of how many proxy instances are deployed. If proxy/0 is down or unreachable, the MySQL monitor job fails with the symptoms above.
In this case, BOSH reports the proxy instances as running even though proxy/0 is not listening on port 8080. This is caused by a known issue with deploying three or more proxy instances: they all compete for the Consul lock, which can result in a deadlock.
A fix for this issue will be included in a future release.
The best option is to deploy a load balancer in front of the proxies using the "MySQL Service Hostname" configuration option. If this option is not used, the system should instead be configured with only a single proxy instance. See the workaround below for how to recover from this situation without a load balancer.
If configuring a load balancer is not an immediate option, a quick resolution is to scale down to one proxy instance, or to stop all proxy jobs except one with monit, effectively reducing the deployment to a single proxy. SSH into each proxy VM to be taken out of rotation and run the command below to stop all of its services. Afterwards, either scale the proxy instance count to 1 or configure a load balancer.
monit stop all