During an operation that triggers a container restart (crashing, rolling app deployments/restarts, etc) some requests may fail with an HTTP 503 certificate error:
Example log message:x_cf_routererror:"endpoint_failure (tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match <GUID>)"
This is due to the current gorouter retry logic for apps with 1 or 2 instances.
Currently if a request fails to connect to an app instance then gorouter with attempt to retry the request up to 3 times by default. However, if there are only 2 app instances, then it will only attempt 2 times. If there is only 1 app instance, then it will only attempt once.
Routing release version 0.336.0 contains a fix improve the gorouter retry logic. For details:
https://github.com/cloudfoundry/routing-release/releases/tag/v0.336.0
This issue should only happen rarely for stable apps that are not crashing often.
If you are seeing this issue a lot and it's causing problems, then increasing the app instances to 3 or more will greatly reduce the chance of hitting this issue.
Upgrade to TAS/TPCF version 6.x and later contains a fix on this issue