You have an app with more than 3 instances. cf events shows that the app is crashing, and requests to it fail with the following error:
apps.domain.com - [2025-12-08T10:59:30.339504648Z] "GET /v1/config HTTP/1.1" 503 0 24 "-" "#####/16/9.17.0/android" "10.224.#.###:####" "10.224.#.###:####" x_forwarded_for:"34.###.##.##, 10.###.#.####" x_forwarded_proto:"https" vcap_request_id:"2ca641f5-##########" response_time:0.004578 gorouter_time:0.000175 app_id:"1bc0ca4b-#########" app_index:"8" instance_id:"1241145e-#######" x_cf_routererror:"endpoint_failure (tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match 1241145e-#####)" x_b3_traceid:"########" x_b3_spanid:"#########" x_b3_parentspanid:"-" b3:"#########"
To get more information in this scenario, we checked cf events <app-name>:
❯ cf events <app-name>
Getting events for app <app-name> in org <org-name> / space prod as <user>...
time event actor description
2025-12-11T00:30:44.00+1100 app.crash <app-name> index: 0, reason: CRASHED, cell_id: ed3d92ab-#####, instance: 844b8fe4-######, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:30:44.00+1100 audit.app.process.crash web index: 0, reason: CRASHED, cell_id: ed3d92ab-#####, instance: 844b8fe4-#####, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:20:43.00+1100 audit.app.process.crash web index: 1, reason: CRASHED, cell_id: b8ef363e-#####, instance: 9bf8ca4e-#####, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:20:43.00+1100 app.crash <app-name> index: 1, reason: CRASHED, cell_id: b8ef363e-#####, instance: 9bf8ca4e-#####, exit_description: APP/PROC/WEB: Exited with status 1
...
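When the event list is long, it can help to tally the crashes per instance index. The following is a minimal Python sketch, assuming the cf events output has been saved as text in the layout shown above; the function name summarize_crashes is hypothetical and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Pulls the instance index and the exit description out of an event description.
FIELD_RE = re.compile(r"index: (\d+),.*exit_description: (.+)$")

def summarize_crashes(events_text):
    """Count app.crash events per (instance index, exit description)."""
    counts = Counter()
    for line in events_text.splitlines():
        parts = line.split(None, 2)  # time, event, rest of the line
        if len(parts) < 3 or parts[1] != "app.crash":
            continue  # skips headers, "..." and the duplicate audit.* events
        match = FIELD_RE.search(parts[2])
        if match:
            counts[(int(match.group(1)), match.group(2).strip())] += 1
    return counts

sample = """\
2025-12-11T00:30:44.00+1100 app.crash <app-name> index: 0, reason: CRASHED, cell_id: ed3d92ab-#####, instance: 844b8fe4-######, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:30:44.00+1100 audit.app.process.crash web index: 0, reason: CRASHED, cell_id: ed3d92ab-#####, instance: 844b8fe4-#####, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:20:43.00+1100 app.crash <app-name> index: 1, reason: CRASHED, cell_id: b8ef363e-#####, instance: 9bf8ca4e-#####, exit_description: APP/PROC/WEB: Exited with status 1
"""

for (index, description), count in sorted(summarize_crashes(sample).items()):
    print(index, count, description)
```

Only app.crash events are counted, because each crash is also reported as an audit.app.process.crash event and counting both would double the totals.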
First, let us look at one of the cells where an instance crashed, diego_cell/ed3d92ab-######.
Filtering the rep logs on that cell by instance ID 844b8fe4-###### shows nothing unusual; the entries indicate a normal container exit:
rep/rep.stdout.log.1:{"timestamp":"2025-12-10T13:30:44.852645339Z","level":"info","source":"rep","message":"rep.executing-container-operation.ordinary-lrp-processor.process-reserved-container.run-container.containerstore-run.node-run.proxy.run-step.process-exit","data":{"cancelled":false,"container-guid":"844b8fe4-#####","container-state":"reserved","exitStatus":137,"guid":"844b8fe4-#####","lrp-instance-key":{"instance_guid":"844b8fe4-#####","cell_id":"ed3d92ab-#####"},"lrp-key":{"process_guid":"23d5cad5-######","index":0,"domain":"cf-apps"},"process":"844b8fe4-#####-envoy","session":"28353.#.#.#.#.#.#.#"}}
...
rep/rep.stdout.log.1:{"timestamp":"2025-12-10T13:30:44.856970787Z","level":"error","source":"rep","message":"rep.executing-container-operation.ordinary-lrp-processor.process-reserved-container.run-container.containerstore-run.node-run.proxy.run-step.failed-to-get-info","data":{"container-guid":"844b8fe4-#####","container-state":"reserved","error":"task for container 844b8fe4-##### not found","guid":"844b8fe4-#####3","lrp-instance-key":{"instance_guid":"844b8fe4-#####","cell_id":"ed3d92ab-#####"},"lrp-key":{"process_guid":"23d5cad5-#####","index":0,"domain":"cf-apps"},"process":"844b8fe4-#####-envoy","session":"28353.#.#.#.#.#.#.#"}}
...
rep/rep.stdout.log.1:{"timestamp":"2025-12-10T13:30:44.857091699Z","level":"error","source":"rep","message":"rep.executing-container-operation.ordinary-lrp-processor.process-reserved-container.run-container.containerstore-run.node-run.proxy.run-step.run-step-failed-with-nonzero-status-code","data":{"container-guid":"844b8fe4-#####","container-state":"reserved","error":"Exit status 137","guid":"844b8fe4-#####","lrp-instance-key":{"instance_guid":"844b8fe4-#####","cell_id":"ed3d92ab-#####"},"lrp-key":{"process_guid":"23d5cad5-#####","index":0,"domain":"cf-apps"},"process":"844b8fe4-#####-envoy","session":"28353.#.#.#.#.#.#.#","status-code":137}}
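The rep entries above are JSON objects, so the filtering by instance ID can also be scripted. The following is a rough Python sketch, assuming each line is one JSON object optionally preceded by a grep-style file prefix, and using only the field names visible in the excerpt (container-guid, process, exitStatus, status-code). The helper names and the shortened sample line are illustrative, not part of any platform tooling.

```python
import json

def parse_rep_line(line):
    """Parse one rep log line, tolerating a 'rep/rep.stdout.log.1:' style prefix."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None  # skip truncated or otherwise malformed lines

def exits_for_instance(lines, guid_prefix):
    """Yield (timestamp, process, exit status) for one container's exit entries."""
    for line in lines:
        entry = parse_rep_line(line)
        if entry is None:
            continue
        data = entry.get("data", {})
        if not data.get("container-guid", "").startswith(guid_prefix):
            continue
        status = data.get("exitStatus", data.get("status-code"))
        if status is not None:
            yield entry.get("timestamp"), data.get("process"), status

# Shortened version of the first excerpt line (most fields omitted for brevity).
sample_lines = [
    'rep/rep.stdout.log.1:{"timestamp":"2025-12-10T13:30:44.852645339Z",'
    '"level":"info","source":"rep","data":{"container-guid":"844b8fe4-#####",'
    '"exitStatus":137,"process":"844b8fe4-#####-envoy"}}',
]

for timestamp, process, status in exits_for_instance(sample_lines, "844b8fe4"):
    print(timestamp, process, status)
```

In practice the lines would come from the rep log files on the cell rather than from an inline list.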
When TAS/TPCF detects that an app instance is exiting, it explicitly reconfigures Envoy so that its certificate is invalid and matches no names. The error "endpoint_failure (tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match <guid>)" is therefore known behaviour.
This prevents any further requests from being sent to the failed instance while the NATS message for route de-registration is sent and processed.
This message does not indicate any problems with the platform, but indicates only that the app instance is terminating.
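To check whether a given 503 matches this pattern, and which instance it points at, the key:"value" fields of the access log line can be extracted. The following is a small Python sketch, assuming the line layout shown in the error excerpt at the top of this article; router_fields and is_terminating_instance_error are hypothetical helper names, and the sample line is a shortened copy of that excerpt.

```python
import re

# Matches key:"value" pairs such as app_index:"8" or x_cf_routererror:"...".
PAIR_RE = re.compile(r'(\w+):"([^"]*)"')

def router_fields(line):
    """Return the key:"value" fields of an access log line as a dict."""
    return dict(PAIR_RE.findall(line))

def is_terminating_instance_error(line):
    """True if the line carries the invalid-certificate endpoint_failure error."""
    error = router_fields(line).get("x_cf_routererror", "")
    return (error.startswith("endpoint_failure")
            and "certificate is not valid for any names" in error)

sample = (
    'apps.domain.com - [2025-12-08T10:59:30.339504648Z] "GET /v1/config HTTP/1.1" '
    '503 0 24 app_index:"8" instance_id:"1241145e-#######" '
    'x_cf_routererror:"endpoint_failure (tls: failed to verify certificate: x509: '
    'certificate is not valid for any names, but wanted to match 1241145e-#####)"'
)

fields = router_fields(sample)
if is_terminating_instance_error(sample):
    print(fields["app_index"], fields["instance_id"])
```

The app_index and instance_id values can then be compared with the index and instance columns reported by cf events.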
This is likely caused by an issue in your application. The termination could be due to a crash, a restart, a restage, a push, a stop, or even a redeploy of TAS that drains AIs off of cells.
Please check your application logs and troubleshoot why the app crashed.
If the logs do not provide enough information, turn on debug logging, which may help show the app's behaviour prior to the crash.
If you are using a TAS version with routing-release v0.336.0 or older and are seeing exactly the same error, then this is due to a known issue discussed in the KB below:
https://knowledge.broadcom.com/external/article/386660/applications-return-http-503-certificate.html
Please note that the issue described in the KB above is reproducible only if the application has 2 or fewer instances.