Application crashes with log error "certificate is not valid for any names, but wanted to match <GUID>"

Products

VMware Tanzu Platform Core

Issue/Introduction

You have an app with more than 3 instances and notice in cf events that the app is crashing. Requests to the app return HTTP 503 with the following router error:

apps.domain.com - [2025-12-08T10:59:30.339504648Z] "GET /v1/config HTTP/1.1" 503 0 24 "-" "#####/16/9.17.0/android" "10.224.#.###:####" "10.224.#.###:####" x_forwarded_for:"34.###.##.##, 10.###.#.####" x_forwarded_proto:"https" vcap_request_id:"2ca641f5-##########" response_time:0.004578 gorouter_time:0.000175 app_id:"1bc0ca4b-#########" app_index:"8" instance_id:"1241145e-#######" x_cf_routererror:"endpoint_failure (tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match 1241145e-#####)" x_b3_traceid:"########" x_b3_spanid:"#########" x_b3_parentspanid:"-" b3:"#########"

Please note that in this scenario:

there were no expiring certificates
there were no warning messages that a certificate would expire
the application has more than 3 application instances (8 in this case)
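
To confirm which instance a 503 belongs to, the key:"value" fields can be pulled out of the access-log line shown above. The following is a minimal sketch, not an official tool: the function names are hypothetical, the sample is a trimmed copy of the masked log line from this article, and it assumes every field of interest uses the key:"value" form seen in that line.

```python
import re

# Matches key:"value" pairs such as app_index:"8" in an access-log line.
FIELD_RE = re.compile(r'(\w+):"([^"]*)"')


def parse_access_log_fields(line):
    """Return a dict of the key:"value" fields found in one access-log line."""
    return dict(FIELD_RE.findall(line))


def is_no_names_tls_error(fields):
    """True when x_cf_routererror carries the 'not valid for any names' failure."""
    error = fields.get("x_cf_routererror", "")
    return ("endpoint_failure" in error
            and "certificate is not valid for any names" in error)


# Trimmed copy of the masked access-log line from this article.
sample = (
    'apps.domain.com - [2025-12-08T10:59:30.339504648Z] '
    '"GET /v1/config HTTP/1.1" 503 0 24 '
    'app_id:"1bc0ca4b-#########" app_index:"8" instance_id:"1241145e-#######" '
    'x_cf_routererror:"endpoint_failure (tls: failed to verify certificate: '
    'x509: certificate is not valid for any names, but wanted to match '
    '1241145e-#####)"'
)

fields = parse_access_log_fields(sample)
if is_no_names_tls_error(fields):
    print("instance index", fields["app_index"], "instance", fields["instance_id"])
```

The app_index and instance_id values can then be compared with the index and instance columns reported by cf events.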

To get more information, we checked cf events <app-name>:

❯ cf events <app-name>

Getting events for app <app-name> in org <org-name> / space prod as <user>...
time                          event                     actor                           description
2025-12-11T00:30:44.00+1100   app.crash                 <app-name>   index: 0, reason: CRASHED, cell_id: ed3d92ab-#####, instance: 844b8fe4-######, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:30:44.00+1100   audit.app.process.crash   web                             index: 0, reason: CRASHED, cell_id: ed3d92ab-#####, instance: 844b8fe4-#####, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:20:43.00+1100   audit.app.process.crash   web                             index: 1, reason: CRASHED, cell_id: b8ef363e-#####, instance: 9bf8ca4e-#####, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:20:43.00+1100   app.crash                 <app-name>   index: 1, reason: CRASHED, cell_id: b8ef363e-#####, instance: 9bf8ca4e-#####, exit_description: APP/PROC/WEB: Exited with status 1

...
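
When the event list is long, the crash lines can be summarised per instance index. The sketch below is illustrative only: the function name is hypothetical, and it assumes the description column always starts with "index:" and uses comma-separated "key: value" pairs, as in the output above. It counts only app.crash lines, because each crash in the output above also appears as an audit.app.process.crash line.

```python
import re
from collections import Counter

# Pulls "key: value" pairs out of the description column.
PAIR_RE = re.compile(r"(\w+): ([^,]+)")


def parse_crash_events(output):
    """Return one dict per app.crash line found in `cf events` output."""
    crashes = []
    for line in output.splitlines():
        columns = line.split()
        # Count only app.crash; audit.app.process.crash repeats the same crash.
        if len(columns) < 2 or columns[1] != "app.crash":
            continue
        start = line.find("index:")
        if start == -1:
            continue
        pairs = PAIR_RE.findall(line[start:])
        crashes.append({key: value.strip() for key, value in pairs})
    return crashes


# Event lines copied from the masked output in this article.
sample_output = """\
2025-12-11T00:30:44.00+1100   app.crash                 <app-name>   index: 0, reason: CRASHED, cell_id: ed3d92ab-#####, instance: 844b8fe4-######, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:30:44.00+1100   audit.app.process.crash   web                             index: 0, reason: CRASHED, cell_id: ed3d92ab-#####, instance: 844b8fe4-#####, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:20:43.00+1100   audit.app.process.crash   web                             index: 1, reason: CRASHED, cell_id: b8ef363e-#####, instance: 9bf8ca4e-#####, exit_description: APP/PROC/WEB: Exited with status 1
2025-12-11T00:20:43.00+1100   app.crash                 <app-name>   index: 1, reason: CRASHED, cell_id: b8ef363e-#####, instance: 9bf8ca4e-#####, exit_description: APP/PROC/WEB: Exited with status 1
"""

crashes = parse_crash_events(sample_output)
crashes_per_index = Counter(crash["index"] for crash in crashes)
for crash in crashes:
    print(crash["index"], crash["cell_id"], crash["instance"], crash["exit_description"])
```

The cell_id and instance values it extracts are the ones needed to locate the matching rep logs on the Diego cell.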

First, let us look at one of the cells where an instance crashed.

On diego_cell/ed3d92ab-######, check the rep logs, filtering on instance ID 844b8fe4-######. They show nothing unusual and indicate only a normal container exit:

rep/rep.stdout.log.1:{"timestamp":"2025-12-10T13:30:44.852645339Z","level":"info","source":"rep","message":"rep.executing-container-operation.ordinary-lrp-processor.process-reserved-container.run-container.containerstore-run.node-run.proxy.run-step.process-exit","data":{"cancelled":false,"container-guid":"844b8fe4-#####","container-state":"reserved","exitStatus":137,"guid":"844b8fe4-#####","lrp-instance-key":{"instance_guid":"844b8fe4-#####","cell_id":"ed3d92ab-#####"},"lrp-key":{"process_guid":"23d5cad5-######","index":0,"domain":"cf-apps"},"process":"844b8fe4-#####-envoy","session":"28353.#.#.#.#.#.#.#"}}
...
rep/rep.stdout.log.1:{"timestamp":"2025-12-10T13:30:44.856970787Z","level":"error","source":"rep","message":"rep.executing-container-operation.ordinary-lrp-processor.process-reserved-container.run-container.containerstore-run.node-run.proxy.run-step.failed-to-get-info","data":{"container-guid":"844b8fe4-#####","container-state":"reserved","error":"task for container 844b8fe4-##### not found","guid":"844b8fe4-#####3","lrp-instance-key":{"instance_guid":"844b8fe4-#####","cell_id":"ed3d92ab-#####"},"lrp-key":{"process_guid":"23d5cad5-#####","index":0,"domain":"cf-apps"},"process":"844b8fe4-#####-envoy","session":"28353.#.#.#.#.#.#.#"}}
...
rep/rep.stdout.log.1:{"timestamp":"2025-12-10T13:30:44.857091699Z","level":"error","source":"rep","message":"rep.executing-container-operation.ordinary-lrp-processor.process-reserved-container.run-container.containerstore-run.node-run.proxy.run-step.run-step-failed-with-nonzero-status-code","data":{"container-guid":"844b8fe4-#####","container-state":"reserved","error":"Exit status 137","guid":"844b8fe4-#####","lrp-instance-key":{"instance_guid":"844b8fe4-#####","cell_id":"ed3d92ab-#####"},"lrp-key":{"process_guid":"23d5cad5-#####","index":0,"domain":"cf-apps"},"process":"844b8fe4-#####-envoy","session":"28353.#.#.#.#.#.#.#","status-code":137}}
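
The rep log lines above are JSON objects prefixed by the file name, so the same filtering can be scripted. This is a rough sketch under stated assumptions: the function names are hypothetical, the sample is a trimmed copy of the first masked rep line above, and each input line is assumed to have the "<file>:{json}" shape produced by grepping across log files.

```python
import json


def parse_rep_line(line):
    """Parse one '<file>:{json}' rep log line into a dict, or return None."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def exits_for_instance(lines, instance_prefix):
    """Return (timestamp, process, exit status) for entries of one instance."""
    results = []
    for line in lines:
        entry = parse_rep_line(line)
        if entry is None:
            continue
        data = entry.get("data", {})
        guid = data.get("lrp-instance-key", {}).get("instance_guid", "")
        if not guid.startswith(instance_prefix):
            continue
        # Only entries that record an exit status are of interest here.
        if "exitStatus" in data:
            results.append((entry["timestamp"], data.get("process"), data["exitStatus"]))
    return results


# Trimmed copy of the first masked rep log line from this article.
sample_lines = [
    'rep/rep.stdout.log.1:{"timestamp":"2025-12-10T13:30:44.852645339Z",'
    '"level":"info","source":"rep","message":"rep.executing-container-operation.'
    'ordinary-lrp-processor.process-reserved-container.run-container.'
    'containerstore-run.node-run.proxy.run-step.process-exit",'
    '"data":{"cancelled":false,"container-guid":"844b8fe4-#####","exitStatus":137,'
    '"lrp-instance-key":{"instance_guid":"844b8fe4-#####","cell_id":"ed3d92ab-#####"},'
    '"process":"844b8fe4-#####-envoy"}}',
    "a line that is not JSON is skipped",
]

for timestamp, process, status in exits_for_instance(sample_lines, "844b8fe4"):
    print(timestamp, process, "exit status", status)
```

Lines that do not parse as JSON are skipped rather than treated as errors.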

Environment

TAS 6.x onwards, or any TAS version whose routing-release is newer than v0.336.0
The application has more than 3 application instances

Cause

When TAS/TPCF detects that an app instance is exiting, it explicitly reconfigures Envoy so that its certificate is invalid and matches no names. The error "endpoint_failure (tls: failed to verify certificate: x509: certificate is not valid for any names, but wanted to match <guid>)" is therefore known, expected behaviour.

This prevents any further requests from being sent to the failed instance while the NATS message for route de-registration is sent and processed.

This message does not indicate any problems with the platform, but indicates only that the app instance is terminating.
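
The wording of the error follows from the name check itself. The toy model below is an illustration only, not gorouter or Envoy code: verify_hostname is a hypothetical function, the GUID value is a placeholder, and the model assumes exact-match names with no wildcard or IP handling. It shows why a certificate carrying no names produces the "not valid for any names" text, while a certificate carrying the instance GUID passes.

```python
def verify_hostname(cert_names, wanted):
    """Return None when `wanted` is in cert_names, otherwise an error string."""
    if wanted in cert_names:
        return None
    if not cert_names:
        # A certificate with no names at all yields the message seen in this article.
        return ("x509: certificate is not valid for any names, "
                "but wanted to match " + wanted)
    return "x509: certificate is valid for " + ", ".join(cert_names) + ", not " + wanted


instance_guid = "example-instance-guid"  # placeholder, not a real GUID

# Healthy instance: the certificate names the instance GUID.
print(verify_hostname([instance_guid], instance_guid))

# Exiting instance: the certificate has been replaced by one with no names.
print(verify_hostname([], instance_guid))
```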

Resolution

This is likely caused by an issue with your application. The instance could be terminating because of a crash, a restart, a restage, a push, a stop, or even a redeploy of TAS that drains AIs off of cells.

Please check your application logs and troubleshoot why the app crashed.

If the logs do not provide enough information, turn on debug logging, which may help show the app's behaviour prior to the crash.

Additional Information

If you are using a TAS version with routing-release v0.336.0 or older and are seeing exactly the same error, then this is due to a known issue discussed in the KB article below:

https://knowledge.broadcom.com/external/article/386660/applications-return-http-503-certificate.html

Please note that the issue in the KB above is reproducible only if the application has 2 or fewer application instances.