Receiving ASM failures with codes 9260 and 9235

Products

CA App Synthetic Monitor

Issue/Introduction

We see several monitors failing with Codes 9260 and 9235.

Per the Error Messages link, these are described as "Curl Problem with the CA cert (path? permission?)" and "Curl SSL connect error. The SSL handshaking failed."

A similar issue was reported via 32873231 in October 2021. We were receiving error code 9235 then. The issue was with specific POP Stations and it was fixed.

On 4th July 2022, there was an issue identified and a maintenance performed by ASM team to fix Certificate related errors https://status.broadcom.com/notices/hvrzytjsnr6lubj8-app-synthetic-monitor-certificate-errors

We are still continuing to get these errors and tickets are being created for these.

Environment

Release : SAAS

Component : CA APP SYNTHETIC MONITOR (WATCHMOUSE)

Resolution

We do not support TLS 1.3 at this time. By selecting TLS 1.2, you are limiting encryption choices.

There was a new ASM release. This upgrades curl for the first time in many years. By default, curl selects the encryption. This issue should go away at that time or we will suggest an api change to set on all unchanged monitors .

Additional Information

There are several independent problems/issues mixed together.

1) the problem on the stations with station certificate: our stations are using self signed certificate and this certificate expired. Fortunately, the certificates did not expire on all the stations at the same time so the monitors were redirected to other stations in the pool and customers may not have noticed such problem. It expired approx. on Jul 2nd - 4th and in the logs you'll see an internal error (check rejected by the station, customer's server was not even contacted).

2) On the other hand, the results from July 28 and July 30 in the screenshot above are normal errors, i.e. the check was accepted by the station (the problem with the station certificate was already resolved) and the handshake failed when connecting to the customer server. As you can see from the error message the server certificate was invalid. I've verified the current server certificate and it was generated recently (July 25th).4

I'm currently struggling to find some explanation how this can happen. The only way I can imagine is some load balancer and one of the stations has an invalid certificate:

1) all but one monitors you reported are monitoring the same server/IP
2) the problem occurs max once per day and from different stations. Charleston, CD and Charleston, CI are different monitoring stations (CI, CD) in the same location (Charleston). Thus, it is not always failing on the station.

3) Even the same station says within a few minutes that it works / fails / works:

We also faced a similar issue (after the migration - with the tunnel clients) and found out that it was the cluster which was incorrectly configured. We were sharing the cluster with other teams / applications and sometimes the load balancer selected a pod which was not ASM. There was another web server (and certificate) running and the connection failed. Ask them to verify this scenario too.