How can we get solid evidence that the APM cluster or cloud proxy is unhealthy?
Since DevOps oversees the APM Cluster, they are already monitoring SaaS health. The Supportability metrics offer several ways to measure cluster health.
See Cluster Supportability Metrics
There are Health, Connection, and various Cluster metrics.
There are also Cloud Proxy health metrics that can help.
Cloud Proxy Supportability Metrics
Various connection metrics are included.
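
As a rough illustration of turning these metrics into evidence, the sketch below compares a set of metric readings against thresholds and lists anything that looks unhealthy. It is only a minimal sketch: the metric names, threshold values, and the idea of passing readings in as a dictionary are all assumptions made for the example, not the actual Supportability metric paths or any product API. Substitute the real metric names from the Supportability Metrics pages and the thresholds DevOps uses.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One health rule: a metric name and the highest value considered healthy."""
    metric: str          # hypothetical metric name, not a real Supportability path
    max_allowed: float   # hypothetical threshold, to be agreed with DevOps


def find_unhealthy(readings, checks):
    """Return a list of human-readable problems found in the readings.

    readings: dict mapping metric name -> latest numeric value
    checks:   list of Check rules to evaluate
    """
    problems = []
    for check in checks:
        value = readings.get(check.metric)
        if value is None:
            # A metric that stops reporting is itself a warning sign.
            problems.append(f"{check.metric}: no data")
        elif value > check.max_allowed:
            problems.append(f"{check.metric}: {value} exceeds {check.max_allowed}")
    return problems


if __name__ == "__main__":
    # All names and numbers below are made up for illustration.
    sample_readings = {
        "cluster.pending_connections": 120,
        "cloud_proxy.dropped_connections": 0,
    }
    sample_checks = [
        Check("cluster.pending_connections", 50),
        Check("cloud_proxy.dropped_connections", 5),
        Check("cloud_proxy.reconnect_count", 10),
    ]
    for problem in find_unhealthy(sample_readings, sample_checks):
        print(problem)
```

An empty result means every checked metric reported a value within its threshold. Any entries in the list can be attached to a support case as concrete evidence.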