The cloud proxies in our DR environment, located in the DMZ network zone, are logging frequent "connection reset by peer" errors, about 3500 an hour. Agents are at times unable to report metrics, and this issue seems to be affecting APMIA performance.
Infrastructure involved in our deployment: APM agents/ACC Controllers > load balancer > cloud proxy > DX SaaS
Cloud proxy release: 184.108.40.206
Release : 11.1.0
Component : APM Agents
A load balancer can be put in front of APM Cloud Proxies to achieve
both load balancing and HA. However, the following requirements have
to be met to support the different protocols.
1/ HTTP protocol - Cloud Proxy maintains an HTTP session for each
agent. Each session is bound to a WebSocket channel that transports
the actual Isengard payload to the Cloud Gateway. Therefore, the load
balancer has to be configured so that HTTP requests from an
agent are always routed to the same Cloud Proxy. An L7 load balancer
usually implements this through sticky session routing. An L3 load
balancer may leverage source-based routing.
2/ WebSocket protocol - If Agent WebSocket connections are to be
supported, the load balancer has to be configured to
correctly pass the initial upgrade request to Cloud Proxy.
3/ Isengard protocol - Isengard protocol traffic would be balanced and
routed through an L3 load balancer.
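
The source-based routing mentioned in requirement 1/ can be illustrated with a minimal Python sketch. It is not how any particular load balancer is implemented; it only shows the idea that hashing the agent's source IP yields the same Cloud Proxy for every request from that agent. The backend names and the function pick_backend are hypothetical and chosen for illustration only.

```python
import hashlib

# Hypothetical Cloud Proxy backends; these host names are placeholders.
BACKENDS = ["cloudproxy-1:8081", "cloudproxy-2:8081", "cloudproxy-3:8081"]


def pick_backend(source_ip: str, backends=BACKENDS) -> str:
    """Deterministically map an agent's source IP to one backend."""
    # A stable hash (not Python's randomized hash()) keeps the mapping
    # identical across restarts of the routing process.
    digest = hashlib.sha256(source_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    # The same agent IP always resolves to the same Cloud Proxy.
    print(pick_backend("10.0.0.5"))
    print(pick_backend("10.0.0.5"))
```

Note that with this simple modulo scheme, adding or removing a backend remaps many agents to a different Cloud Proxy, so their HTTP sessions would have to be re-established.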
Configuring a load balancer for HA might require a health probe. The
Cloud Proxy does not currently provide a dedicated endpoint for this.
The customer can configure a health probe against the HTTP (8081) or HTTPS
(8444) endpoint, checking /supportability/health. This path will be
introduced in a future version; for now, the probe should simply ignore the response.
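
As a rough illustration of such a probe, the following Python sketch requests /supportability/health and treats any HTTP response, whatever its status code, as a sign that the Cloud Proxy is reachable. Only a connection or protocol failure counts as unhealthy. The function name is_healthy, its parameters, and the host name in the usage example are assumptions made for this sketch; an actual load balancer would use its own built-in probe configuration.

```python
import http.client


def is_healthy(host: str, port: int = 8081,
               path: str = "/supportability/health",
               use_tls: bool = False, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with any HTTP response."""
    conn_cls = (http.client.HTTPSConnection if use_tls
                else http.client.HTTPConnection)
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        # The status code is deliberately ignored: the health path may
        # not exist yet, so even a 404 shows the proxy is listening.
        conn.getresponse().read()
        return True
    except (OSError, http.client.HTTPException):
        # Refused, reset, or timed-out connections, TLS failures, and
        # malformed replies all count as unhealthy.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    # "cloudproxy-1" is a placeholder host name.
    print(is_healthy("cloudproxy-1", 8081))
    print(is_healthy("cloudproxy-1", 8444, use_tls=True))
```

With use_tls=True, the default certificate verification applies, so a self-signed certificate on the HTTPS endpoint would make this sketch report the proxy as unhealthy.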