How can we get solid evidence that an APM cluster / Cloud Proxy is overloaded?
Cloud Proxy is a standard Java application, so it depends on host resources: memory, CPU, and network.
Memory is the most important of these. Cloud Proxy has to hold information about each connected agent, buffers for the data being transferred, and so on.
1) Investigate the memory metrics for the last 24 hours with "Min/Max Display".
a) "Resources|Memory:Memory Heap Used". There should be enough spare memory compared with "Resources|Memory:Memory Heap Max". I would prefer about 2 GB of headroom to absorb a wave of increased traffic. "Memory Heap Max" can be adjusted with the environment variable APM_HEAP_XMX (in MB).
Of course, the host itself must also have enough memory; check "Resources|Host|Memory:Memory Total (byte)".
b) "Resources|Memory:GC Time (ms)". There should be no significant spikes, e.g. above 500 ms.
2) Investigate CPU.
"Resources|CPU:CPU Used (%)" - the process should not be using all cores on the host.
"Resources|Host|CPU:Idle CPU (%)" - the host should still have spare CPU capacity.
3) Investigate network throughput.
"Throughput:Http Read Bytes per Interval" - for http/https/ws/wss agent connections
"Throughput:Isengard Read Bytes per Interval" - for isengard (usually not used by customers)
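The memory and CPU checks above can be sketched as a small script that is run over metric values exported for the last 24 hours. This is only an illustration, not a Cloud Proxy or APM tool: the function name, the dictionary keys, and the assumption that heap values are in bytes are all hypothetical. The 2 GB headroom and the 500 ms GC limit come from the text above, while the 10% idle CPU floor is an assumed value, since no number is given for CPU.

```python
GIB = 1024 ** 3  # bytes per GiB


def find_overload_signs(samples, min_heap_headroom=2 * GIB,
                        max_gc_ms=500, min_host_idle_pct=10.0):
    """Return a list of findings; an empty list means no threshold was crossed.

    `samples` is a hypothetical dict of values exported by hand:
      heap_max_bytes     - "Resources|Memory:Memory Heap Max" (assumed bytes)
      heap_used_bytes    - list of "Memory Heap Used" values (assumed bytes)
      gc_time_ms         - list of "GC Time (ms)" values
      host_idle_cpu_pct  - list of "Resources|Host|CPU:Idle CPU (%)" values
    """
    findings = []

    # Worst-case headroom in the window: Heap Max minus the peak of Heap Used.
    headroom = samples["heap_max_bytes"] - max(samples["heap_used_bytes"])
    if headroom < min_heap_headroom:
        findings.append(
            f"heap headroom {headroom / GIB:.1f} GiB is below "
            f"{min_heap_headroom / GIB:.1f} GiB"
        )

    # A single GC spike above the limit is enough to report.
    gc_peak = max(samples["gc_time_ms"])
    if gc_peak > max_gc_ms:
        findings.append(f"GC time peaked at {gc_peak} ms (limit {max_gc_ms} ms)")

    # The idle CPU floor is an assumption; the text only asks for spare capacity.
    idle_low = min(samples["host_idle_cpu_pct"])
    if idle_low < min_host_idle_pct:
        findings.append(
            f"host idle CPU dropped to {idle_low:.0f}% "
            f"(assumed floor {min_host_idle_pct:.0f}%)"
        )

    return findings


if __name__ == "__main__":
    # Made-up sample values, only to show the call.
    window = {
        "heap_max_bytes": 8 * GIB,
        "heap_used_bytes": [5 * GIB, 7 * GIB],
        "gc_time_ms": [120, 640],
        "host_idle_cpu_pct": [35.0, 4.0],
    }
    for finding in find_overload_signs(window):
        print(finding)
```

The thresholds are keyword arguments so they can be tuned per environment. The network throughput metrics are left out on purpose because the text gives no limit for them.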
When to add another Cloud Proxy:
- Needed for failover
- Not enough system/host resources
- Having more than 10000 agents
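
The three criteria above can be written down as a tiny decision helper, for example as part of a capacity review script. This is a sketch only: the function and parameter names are hypothetical, and the inputs (whether failover is needed, whether host resources are short) still have to be judged by a person from the metrics.

```python
def reasons_to_add_cloud_proxy(failover_needed, resources_short,
                               agent_count, max_agents=10000):
    """Return the criteria from the list above that currently apply."""
    reasons = []
    if failover_needed:
        reasons.append("failover")
    if resources_short:
        reasons.append("not enough system/host resources")
    # The limit is "more than 10000", so exactly 10000 agents does not trigger it.
    if agent_count > max_agents:
        reasons.append(f"more than {max_agents} agents")
    return reasons


if __name__ == "__main__":
    # Made-up inputs, only to show the call.
    print(reasons_to_add_cloud_proxy(False, True, 12500))
```

An empty list means none of the criteria apply, so a single Cloud Proxy should be enough.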